152 lines
7.3 KiB
Typst
#import "../lib.typ": todo, APK
#import "@preview/diagraph:0.3.3": raw-render

== Android Reverse Engineering Techniques <sec:bg-techniques>
#todo[swap with tool section ?]

Over the past fifteen years, the research community has released many tools to detect or analyze malicious behaviors in applications.
Two main approaches can be distinguished: static and dynamic analysis@Li2017.
Dynamic analysis requires running the application in a controlled environment to observe runtime values and/or interactions with the operating system.
For example, an Android emulator with a patched kernel can capture these interactions, but applying the required modifications is not a trivial task.
Such an approach is limited by the time required to execute even a limited part of the application, with no guarantee on the code coverage obtained.
For malware, dynamic analysis is also limited by evasion techniques that may prevent the execution of the malicious parts of the code.
//As a consequence, a lot of efforts have been put in static approaches, which is the focus of this paper.

=== Static Analysis <sec:bg-static>

Static analysis programs examine an #APK file without executing it in order to extract information from it.
Basic static analysis can include extracting information from the `AndroidManifest.xml` file or decompiling bytecode to Java code.

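
As an illustration of the first kind of extraction, the following minimal sketch lists the permissions requested in a manifest.
It assumes that the manifest has already been decoded to plain-text XML (inside an #APK the file is stored in a binary XML format), and the class name `ManifestPermissions` as well as the sample manifest are hypothetical, chosen only for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class ManifestPermissions {
    // XML namespace used by the android: attributes of a manifest.
    static final String ANDROID_NS = "http://schemas.android.com/apk/res/android";

    // Returns the android:name of every <uses-permission> element.
    // Assumption: manifestXml is a decoded, plain-text manifest.
    static List<String> requestedPermissions(String manifestXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(
            new ByteArrayInputStream(manifestXml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName("uses-permission");
        List<String> permissions = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element element = (Element) nodes.item(i);
            permissions.add(element.getAttributeNS(ANDROID_NS, "name"));
        }
        return permissions;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical manifest, reduced to the elements of interest.
        String manifest =
            "<manifest xmlns:android=\"" + ANDROID_NS + "\" package=\"com.example.app\">"
            + "<uses-permission android:name=\"android.permission.INTERNET\"/>"
            + "<uses-permission android:name=\"android.permission.READ_PHONE_STATE\"/>"
            + "</manifest>";
        System.out.println(requestedPermissions(manifest));
    }
}
```

Such a list is only a first indication: a requested permission does not tell whether, or how, the application actually uses it.
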
More advanced analyses consist in computing the control-flow of an application and its data-flow@Li2017.

The most basic form of control-flow analysis is to build a call graph.
A call graph is a graph where the nodes represent the methods of the application, and the edges represent calls from one method to another.
@fig:bg-fizzbuzz-cg-cfg b) shows the call graph of the code in @fig:bg-fizzbuzz-cg-cfg a).
A more advanced control-flow analysis consists in building the control-flow graph.
This time, instead of methods, the nodes represent instructions, and the edges indicate which instruction can follow which instruction.
@fig:bg-fizzbuzz-cg-cfg c) represents the control-flow graph of @fig:bg-fizzbuzz-cg-cfg a), with code statements instead of bytecode instructions.

#figure({
set align(center)
stack(dir: ttb,[
#figure(
```java
public static void fizzBuzz(int n) {
  for (int i = 1; i <= n; i++) {
    if (i % 3 == 0 && i % 5 == 0) {
      Buzzer.fizzBuzz();
    } else if (i % 3 == 0) {
      Buzzer.fizz();
    } else if (i % 5 == 0) {
      Buzzer.buzz();
    } else {
      Log.e("fizzbuzz", String.valueOf(i));
    }
  }
}
```,
supplement: none,
kind: "bg-fizzbuzz-cg-cfg subfig",
caption: [a) A Java program],
) <fig:bg-fizzbuzz-java>], v(2em), stack(dir: ltr, [
#figure(
raw-render(```
digraph {
rankdir=LR
"fizzBuzz(int)" -> "Buzzer.fizzBuzz()"
"fizzBuzz(int)" -> "Buzzer.fizz()"
"fizzBuzz(int)" -> "Buzzer.buzz()"
"fizzBuzz(int)" -> "String.valueOf(int)"
"fizzBuzz(int)" -> "Log.e(String, String)"
}
```,
width: 40%
),
supplement: none,
kind: "bg-fizzbuzz-cg-cfg subfig",
caption: [b) Corresponding Call Graph]
) <fig:bg-fizzbuzz-cg>],[
#figure(
raw-render(```
digraph {
l1
l2
l3
l4
l5
l6
l7
l9

l1 -> l2
l2 -> l3
l3 -> l1
l2 -> l4
l4 -> l5
l5 -> l1
l4 -> l6
l6 -> l7
l7 -> l1
l6 -> l9
l9 -> l1
}
```,
labels: (
"l1": `for (int i = 1; i <= n; i++) {`,
"l2": `if (i % 3 == 0 && i % 5 == 0) {`,
"l3": `Buzzer.fizzBuzz();`,
"l4": `} else if (i % 3 == 0) {`,
"l5": `Buzzer.fizz();`,
"l6": `} else if (i % 5 == 0) {`,
"l7": `Buzzer.buzz();`,
"l9": `Log.e("fizzbuzz", String.valueOf(i));`,
),
width: 50%
),
supplement: none,
kind: "bg-fizzbuzz-cg-cfg subfig",
caption: [c) Corresponding Control-Flow Graph]
) <fig:bg-fizzbuzz-cfg>]))
h(1em)},
supplement: [Figure],
caption: [Source code for a simple Java method and its Call and Control Flow Graphs],
)<fig:bg-fizzbuzz-cg-cfg>

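
The call graph of @fig:bg-fizzbuzz-cg-cfg b) can be stored as a simple adjacency map.
The sketch below is an illustration only, not the representation used by any specific tool: it encodes the edges of the figure and computes the methods reachable from an entry point with a worklist.
The class and method names (`CallGraphSketch`, `reachableFrom`) are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CallGraphSketch {
    // Edges of the call graph from the figure: caller -> callees.
    static Map<String, List<String>> fizzBuzzCallGraph() {
        Map<String, List<String>> graph = new LinkedHashMap<>();
        graph.put("fizzBuzz(int)", List.of(
            "Buzzer.fizzBuzz()", "Buzzer.fizz()", "Buzzer.buzz()",
            "String.valueOf(int)", "Log.e(String, String)"));
        return graph;
    }

    // Worklist traversal: every method that may be executed starting from entry.
    static Set<String> reachableFrom(Map<String, List<String>> graph, String entry) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> worklist = new ArrayDeque<>();
        worklist.add(entry);
        while (!worklist.isEmpty()) {
            String method = worklist.poll();
            if (!visited.add(method)) {
                continue; // already processed
            }
            // Methods without recorded callees are leaves of the graph.
            for (String callee : graph.getOrDefault(method, List.of())) {
                worklist.add(callee);
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(reachableFrom(fizzBuzzCallGraph(), "fizzBuzz(int)"));
    }
}
```

The same traversal applies to a control-flow graph, with instructions as nodes instead of methods.
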
Once the control-flow graph is computed, it can be used to compute data-flows.
Data-flow analysis, also called taint-tracking, makes it possible to follow the flow of information in the application.
By defining a list of methods and fields that can generate critical information (taint sources) and a list of methods that can consume information (taint sinks), taint-tracking can detect potential data leaks (when a data-flow links a taint source to a taint sink).
For example, `TelephonyManager.getImei()` returns a unique, persistent device identifier.
This identifier can be used to identify the user and cannot be changed if compromised.
This makes `TelephonyManager.getImei()` a good candidate for a taint source.
On the other hand, `UrlRequest.start()` sends a request to an external server, making it a taint sink.
If a data-flow is found linking `TelephonyManager.getImei()` to `UrlRequest.start()`, the application is potentially leaking critical information to an external entity, a behavior that is probably not wanted by the user.
Data-flow analysis is the subject of many contributions@weiAmandroidPreciseGeneral2014 @titzeAppareciumRevealingData2015 @bosuCollusiveDataLeak2017 @klieberAndroidTaintFlow2014 @DBLPconfndssGordonKPGNR15 @octeauCompositeConstantPropagation2015 @liIccTADetectingInterComponent2015, the most notable being Flowdroid@Arzt2014a.

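
The source-to-sink reasoning above can be sketched on a toy program.
The code below is a deliberately simplified illustration, not how Flowdroid or any other cited tool works: it assumes a straight-line list of calls of the form `target = method(args)`, ignores branches, fields and aliasing, and treats any call with a tainted argument as returning a tainted value.
The class `TaintSketch`, the statement model, and the intermediate calls of the sample program are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TaintSketch {
    // One statement `target = method(args...)`; target is null when the result is unused.
    // The receiver of the call, if any, is modelled as the first argument.
    static final class Call {
        final String target;
        final String method;
        final List<String> args;

        Call(String target, String method, String... args) {
            this.target = target;
            this.method = method;
            this.args = List.of(args);
        }
    }

    static final Set<String> SOURCES = Set.of("TelephonyManager.getImei()");
    static final Set<String> SINKS = Set.of("UrlRequest.start()");

    // Forward pass over the statements, returning the sinks reached by tainted data.
    static List<String> findLeaks(List<Call> program) {
        Set<String> tainted = new HashSet<>();
        List<String> leaks = new ArrayList<>();
        for (Call call : program) {
            boolean taintedInput = call.args.stream().anyMatch(tainted::contains);
            if (SINKS.contains(call.method) && taintedInput) {
                leaks.add(call.method);
            }
            if (call.target != null) {
                if (SOURCES.contains(call.method) || taintedInput) {
                    tainted.add(call.target);    // the result carries sensitive data
                } else {
                    tainted.remove(call.target); // overwritten with an untainted value
                }
            }
        }
        return leaks;
    }

    // Hypothetical program: the IMEI is appended to a URL that is then requested.
    static List<Call> sampleProgram() {
        return List.of(
            new Call("imei", "TelephonyManager.getImei()", "telephonyManager"),
            new Call("url", "String.concat(String)", "baseUrl", "imei"),
            new Call("request", "UrlRequest.Builder.build()", "url"), // simplified model
            new Call(null, "UrlRequest.start()", "request"));
    }

    public static void main(String[] args) {
        System.out.println(findLeaks(sampleProgram()));
    }
}
```

On real applications, the propagation has to follow the control-flow graph and the call graph, which is where most of the difficulty lies.
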
Static analysis is powerful as it can detect unwanted behavior in an application even if the behavior does not manifest itself when running the application.
However, static analysis tools must overcome many challenges when analysing Android applications:
/ the Java object-oriented paradigm: A call to a method can in fact correspond to a call to any method overriding the original method in subclasses
/ the multiplicity of entry points: Each component of an application can be an entry point for the application
/ the event-driven architecture: Methods of the application can be called in many different orders depending on external events
/ the interleaving of native code and bytecode: Native code can be called from bytecode and vice versa, but tools often only handle one of those formats
/ the potential dynamic code loading: An application can run code that was not originally in the application
/ the use of reflection: Methods can be called from their name as a string object, which is not necessarily known statically
/ the continual evolution of Android: Each new version brings new features that analysis tools must be aware of

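
Two of these challenges, method overriding and reflection, can be illustrated in a few lines of plain Java.
The sketch below is a hypothetical example (the classes `Greeter` and `LoudGreeter` do not come from any real application): in `dispatch`, the target of `greet()` depends on a runtime value, and in `reflectiveCall` the name of the invoked method only exists as a string assembled at run time.

```java
import java.lang.reflect.Method;

class StaticAnalysisChallenges {
    static class Greeter {
        String greet() { return "hello"; }
    }

    static class LoudGreeter extends Greeter {
        @Override
        String greet() { return "HELLO"; }
    }

    // Overriding: the declared type is Greeter, but the method actually executed
    // depends on `loud`. A sound call graph needs an edge to both implementations.
    static String dispatch(boolean loud) {
        Greeter greeter = loud ? new LoudGreeter() : new Greeter();
        return greeter.greet();
    }

    // Reflection: the method name is built at run time, so the call target is
    // not visible in the bytecode as a regular method invocation.
    static Object reflectiveCall(String prefix, String suffix) throws Exception {
        String methodName = prefix + suffix;
        Method method = String.class.getMethod(methodName);
        return method.invoke("fizzbuzz");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(dispatch(true));
        System.out.println(reflectiveCall("to", "UpperCase")); // calls String.toUpperCase()
    }
}
```

In malware, the string passed to the reflection API may additionally be obfuscated or received from the network, which makes it even harder to resolve statically.
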

The tools can share the backend used to interact with the bytecode.
For example, Apktool is often called in a subprocess to extract the bytecode, and the Soot framework is commonly used both to analyse and to modify bytecode.
The most notable user of Soot is Flowdroid.

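
Calling Apktool in a subprocess can be sketched as follows.
This is a minimal illustration that assumes an `apktool` executable is available on the `PATH`; the class `ApktoolRunner` and the file names are hypothetical, and only the `d` (decode) subcommand with its `-o` output option is used.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

class ApktoolRunner {
    // Command line equivalent to: apktool d <apk> -o <outputDir>
    static List<String> decodeCommand(Path apk, Path outputDir) {
        return List.of("apktool", "d", apk.toString(), "-o", outputDir.toString());
    }

    // Runs Apktool and returns its exit code (0 on success).
    // Assumption: `apktool` is installed and reachable through the PATH.
    static int decode(Path apk, Path outputDir) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(decodeCommand(apk, outputDir))
            .inheritIO() // forward Apktool's output to the console
            .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input and output paths.
        int exitCode = decode(Path.of("app.apk"), Path.of("app-decoded"));
        System.out.println("apktool exited with code " + exitCode);
    }
}
```

Relying on a subprocess keeps the analysis tool independent from Apktool's internals, at the cost of depending on the version installed on the analysis machine.
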
=== Dynamic Analysis <sec:bg-dynamic>

#todo[a lot of work left to do here]

- #todo[evasion: DroidDungeon @ruggia_unmasking_2024]
- #todo[DroidScope@droidscope180237 and CopperDroid@Tam2015]
- #todo[Xposed: DroidHook / Mirage: Toward a stealthier and modular malware analysis sandbox for android]
- #todo[Frida: CamoDroid]
- #todo[modified android framework: RealDroid]

=== Hybrid Analysis <sec:bg-hybrid>

- #todo[DyDroid, audit of Dynamic Code Loading@qu_dydroid_2017]