#import "../lib.typ": todo, APK, etal, ART, eg, jm-note
#import "@preview/diagraph:0.3.3": raw-render

== Android Reverse Engineering Techniques <sec:bg-techniques>

#todo[swap with tool section ?]

In the past fifteen years, the research community has released many tools to detect or analyze malicious behaviors in applications.
Two main approaches can be distinguished: static and dynamic analysis@Li2017.
Dynamic analysis requires running the application in a controlled environment to observe runtime values and/or interactions with the operating system.
For example, an Android emulator with a patched kernel can capture these interactions, but applying the required modifications is not a trivial task.
Such an approach is limited by the time required to execute even a limited part of the application, with no guarantee on the code coverage obtained.
For malware, dynamic analysis is also limited by evasion techniques that may prevent the execution of malicious parts of the code.
//As a consequence, a lot of efforts have been put in static approaches, which is the focus of this paper.

=== Static Analysis <sec:bg-static>

Static analysis programs examine an #APK file without executing it to extract information from it.
Basic static analysis can include extracting information from the `AndroidManifest.xml` file or decompiling bytecode to Java code.

More advanced analyses consist in computing the control-flow of an application and its data-flow@Li2017.

The most basic form of control-flow analysis is to build a call graph.
A call graph is a graph where the nodes represent the methods in the application, and the edges represent calls from one method to another.
@fig:bg-fizzbuzz-cg-cfg b) shows the call graph of the code in @fig:bg-fizzbuzz-cg-cfg a).
A more advanced control-flow analysis consists in building the control-flow graph.
This time, instead of methods, the nodes represent instructions, and the edges indicate which instruction can follow which instruction.
@fig:bg-fizzbuzz-cg-cfg c) represents the control-flow graph of @fig:bg-fizzbuzz-cg-cfg a), with code statements instead of bytecode instructions.

#figure({
  set align(center)
  stack(dir: ttb,[
    #figure(
      ```java
      public static void fizzBuzz(int n) {
          for (int i = 1; i <= n; i++) {
              if (i % 3 == 0 && i % 5 == 0) {
                  Buzzer.fizzBuzz();
              } else if (i % 3 == 0) {
                  Buzzer.fizz();
              } else if (i % 5 == 0) {
                  Buzzer.buzz();
              } else {
                  Log.e("fizzbuzz", String.valueOf(i));
              }
          }
      }
      ```,
      supplement: none,
      kind: "bg-fizzbuzz-cg-cfg subfig",
      caption: [a) A Java program],
    ) <fig:bg-fizzbuzz-java>], v(2em), stack(dir: ltr, [
    #figure(
      raw-render(```
        digraph {
          rankdir=LR
          "fizzBuzz(int)" -> "Buzzer.fizzBuzz()"
          "fizzBuzz(int)" -> "Buzzer.fizz()"
          "fizzBuzz(int)" -> "Buzzer.buzz()"
          "fizzBuzz(int)" -> "String.valueOf(int)"
          "fizzBuzz(int)" -> "Log.e(String, String)"
        }
        ```,
        width: 40%
      ),
      supplement: none,
      kind: "bg-fizzbuzz-cg-cfg subfig",
      caption: [b) Corresponding Call Graph]
    ) <fig:bg-fizzbuzz-cg>],[
    #figure(
      raw-render(```
        digraph {
          l1
          l2
          l3
          l4
          l5
          l6
          l7
          l9

          l1 -> l2
          l2 -> l3
          l3 -> l1
          l2 -> l4
          l4 -> l5
          l5 -> l1
          l4 -> l6
          l6 -> l7
          l7 -> l1
          l6 -> l9
          l9 -> l1
        }
        ```,
        labels: (
          "l1": `for (int i = 1; i <= n; i++) {`,
          "l2": `if (i % 3 == 0 && i % 5 == 0) {`,
          "l3": `Buzzer.fizzBuzz();`,
          "l4": `} else if (i % 3 == 0) {`,
          "l5": `Buzzer.fizz();`,
          "l6": `} else if (i % 5 == 0) {`,
          "l7": `Buzzer.buzz();`,
          "l9": `Log.e("fizzbuzz", String.valueOf(i));`,
        ),
        width: 50%
      ),
      supplement: none,
      kind: "bg-fizzbuzz-cg-cfg subfig",
      caption: [c) Corresponding Control-Flow Graph]
    ) <fig:bg-fizzbuzz-cfg>]))
  h(1em)},
  supplement: [Figure],
  caption: [Source code for a simple Java method and its Call and Control Flow Graphs],
)<fig:bg-fizzbuzz-cg-cfg>
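
As a minimal sketch, the two graphs of @fig:bg-fizzbuzz-cg-cfg can be encoded as adjacency lists and explored with a simple traversal.
The Python encoding below is only an illustration: the names `CALL_GRAPH`, `CFG` and `reachable` are hypothetical and do not correspond to the API of any analysis tool.

```python
# Call graph of fizzBuzz(int): one node per method, one edge per call.
CALL_GRAPH = {
    "fizzBuzz(int)": [
        "Buzzer.fizzBuzz()",
        "Buzzer.fizz()",
        "Buzzer.buzz()",
        "String.valueOf(int)",
        "Log.e(String, String)",
    ],
}

# Control-flow graph of the same method: one node per statement,
# reusing the node names of the figure (l1 is the loop header).
CFG = {
    "l1": ["l2"],
    "l2": ["l3", "l4"],
    "l3": ["l1"],
    "l4": ["l5", "l6"],
    "l5": ["l1"],
    "l6": ["l7", "l9"],
    "l7": ["l1"],
    "l9": ["l1"],
}


def reachable(graph, start):
    """Return every node reachable from `start` (hypothetical helper)."""
    seen, todo = set(), [start]
    while todo:
        node = todo.pop()
        if node not in seen:
            seen.add(node)
            todo.extend(graph.get(node, []))
    return seen


# Methods that may be executed when fizzBuzz(int) is called.
print(sorted(reachable(CALL_GRAPH, "fizzBuzz(int)")))
# Statements that may be executed once the loop header is reached.
print(sorted(reachable(CFG, "l1")))
```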

Once the control-flow graph is computed, it can be used to compute data-flows.
Data-flow analysis, also called taint-tracking, follows the flow of information in the application.
By defining a list of methods and fields that can generate critical information (taint sources) and a list of methods that can consume information (taint sinks), taint-tracking can detect potential data leaks (when a data flow links a taint source to a taint sink).
For example, `TelephonyManager.getImei()` returns a unique, persistent device identifier.
This identifier can be used to identify the user and cannot be changed if compromised.
This makes `TelephonyManager.getImei()` a good candidate for a taint source.
On the other hand, `UrlRequest.start()` sends a request to an external server, making it a taint sink.
If a data-flow is found linking `TelephonyManager.getImei()` to `UrlRequest.start()`, the application is potentially leaking critical information to an external entity, a behavior that is probably not wanted by the user.
Data-flow analysis is the subject of many contributions@weiAmandroidPreciseGeneral2014 @titzeAppareciumRevealingData2015 @bosuCollusiveDataLeak2017 @klieberAndroidTaintFlow2014 @DBLPconfndssGordonKPGNR15 @octeauCompositeConstantPropagation2015 @liIccTADetectingInterComponent2015, the most notable being Flowdroid@Arzt2014a.
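
The source-to-sink reasoning described above can be sketched as a reachability search over a data-flow graph.
The Python snippet below is a toy model under stated assumptions: the edges in `DATA_FLOWS` (including the intermediate values `imei`, `url` and `label`) are invented for illustration, and the search is a plain breadth-first traversal, not the algorithm implemented by Flowdroid or any of the tools cited.

```python
from collections import deque

# Hypothetical data-flow edges: "information in A may flow into B".
DATA_FLOWS = {
    "TelephonyManager.getImei()": ["imei"],
    "imei": ["url"],
    "url": ["UrlRequest.start()"],
    "Build.MODEL": ["label"],
    "label": ["TextView.setText()"],
}
SOURCES = {"TelephonyManager.getImei()"}
SINKS = {"UrlRequest.start()"}


def find_leaks(flows, sources, sinks):
    """Return the (source, sink) pairs linked by a chain of data-flow edges."""
    leaks = []
    for source in sorted(sources):
        seen, queue = {source}, deque([source])
        while queue:
            node = queue.popleft()
            if node in sinks:
                leaks.append((source, node))
            for succ in flows.get(node, []):
                if succ not in seen:
                    seen.add(succ)
                    queue.append(succ)
    return leaks


# Reports the IMEI reaching the network request.
print(find_leaks(DATA_FLOWS, SOURCES, SINKS))
```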

#todo[Describe the different contributions in relations to the issues they tackle]

Static analysis is powerful as it can detect unwanted behavior in an application even if the behavior does not manifest itself when running the application.
However, static analysis tools must overcome many challenges when analysing Android applications:
/ the Java object-oriented paradigm: A call to a method can in fact correspond to a call to any method overriding the original method in subclasses
/ the multiplicity of entry points: Each component of an application can be an entry point for the application
/ the event driven architecture: Methods of the application can be called in many different orders depending on external events
/ the interleaving of native code and bytecode: Native code can be called from bytecode and vice versa, but tools often only handle one of those formats
/ the potential dynamic code loading: An application can run code that was not originally in the application
/ the use of reflection: Methods can be called from their name as a string object, which is not necessarily known statically
/ the continual evolution of Android: Each new version brings new features that analysis tools must be aware of
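
To illustrate the first challenge of the list, the following sketch over-approximates the targets of a virtual call in the spirit of class hierarchy analysis.
It assumes a hypothetical hierarchy in which `buzz` is an instance method of `Buzzer`; the classes `FizzBuzzer`, `LoudBuzzer` and `VeryLoudBuzzer` are invented for the example.

```python
# Hypothetical class hierarchy: child class -> parent class.
PARENT = {
    "FizzBuzzer": "Buzzer",
    "LoudBuzzer": "Buzzer",
    "VeryLoudBuzzer": "LoudBuzzer",
}
# Methods declared (or overridden) by each class.
DECLARES = {
    "Buzzer": {"buzz"},
    "FizzBuzzer": {"buzz"},
    "LoudBuzzer": set(),
    "VeryLoudBuzzer": {"buzz"},
}


def subclasses(cls):
    """Return `cls` and all its direct and indirect subclasses."""
    found = {cls}
    changed = True
    while changed:
        changed = False
        for child, parent in PARENT.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def resolve(cls, method):
    """Find the implementation used by `cls`, walking up the hierarchy."""
    while cls is not None:
        if method in DECLARES[cls]:
            return f"{cls}.{method}()"
        cls = PARENT.get(cls)
    return None


def possible_targets(static_type, method):
    """Over-approximate the implementations a virtual call may reach."""
    targets = {resolve(cls, method) for cls in subclasses(static_type)}
    targets.discard(None)
    return sorted(targets)


# A call to buzz() on a variable of static type Buzzer has three candidates.
print(possible_targets("Buzzer", "buzz"))
```

Each candidate becomes an edge in the call graph, which keeps the analysis sound for this hierarchy at the price of edges that may never be taken at runtime.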

The tools can share the backend used to interact with the bytecode.
For example, Apktool is often called in a subprocess to extract the bytecode, and the Soot framework is commonly used both to analyse bytecode and to modify it.
The most notable user of Soot is Flowdroid. #todo[formulation]

=== Dynamic Analysis <sec:bg-dynamic>

The alternative to static analysis is dynamic analysis.
With dynamic analysis, the application is actually executed.
The simplest strategies consist in just running the application and examining its behavior.
For instance, Shao #etal #todo[cit] capture the network communications of an application and analyse those traces, while Bhatia #etal #todo[cit] take #jm-note[periodic][meh] snapshots of the memory to deduce the behavior of the application #todo[check the papers].

More advanced methods are more intrusive and require modifying either the #APK, the Android framework, runtime, or kernel.
TaintDroid #todo[cit], for example, modifies the Dalvik Virtual Machine (the predecessor of the #ART) to track the data flow of an application at runtime, while AndroBlare #todo[cit] tries to compute the taint flow by hooking system calls from a kernel module. #todo[check papers]
#todo[RealDroid?]
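
The idea of tracking data flows at runtime can be illustrated with values that carry taint labels while the program executes.
The sketch below is a toy model only: `Tagged`, `get_imei`, `concat` and `send` are hypothetical names, the identifier is a dummy value, and the code does not reflect how TaintDroid or AndroBlare are actually implemented.

```python
class Tagged:
    """A runtime value paired with the taint labels it carries."""

    def __init__(self, value, labels=()):
        self.value = value
        self.labels = frozenset(labels)


def get_imei():
    """Instrumented source (hypothetical): the result is labelled IMEI."""
    return Tagged("000000000000000", {"IMEI"})


def concat(left, right):
    """Instrumented operation: the result inherits the labels of both operands."""
    return Tagged(left.value + right.value, left.labels | right.labels)


LEAKS = []


def send(url):
    """Instrumented sink (hypothetical): record any label reaching the network."""
    if url.labels:
        LEAKS.append(sorted(url.labels))


# The label attached by the source survives the concatenation
# and is observed when the value reaches the sink.
send(concat(Tagged("https://example.com/?id="), get_imei()))
print(LEAKS)
```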

Modifying the Android framework, runtime or kernel is possible thanks to the Android project being open source; however, this is a delicate operation.
Thus, a common issue faced by tools that take this approach is that they are stuck with a specific version of Android.
DroidScope@droidscope180237 and CopperDroid@Tam2015 are two well-known sandboxes faced with this issue. #todo[check, and add android version]
To limit this problem, other sandboxes focus on hooking strategies, like DroidHook and Mirage #todo[cit, check paper], based on the Xposed framework, and CamoDroid #todo[cit and check], based on Frida.
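
The principle behind such hooking strategies can be sketched as follows: a method is replaced by a wrapper that records the call before delegating to the original implementation.
The Python snippet below only mimics this principle; it does not use the Xposed or Frida APIs, and the `UrlRequest` class is a stand-in written for the example, not the Android class of the same name.

```python
import functools


class UrlRequest:
    """Stand-in for an application class (hypothetical)."""

    def __init__(self, url):
        self.url = url

    def start(self):
        return f"GET {self.url}"


CALL_LOG = []


def hook(cls, method_name):
    """Replace `cls.method_name` by a wrapper that logs each call."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        # Record the call, then run the original method unchanged.
        CALL_LOG.append((f"{cls.__name__}.{method_name}", args))
        return original(self, *args, **kwargs)

    setattr(cls, method_name, wrapper)
    return original


hook(UrlRequest, "start")
response = UrlRequest("https://example.com").start()
print(CALL_LOG)
```

The application code is left untouched and still receives the original return value, which is what makes this approach less intrusive than patching the platform itself.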

Another known challenge when analysing an application dynamically is code coverage: if some part of the application is not executed, it cannot be analysed.
Considering that Android applications are meant to interact with a user, this can become problematic for automatic analysis.
#todo[runner considered]
GroddDroid uses static analysis to find suspicious code sections and then uses this information to guide a runner that uses the #todo[whatisnameagain?] framework to trigger those suspicious sections of code.
More challenging, some applications will try to detect if they are in a sandbox environment (#eg if they are in an emulator, or if Frida is present in memory) and will refuse to run some sections of code if this is the case.
#todo[name] #etal @ruggia_unmasking_2024 compile a list of evasion techniques.
They show that most current analysis frameworks fail to hide themselves correctly and introduce a new sandbox, DroidDungeon, that does avoid detection. #todo[limitation?]
#todo[force execution?]
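
The interaction between code coverage and sandbox detection can be made concrete with a toy application whose payload only runs when no sandbox is detected.
Everything in the sketch below is hypothetical (the `traced` decorator, the three functions and the `looks_like_sandbox` flag); it only shows why the code observed by a dynamic analysis depends on the environment.

```python
import functools

EXECUTED = set()


def traced(func):
    """Record the name of each function that actually runs."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        EXECUTED.add(func.__name__)
        return func(*args, **kwargs)
    return wrapper


@traced
def show_ui():
    return "ui"


@traced
def payload():
    return "payload"


@traced
def main(looks_like_sandbox):
    show_ui()
    # Evasion check: the payload is skipped when a sandbox is suspected.
    if not looks_like_sandbox:
        payload()


ALL_METHODS = {"main", "show_ui", "payload"}


def coverage(looks_like_sandbox):
    """Run the toy application and return the fraction of methods executed."""
    EXECUTED.clear()
    main(looks_like_sandbox)
    return len(EXECUTED & ALL_METHODS) / len(ALL_METHODS)


print(coverage(looks_like_sandbox=False))
print(coverage(looks_like_sandbox=True))
```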

// Shao et al. Yuru Shao, Jason Ott, Yunhan Jack Jia, Zhiyun Qian, and Z Morley Mao. ‘The Misuse of Android Unix Domain Sockets and Security Implications’. In: ACM SIGSAC Conference on Computer and Communications Security. Vienna, Austria: ACM, Oct. 2016, pp. 80–91.
// Bhatia et al. Rohit Bhatia, Brendan Saltaformaggio, Seung Jei Yang, Aisha Ali-Gombe, Xiangyu Zhang, Dongyan Xu, and Golden G Richard III. ‘"Tipped Off by Your Memory Allocator": Device-Wide User Activity Sequencing from Android Memory Images’. In: (Feb. 2018).

- #todo[evasion: DroidDungeon @ruggia_unmasking_2024]
- #todo[Xposed: DroidHook / Mirage: Toward a stealthier and modular malware analysis sandbox for android]
- #todo[Frida: CamoDroid]
- #todo[
  modified android framework, runtime or kernel:
  - RealDroid
  - AndroBlare, taint analysis, linux module to hook syscalls, in-house
    Radoniaina Andriatsimandefitra and Valérie Viet Triem Tong. ‘Detection and identification of Android malware based on information flow monitoring’. In: 2nd International Conference on Cyber Security and Cloud Computing. New York, USA: IEEE, Jan. 2015, pp. 200–203.
    Radoniaina Andriatsimandefitra, Stéphane Geller, and Valérie Viet Triem Tong. ‘Designing information flow policies for Android’s operating system’. In: IEEE International conference on communications. Ottawa, ON, Canada: IEEE, June 2012, pp. 976–981.
  - TaintDroid (check if dynamic? strange, cf Reaves et al) modifies the Dalvik Virtual Machine (DVM) interpreter to manage taint
]

=== Hybrid Analysis <sec:bg-hybrid>
#todo[merge with other section?]

- #todo[DyDroid, audit of Dynamic Code Loading@qu_dydroid_2017]