integrate bg of rasta in bg section

2025-08-12 18:43:28 +02:00 · 2025-08-12 18:43:28 +02:00 · 5e512b585a
commit 5e512b585a
parent 94d26973d3
11 changed files with 170 additions and 107 deletions
--- a/2_background/3_analysis_techniques.typ
+++ b/2_background/3_analysis_techniques.typ
@@ -1,172 +0,0 @@
-#import "../lib.typ": todo, APK, etal, ART, SDK, eg, jm-note, jfl-note
-#import "@preview/diagraph:0.3.3": raw-render
-
-== Android Reverse Engineering Techniques <sec:bg-techniques>
-
-//#todo[swap with tool section ?]
-
-Over the past fifteen years, the research community has released many tools to detect or analyze malicious behaviors in applications.
-Two main approaches can be distinguished: static and dynamic analysis~@Li2017.
-Dynamic analysis requires running the application in a controlled environment to observe runtime values and/or interactions with the operating system.
-For example, an Android emulator with a patched kernel can capture these interactions, but applying the required modifications is not a trivial task.
-Such an approach is limited by the time required to execute even a limited part of the application, with no guarantee on the code coverage obtained.
-For malware, dynamic analysis is also limited by evasion techniques that may prevent the execution of the malicious parts of the code.
-//As a consequence, a lot of efforts have been put in static approaches, which is the focus of this paper.
-
-=== Static Analysis <sec:bg-static>
-
-Static analysis programs examine an #APK file without executing it to extract information from it.
-Basic static analysis can include extracting information from the `AndroidManifest.xml` file or decompiling bytecode to Java code.
-
-More advanced analyses consist in computing the control-flow of an application and its data-flow~@Li2017.
-
-The most basic form of control-flow analysis is to build a call graph.
-A call graph is a graph where the nodes represent the methods in the application, and the edges represent calls from one method to another.
-@fig:bg-fizzbuzz-cg-cfg b) shows the call graph of the code in @fig:bg-fizzbuzz-cg-cfg a).
-A more advanced control-flow analysis consists in building the control-flow graph.
-This time, instead of methods, the nodes represent instructions, and the edges indicate which instruction can follow which instruction.
-@fig:bg-fizzbuzz-cg-cfg c) represents the control-flow graph of @fig:bg-fizzbuzz-cg-cfg a), with code statements instead of bytecode instructions.
-
-#figure({
-  set align(center)
-  stack(dir: ttb,[
-  #figure(
-    ```java
-    public static void fizzBuzz(int n) {
-      for (int i = 1; i <= n; i++) {
-        if (i % 3 == 0 && i % 5 == 0) {
-          Buzzer.fizzBuzz();
-        } else if (i % 3 == 0) {
-          Buzzer.fizz();
-        } else if (i % 5 == 0) {
-          Buzzer.buzz();
-        } else {
-          Log.e("fizzbuzz", String.valueOf(i));
-        }
-      }
-    }
-    ```,
-    supplement: none,
-    kind: "bg-fizzbuzz-cg-cfg subfig",
-    caption: [a) A Java program],
-  ) <fig:bg-fizzbuzz-java>], v(2em), stack(dir: ltr, [
-  #figure(
-    raw-render(```
-      digraph {
-        rankdir=LR
-        "fizzBuzz(int)" -> "Buzzer.fizzBuzz()"
-        "fizzBuzz(int)" -> "Buzzer.fizz()"
-        "fizzBuzz(int)" -> "Buzzer.buzz()"
-        "fizzBuzz(int)" -> "String.valueOf(int)"
-        "fizzBuzz(int)" -> "Log.e(String, String)"
-      }
-      ```,
-      width: 40%
-    ),
-    supplement: none,
-    kind: "bg-fizzbuzz-cg-cfg subfig",
-    caption: [b) Corresponding Call Graph]
-  ) <fig:bg-fizzbuzz-cg>],[
-  #figure(
-    raw-render(```
-      digraph {
-        l1
-        l2
-        l3
-        l4
-        l5
-        l6
-        l7
-        l9
-  
-        l1 -> l2
-        l2 -> l3
-        l3 -> l1
-        l2 -> l4
-        l4 -> l5
-        l5 -> l1
-        l4 -> l6
-        l6 -> l7
-        l7 -> l1
-        l6 -> l9
-        l9 -> l1
-      }
-      ```,
-      labels: (
-        "l1": `for (int i = 1; i <= n; i++) {`,
-        "l2": `if (i % 3 == 0 && i % 5 == 0) {`,
-        "l3": `Buzzer.fizzBuzz();`,
-        "l4": `} else if (i % 3 == 0) {`,
-        "l5": `Buzzer.fizz();`,
-        "l6": `} else if (i % 5 == 0) {`,
-        "l7": `Buzzer.buzz();`,
-        "l9": `Log.e("fizzbuzz", String.valueOf(i));`,
-      ),
-      width: 50%
-    ),
-    supplement: none,
-    kind: "bg-fizzbuzz-cg-cfg subfig",
-    caption: [c) Corresponding Control-Flow Graph]
-  ) <fig:bg-fizzbuzz-cfg>]))
-  h(1em)},
-  supplement: [Figure],
-  caption: [Source code for a simple Java method and its Call and Control Flow Graphs],
-)<fig:bg-fizzbuzz-cg-cfg>
-
-Once the control-flow graph is computed, it can be used to compute data-flows.
-Data-flow analysis, also called taint-tracking, makes it possible to follow the flow of information in the application.
-By defining a list of methods and fields that can generate critical information (taint sources) and a list of methods that can consume information (taint sinks), taint-tracking can detect potential data leaks (if a data flow links a taint source to a taint sink).
-For example, `TelephonyManager.getImei()` returns a unique, persistent device identifier.
-This can be used to identify the user, and it cannot be changed if #jfl-note[compromised][replace by: this imei is dislaxd (illegible) \ jm: ???].
-This makes `TelephonyManager.getImei()` a good candidate as a taint source.
-On the other hand, `UrlRequest.start()` sends a request to an external server, making it a taint sink.
-If a data-flow is found linking `TelephonyManager.getImei()` to `UrlRequest.start()`, this means the application is potentially leaking critical information to an external entity, a behavior that the user probably does not want.
-Data-flow analysis is the subject of many contributions~@weiAmandroidPreciseGeneral2014 @titzeAppareciumRevealingData2015 @bosuCollusiveDataLeak2017 @klieberAndroidTaintFlow2014 @DBLPconfndssGordonKPGNR15 @octeauCompositeConstantPropagation2015 @liIccTADetectingInterComponent2015, the most notable tool being Flowdroid~@Arzt2014a.
-
-#todo[Describe the different contributions in relations to the issues they tackle]
-
-Static analysis is powerful as it can detect unwanted behavior in an application even if the behavior does not manifest itself when running the application.
-However, static analysis tools must overcome many challenges when analysing Android applications:
-/ the Java object-oriented paradigm: A call to a method can in fact correspond to a call to any method overriding the original method in subclasses.
-/ the multiplicity of entry points: Each component of an application can be an entry point for the application.
-/ the event-driven architecture: Methods in the application can be called when events occur, in an unknown order.
-/ the interleaving of native code and bytecode: Native code can be called from bytecode and vice versa, but tools often only handle one of those formats.
-/ the potential dynamic code loading: An application can run code that was not originally in the application.
-/ the use of reflection: Methods can be called from their name as a string object, which is difficult to identify statically.
-/ the continual evolution of Android: Each new version of Android brings new features that analysis tools must be aware of.
-  For instance, the multi-dex feature presented in @sec:bg-android-code-format was introduced in Android #SDK 21.
-  Tools unaware of this feature only analyse the `classes.dex` file and will ignore all other `classes<n>.dex` files.
-
-#jfl-note[The tools can share the backend used to interact with the bytecode.
-For example, Apktool is often called in a subprocess to extract the bytecode, and the Soot framework is commonly used both to analyse bytecode and to modify it.
-The most notable user of Soot is Flowdroid. #todo[formulation]][move this earlier]
-
-=== Dynamic Analysis <sec:bg-dynamic>
-
-The alternative to static analysis is dynamic analysis.
-With dynamic analysis, the application is actually executed.
-The simplest strategies consist in just running the application and examining its behavior.
-For instance, Bernardi #etal~@bernardi_dynamic_2019 use the logs generated by `strace` to list the system calls generated in response to an event to determine if an application is malicious.
-
-More advanced methods are more intrusive and require modifying either the #APK, the Android framework, runtime, or kernel.
-TaintDroid~@Enck2010, for example, modifies the Dalvik Virtual Machine (the predecessor of the #ART) to track the data flow of an application at runtime, while AndroBlare~@Andriatsimandefitra2012 @andriatsimandefitra_detection_2015 tries to compute the taint flow by hooking system calls using a Linux Security Module.
-DexHunter~@zhang2015dexhunter and AppSpear~@yang_appspear_2015 also patch the Dalvik Virtual Machine/#ART, this time to collect bytecode loaded dynamically.
-Modifying the Android framework, runtime or kernel is possible thanks to the Android project being open source; however, this is a delicate operation.
-Thus, a common issue faced by tools that take this approach is that they are stuck with a specific version of Android.
-Some sandboxes limit this issue by using dynamic binary instrumentation, like DroidHook~@cui_droidhook_2023, based on the Xposed framework, or CamoDroid~@faghihi_camodroid_2022, based on Frida.
-
-Another known challenge when analysing an application dynamically is code coverage: if some part of the application is not executed, it cannot be analysed.
-Considering that Android applications are meant to interact with a user, this can become problematic for automatic analysis.
-The Monkey tool developed by Google is one of the most used solutions~@sutter_dynamic_2024.
-It sends a random stream of events to the phone without tracking the state of the application.
-More advanced tools statically analyse the application to model it in order to improve the exploration.
-Sapienz~@mao_sapienz_2016 and Stoat~@su_guided_2017 use this technique to improve application testing.
-GroddDroid~@abraham_grodddroid_2015 has the same approach but statically detects suspicious sections of code, and interacts with the application to target those code sections.
-
-Unfortunately, exploring the application entirely is not always possible, as some applications will try to detect if they are in a sandbox environment (#eg if they are in an emulator, or if Frida is present in memory) and will refuse to run some sections of code if this is the case.
-Ruggia #etal~@ruggia_unmasking_2024 compile a list of evasion techniques.
-They propose a new sandbox, DroidDungeon, that, contrary to other sandboxes like DroidScope~@droidscope180237 or CopperDroid~@Tam2015, strongly emphasizes resilience against evasion mechanisms.
-
-#todo[RealDroid sandbox based on modified ART?]
-#todo[force execution?]
-#todo[DyDroid, audit of Dynamic Code Loading~@qu_dydroid_2017]