rerefactor bg

2025-09-24 00:44:19 +02:00 · 2025-09-24 00:44:19 +02:00 · d1dba30426
commit d1dba30426
parent d9650d0775
11 changed files with 159 additions and 98 deletions
--- a/2_background/2_3_static_analysis.typ
+++ b/2_background/2_3_static_analysis.typ
@@ -0,0 +1,135 @@
+#import "../lib.typ": APK, etal, ART, SDK, DEX, eg, 
+#import "../lib.typ": todo, jm-note, jfl-note
+#import "@preview/diagraph:0.3.5": raw-render
+
+=== Static Analysis <sec:bg-static>
+
+Static analysis tools examine an #APK file without executing it in order to extract information from it.
+Basic static analysis can include extracting information from the `AndroidManifest.xml` file, disassembling the bytecode to Smali with Apktool, or decompiling it to Java code with Jadx.
+Unfortunately, manually reading the bytecode does not scale.
+It requires a human analyst, which makes it impractical to analyse a large number of applications, and even for a single application, the size and complexity of the code can quickly overwhelm the reverse engineer.
+
+Control-flow analysis is often used to mitigate this issue.
+The idea is to extract the behaviour of the application, that is, its possible flows of execution, from the bytecode and to represent it as a graph.
+A graph representation is easier to work with than a list of instructions, and can be used for further analysis.
+Depending on the level of precision required, different types of graphs can be computed.
+The most basic of these graphs is the call graph.
+A call graph is a graph where the nodes represent the methods of the application, and the edges represent calls from one method to another.
+@fig:bg-fizzbuzz-cg-cfg b) shows the call graph of the code in @fig:bg-fizzbuzz-cg-cfg a).
+A more advanced control-flow analysis consists of building the control-flow graph.
+This time, instead of methods, the nodes represent instructions, and the edges indicate which instruction can follow which instruction.
+@fig:bg-fizzbuzz-cg-cfg c) represents the control-flow graph of @fig:bg-fizzbuzz-cg-cfg a), with source-code statements instead of bytecode instructions.
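
To make the call-graph definition concrete, here is a minimal Java sketch. It is not taken from any particular analysis tool. It stores the call graph of the fizzBuzz example as an adjacency map keyed by method signatures. A breadth-first traversal then computes every method reachable from an entry point. The extra edge from `Buzzer.fizz()` to `Buzzer.ring()` is hypothetical and only there to show transitive reachability. A real tool would extract the edges from the bytecode and would also have to resolve virtual calls, which this sketch ignores.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CallGraph {
    // Edges of the call graph: caller signature -> callee signatures.
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    public void addCall(String caller, String callee) {
        edges.computeIfAbsent(caller, k -> new ArrayList<>()).add(callee);
    }

    // Breadth-first traversal: every method transitively callable from entry,
    // including entry itself, in discovery order.
    public Set<String> reachableFrom(String entry) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>();
        seen.add(entry);
        work.add(entry);
        while (!work.isEmpty()) {
            String method = work.poll();
            for (String callee : edges.getOrDefault(method, List.of())) {
                if (seen.add(callee)) { // first time this callee is reached
                    work.add(callee);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        CallGraph cg = new CallGraph();
        // Edges from the fizzBuzz example.
        cg.addCall("fizzBuzz(int)", "Buzzer.fizzBuzz()");
        cg.addCall("fizzBuzz(int)", "Buzzer.fizz()");
        cg.addCall("fizzBuzz(int)", "Buzzer.buzz()");
        cg.addCall("fizzBuzz(int)", "String.valueOf(int)");
        cg.addCall("fizzBuzz(int)", "Log.e(String, String)");
        // Hypothetical extra edge, not in the figure, to show transitive reachability.
        cg.addCall("Buzzer.fizz()", "Buzzer.ring()");
        // prints [fizzBuzz(int), Buzzer.fizzBuzz(), Buzzer.fizz(), Buzzer.buzz(), String.valueOf(int), Log.e(String, String), Buzzer.ring()]
        System.out.println(cg.reachableFrom("fizzBuzz(int)"));
    }
}
```

Using a worklist with a visited set keeps the traversal linear in the size of the graph and makes it terminate on recursive call cycles.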
+
+#todo[Add alt text for @fig:bg-fizzbuzz-cg and @fig:bg-fizzbuzz-cfg]
+
+#figure({
+  set align(center)
+  stack(dir: ttb,[
+  #figure(
+    ```java
+    public static void fizzBuzz(int n) {
+      for (int i = 1; i <= n; i++) {
+        if (i % 3 == 0 && i % 5 == 0) {
+          Buzzer.fizzBuzz();
+        } else if (i % 3 == 0) {
+          Buzzer.fizz();
+        } else if (i % 5 == 0) {
+          Buzzer.buzz();
+        } else {
+          Log.e("fizzbuzz", String.valueOf(i));
+        }
+      }
+    }
+    ```,
+    supplement: none,
+    kind: "bg-fizzbuzz-cg-cfg subfig",
+    caption: [a) A Java program],
+  ) <fig:bg-fizzbuzz-java>], v(2em), stack(dir: ltr, [
+  #figure(
+    raw-render(```
+      digraph {
+        rankdir=LR
+        "fizzBuzz(int)" -> "Buzzer.fizzBuzz()"
+        "fizzBuzz(int)" -> "Buzzer.fizz()"
+        "fizzBuzz(int)" -> "Buzzer.buzz()"
+        "fizzBuzz(int)" -> "String.valueOf(int)"
+        "fizzBuzz(int)" -> "Log.e(String, String)"
+      }
+      ```,
+      width: 40%,
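+      alt: "Call graph: fizzBuzz(int) has an edge to each of Buzzer.fizzBuzz(), Buzzer.fizz(), Buzzer.buzz(), String.valueOf(int) and Log.e(String, String).",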
+    ),
+    supplement: none,
+    kind: "bg-fizzbuzz-cg-cfg subfig",
+    caption: [b) Corresponding Call Graph]
+  ) <fig:bg-fizzbuzz-cg>],[
+  #figure(
+    raw-render(```
+      digraph {
+        l1
+        l2
+        l3
+        l4
+        l5
+        l6
+        l7
+        l9
+  
+        l1 -> l2
+        l2 -> l3
+        l3 -> l1
+        l2 -> l4
+        l4 -> l5
+        l5 -> l1
+        l4 -> l6
+        l6 -> l7
+        l7 -> l1
+        l6 -> l9
+        l9 -> l1
+      }
+      ```,
+      labels: (
+        "l1": `for (int i = 1; i <= n; i++) {`,
+        "l2": `if (i % 3 == 0 && i % 5 == 0) {`,
+        "l3": `Buzzer.fizzBuzz();`,
+        "l4": `} else if (i % 3 == 0) {`,
+        "l5": `Buzzer.fizz();`,
+        "l6": `} else if (i % 5 == 0) {`,
+        "l7": `Buzzer.buzz();`,
+        "l9": `Log.e("fizzbuzz", String.valueOf(i));`,
+      ),
+      width: 50%,
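+      alt: "Control-flow graph of the loop: the for header (l1) leads to the first condition (l2); l2 leads to the Buzzer.fizzBuzz() call (l3) or to the second condition (l4); l4 leads to the Buzzer.fizz() call (l5) or to the third condition (l6); l6 leads to the Buzzer.buzz() call (l7) or to the Log.e call (l9); l3, l5, l7 and l9 all lead back to the for header.",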
+    ),
+    supplement: none,
+    kind: "bg-fizzbuzz-cg-cfg subfig",
+    caption: [c) Corresponding Control-Flow Graph]
+  ) <fig:bg-fizzbuzz-cfg>]))
+  h(1em)},
+  supplement: [Figure],
+  caption: [Source code of a simple Java method, with its Call Graph and Control-Flow Graph],
+)<fig:bg-fizzbuzz-cg-cfg>
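
A control-flow graph is a starting point for further analyses. The sketch below shows one such analysis, chosen here as an illustration rather than described in the text. It encodes the edges of the control-flow graph in part c) of the figure, keeping the same node names (there is no `l8`). It then runs a depth-first search from `l1` to collect back edges, that is, edges leading to a node still on the search stack. A back edge closes a cycle in the graph, which corresponds to a loop. Here all four back edges target `l1`, the `for` header. Like the figure, the encoding omits the edge taken when the loop exits.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CfgBackEdges {
    private enum Color { WHITE, GREY, BLACK } // unvisited, on the DFS stack, finished

    // Returns every edge "u->v" whose target v is still on the DFS stack when explored.
    static List<String> backEdges(Map<String, List<String>> cfg, String entry) {
        Map<String, Color> color = new HashMap<>();
        List<String> result = new ArrayList<>();
        dfs(cfg, entry, color, result);
        return result;
    }

    private static void dfs(Map<String, List<String>> cfg, String node,
                            Map<String, Color> color, List<String> result) {
        color.put(node, Color.GREY);
        for (String succ : cfg.getOrDefault(node, List.of())) {
            Color c = color.getOrDefault(succ, Color.WHITE);
            if (c == Color.GREY) {
                result.add(node + "->" + succ); // succ is an ancestor: the edge closes a cycle
            } else if (c == Color.WHITE) {
                dfs(cfg, succ, color, result);
            }
        }
        color.put(node, Color.BLACK);
    }

    public static void main(String[] args) {
        // Edges of the control-flow graph in the figure (loop exit omitted).
        Map<String, List<String>> cfg = new LinkedHashMap<>();
        cfg.put("l1", List.of("l2"));
        cfg.put("l2", List.of("l3", "l4"));
        cfg.put("l3", List.of("l1"));
        cfg.put("l4", List.of("l5", "l6"));
        cfg.put("l5", List.of("l1"));
        cfg.put("l6", List.of("l7", "l9"));
        cfg.put("l7", List.of("l1"));
        cfg.put("l9", List.of("l1"));
        System.out.println(backEdges(cfg, "l1")); // prints [l3->l1, l5->l1, l7->l1, l9->l1]
    }
}
```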
+
+Once the control-flow graph is computed, it can be used to compute data flows.
+Data-flow analysis, and in particular taint tracking, makes it possible to follow the flow of information through the application.
+By defining a list of methods and fields that can produce sensitive information (taint sources) and a list of methods that can consume information (taint sinks), taint tracking can detect potential data leaks, i.e. data flows linking a taint source to a taint sink.
+For example, `TelephonyManager.getImei()` returns a unique, persistent device identifier.
+It can be used to identify the user, and it cannot be changed if compromised.
+This makes `TelephonyManager.getImei()` a good candidate for a taint source.
+On the other hand, `UrlRequest.start()` sends a request to an external server, making it a taint sink.
+If a data flow is found linking `TelephonyManager.getImei()` to `UrlRequest.start()`, the application is potentially leaking sensitive information to an external entity, a behaviour the user probably does not want.
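
The source-to-sink check described above amounts to a reachability query on a data-flow graph. The following minimal sketch assumes that this graph is already available. Its nodes are hand-written, hypothetical names for values (`imei`, `url`) and method calls, and an edge means "flows into". The sketch searches breadth-first from each taint source and reports the first path that reaches a taint sink. Real taint-tracking tools must derive this graph from the control-flow graph and handle fields, aliasing and calls between methods, none of which is modelled here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TaintCheck {
    // Breadth-first search from each taint source along "flows into" edges.
    // Returns the first source-to-sink path found, or an empty list if none exists.
    static List<String> findLeak(Map<String, List<String>> flows,
                                 Set<String> sources, Set<String> sinks) {
        for (String source : sources) {
            Map<String, String> parent = new HashMap<>(); // also serves as the visited set
            Deque<String> work = new ArrayDeque<>();
            parent.put(source, null);
            work.add(source);
            while (!work.isEmpty()) {
                String node = work.poll();
                if (sinks.contains(node)) {
                    // Rebuild the path by walking parent links back to the source.
                    LinkedList<String> path = new LinkedList<>();
                    for (String p = node; p != null; p = parent.get(p)) {
                        path.addFirst(p);
                    }
                    return path;
                }
                for (String next : flows.getOrDefault(node, List.of())) {
                    if (!parent.containsKey(next)) {
                        parent.put(next, node);
                        work.add(next);
                    }
                }
            }
        }
        return List.of();
    }

    public static void main(String[] args) {
        // Hypothetical data flows: getImei() -> imei -> url -> UrlRequest.start()
        Map<String, List<String>> flows = new LinkedHashMap<>();
        flows.put("TelephonyManager.getImei()", List.of("imei"));
        flows.put("imei", List.of("url"));
        flows.put("url", List.of("UrlRequest.start()"));
        flows.put("counter", List.of("Log.e(String, String)")); // unrelated, untainted flow

        List<String> leak = findLeak(flows,
                Set.of("TelephonyManager.getImei()"), Set.of("UrlRequest.start()"));
        // prints potential leak: [TelephonyManager.getImei(), imei, url, UrlRequest.start()]
        System.out.println(leak.isEmpty() ? "no leak" : "potential leak: " + leak);
    }
}
```

Returning the whole path, rather than a boolean, mirrors what analysts usually need: the chain of statements that carries the sensitive value to the sink.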
+
+
+Static analysis is powerful because it can detect unwanted behaviour in an application even if that behaviour does not manifest itself when the application is run.
+However, static analysis tools must overcome many challenges when analysing Android applications:
+/ the Java object-oriented paradigm: A call to a method can in fact correspond to a call to any method that overrides it in a subclass.
+/ the multiplicity of entry points: Each component of an application can be an entry point for the application.
+/ the event-driven architecture: Methods of the application can be called when events occur, in an order that cannot be known statically.
+/ the interleaving of native code and bytecode: Native code can be called from bytecode and vice versa, but tools often handle only one of these formats.
+/ the potential dynamic code loading: An application can run code that was not originally in the application.
+/ the use of reflection: Methods can be called by their name, given as a string object, which makes the called method difficult to identify statically.
+/ the continual evolution of Android: Each new version of Android brings new features that analysis tools must be aware of.
+  For instance, the multi-dex feature presented in @sec:bg-android-code-format was introduced in Android #SDK 21.
+  Tools unaware of this feature only analyse the `classes.dex` file and ignore all other `classes<n>.dex` files.
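
The sketch below illustrates why reflection hinders static analysis. It calls a method whose name is assembled at run time; the class and method names are hypothetical. The bytecode only contains calls to `Class.forName`, `Class.getMethod` and `Method.invoke`, so a call graph built without modelling reflection has no edge to `Secret.leak()`. Recovering the target requires evaluating the strings. That is not possible in general, for instance when the strings are downloaded or decrypted at run time.

```java
import java.lang.reflect.Method;

public class ReflectionDemo {
    // Hypothetical class: its method is never named in a direct call.
    public static class Secret {
        public static String leak() {
            return "leaked";
        }
    }

    // Looks up and invokes a public static, no-argument method by name.
    static Object callByName(String className, String methodName) throws Exception {
        Class<?> cls = Class.forName(className);
        Method method = cls.getMethod(methodName);
        return method.invoke(null); // null receiver: the method is static
    }

    public static void main(String[] args) throws Exception {
        // Both names are computed at run time instead of appearing as a call site.
        String className = ReflectionDemo.class.getName() + "$" + "Secret";
        String methodName = new StringBuilder("kael").reverse().toString(); // "leak"
        System.out.println(callByName(className, methodName)); // prints leaked
    }
}
```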
+
+#todo[It would be good to highlight dynamic code loading and reflection]