
#import "../lib.typ": APK, etal, ART, SDK, DEX, eg,
#import "../lib.typ": todo, jm-note, jfl-note
#import "@preview/diagraph:0.3.5": raw-render

=== Static Analysis <sec:bg-static>

Static analysis programs examine an #APK file without executing it in order to extract information from it.
Basic static analysis can include extracting information from the `AndroidManifest.xml` file, disassembling the bytecode with tools like Apktool, or decompiling it to Java code with tools like Jadx.
Unfortunately, simply reading the resulting code does not scale.
It requires a human analyst, which makes it impractical to analyse a large number of applications, and even for a single application, the size and complexity of the code can quickly overwhelm the reverse engineer.

Control flow analysis is often used to mitigate this issue.
The idea is to extract the behaviour, the flow, of the application from the bytecode, and to represent it as a graph.
A graph representation is easier to work with than a list of instructions, and can be used for further analysis.
Depending on the level of precision required, different types of graphs can be computed.
The most basic of those graphs is the call graph.
A call graph is a graph where the nodes represent the methods in the application, and the edges represent calls from one method to another.
@fig:bg-fizzbuzz-cg-cfg b) shows the call graph of the code in @fig:bg-fizzbuzz-cg-cfg a).
A more advanced control-flow analysis consists in building the control-flow graph.
This time, instead of methods, the nodes represent instructions, and an edge from one instruction to another indicates that the second can be executed immediately after the first.
@fig:bg-fizzbuzz-cg-cfg c) represents the control-flow graph of @fig:bg-fizzbuzz-cg-cfg a), with code statements instead of bytecode instructions.

#todo[Add alt text for @fig:bg-fizzbuzz-cg and @fig:bg-fizzbuzz-cfg]

#figure({
  set align(center)
  stack(dir: ttb,[
  #figure(
    ```java
    public static void fizzBuzz(int n) {
      for (int i = 1; i <= n; i++) {
        if (i % 3 == 0 && i % 5 == 0) {
          Buzzer.fizzBuzz();
        } else if (i % 3 == 0) {
          Buzzer.fizz();
        } else if (i % 5 == 0) {
          Buzzer.buzz();
        } else {
          Log.e("fizzbuzz", String.valueOf(i));
        }
      }
    }
    ```,
    supplement: none,
    kind: "bg-fizzbuzz-cg-cfg subfig",
    caption: [a) A Java program],
  ) <fig:bg-fizzbuzz-java>], v(2em), stack(dir: ltr, [
  #figure(
    raw-render(```
      digraph {
        rankdir=LR
        "fizzBuzz(int)" -> "Buzzer.fizzBuzz()"
        "fizzBuzz(int)" -> "Buzzer.fizz()"
        "fizzBuzz(int)" -> "Buzzer.buzz()"
        "fizzBuzz(int)" -> "String.valueOf(int)"
        "fizzBuzz(int)" -> "Log.e(String, String)"
      }
      ```,
      width: 40%,
      alt: "",
    ),
    supplement: none,
    kind: "bg-fizzbuzz-cg-cfg subfig",
    caption: [b) Corresponding Call Graph]
  ) <fig:bg-fizzbuzz-cg>],[
  #figure(
    raw-render(```
      digraph {
        l1
        l2
        l3
        l4
        l5
        l6
        l7
        l9

        l1 -> l2
        l2 -> l3
        l3 -> l1
        l2 -> l4
        l4 -> l5
        l5 -> l1
        l4 -> l6
        l6 -> l7
        l7 -> l1
        l6 -> l9
        l9 -> l1
      }
      ```,
      labels: (
        "l1": `for (int i = 1; i <= n; i++) {`,
        "l2": `if (i % 3 == 0 && i % 5 == 0) {`,
        "l3": `Buzzer.fizzBuzz();`,
        "l4": `} else if (i % 3 == 0) {`,
        "l5": `Buzzer.fizz();`,
        "l6": `} else if (i % 5 == 0) {`,
        "l7": `Buzzer.buzz();`,
        "l9": `Log.e("fizzbuzz", String.valueOf(i));`,
      ),
      width: 50%,
      alt: "",
    ),
    supplement: none,
    kind: "bg-fizzbuzz-cg-cfg subfig",
    caption: [c) Corresponding Control-Flow Graph]
  ) <fig:bg-fizzbuzz-cfg>]))
  h(1em)},
  supplement: [Figure],
  caption: [Source code for a simple Java method and its call graph and control-flow graph],
)<fig:bg-fizzbuzz-cg-cfg>
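
As a minimal sketch of why a graph is convenient to work with, the two graphs of @fig:bg-fizzbuzz-cg-cfg can be stored as successor maps and explored with a generic worklist traversal.
The class `FlowGraphSketch` and its methods are hypothetical names chosen for this illustration, and the edges are copied by hand from the figure rather than extracted from bytecode.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class for illustration, not part of any analysis framework.
class FlowGraphSketch {
    // Call graph of the figure: method -> methods it calls.
    static Map<String, List<String>> fizzBuzzCallGraph() {
        Map<String, List<String>> cg = new LinkedHashMap<>();
        cg.put("fizzBuzz(int)", List.of(
            "Buzzer.fizzBuzz()", "Buzzer.fizz()", "Buzzer.buzz()",
            "String.valueOf(int)", "Log.e(String, String)"));
        return cg;
    }

    // Control-flow graph of the figure: statement -> statements that may follow it.
    static Map<String, List<String>> fizzBuzzCfg() {
        Map<String, List<String>> cfg = new LinkedHashMap<>();
        cfg.put("l1", List.of("l2"));        // loop header to first test
        cfg.put("l2", List.of("l3", "l4"));  // test succeeds or fails
        cfg.put("l3", List.of("l1"));        // back edge to the loop header
        cfg.put("l4", List.of("l5", "l6"));
        cfg.put("l5", List.of("l1"));
        cfg.put("l6", List.of("l7", "l9"));
        cfg.put("l7", List.of("l1"));
        cfg.put("l9", List.of("l1"));
        return cfg;
    }

    // Worklist traversal: collects every node reachable from `start`.
    static Set<String> reachable(Map<String, List<String>> graph, String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> worklist = new ArrayDeque<>();
        worklist.add(start);
        while (!worklist.isEmpty()) {
            String node = worklist.poll();
            if (!seen.add(node)) {
                continue; // already visited, which also guards against loops
            }
            for (String successor : graph.getOrDefault(node, List.of())) {
                worklist.add(successor);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(reachable(fizzBuzzCallGraph(), "fizzBuzz(int)"));
        System.out.println(reachable(fizzBuzzCfg(), "l1"));
    }
}
```

The same traversal works on both graphs, which is the point of the representation: only the meaning of the nodes and edges changes.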

Once the control-flow graph is computed, it can be used to compute data flows.
Data-flow analysis, also called taint-tracking, makes it possible to follow the flow of information in the application.
By defining a list of methods and fields that can generate critical information (taint sources) and a list of methods that can consume information (taint sinks), taint-tracking can detect potential data leaks (when a data flow links a taint source to a taint sink).
For example, `TelephonyManager.getImei()` returns a unique, persistent device identifier.
It can be used to identify the user, and it cannot be changed if compromised.
This makes `TelephonyManager.getImei()` a good candidate for a taint source.
On the other hand, `UrlRequest.start()` sends a request to an external server, making it a taint sink.
If a data flow is found linking `TelephonyManager.getImei()` to `UrlRequest.start()`, the application is potentially leaking critical information to an external entity, a behaviour that is probably not wanted by the user.
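
The source-to-sink idea described above can be sketched on a toy example.
The following code is a simplified illustration under strong assumptions: it works on a hypothetical straight-line intermediate representation (the `Call` class) instead of bytecode and a full control-flow graph, the intermediate method names `String.concat` and `buildRequest` are placeholders, and the receiver of `UrlRequest.start()` is modelled as an ordinary argument.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model: real tools analyse bytecode over a control-flow graph.
class TaintSketch {
    // One statement of a toy IR: `target = method(args...)`; target is null if unused.
    static final class Call {
        final String target;
        final String method;
        final List<String> args;

        Call(String target, String method, String... args) {
            this.target = target;
            this.method = method;
            this.args = List.of(args);
        }
    }

    static final Set<String> SOURCES = Set.of("TelephonyManager.getImei");
    static final Set<String> SINKS = Set.of("UrlRequest.start");

    // Walks the statements in order and returns the sinks that receive tainted data.
    static List<String> findLeaks(List<Call> body) {
        Set<String> tainted = new HashSet<>();
        List<String> leaks = new ArrayList<>();
        for (Call call : body) {
            boolean usesTaintedValue = false;
            for (String arg : call.args) {
                if (tainted.contains(arg)) {
                    usesTaintedValue = true;
                }
            }
            if (SINKS.contains(call.method) && usesTaintedValue) {
                leaks.add(call.method);
            }
            if (call.target != null) {
                if (SOURCES.contains(call.method) || usesTaintedValue) {
                    tainted.add(call.target);    // taint is created or propagated
                } else {
                    tainted.remove(call.target); // variable overwritten with clean data
                }
            }
        }
        return leaks;
    }

    // The scenario from the text: the IMEI ends up in an outgoing request.
    static List<Call> leakyExample() {
        return List.of(
            new Call("imei", "TelephonyManager.getImei"),
            new Call("url", "String.concat", "base", "imei"),
            new Call("request", "buildRequest", "url"),
            new Call(null, "UrlRequest.start", "request"));
    }

    public static void main(String[] args) {
        System.out.println(findLeaks(leakyExample()));
    }
}
```

The propagation rule used here (a result is tainted as soon as one argument is tainted) is deliberately conservative; it can only over-approximate the flows of the toy program.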


Static analysis is powerful because it can detect unwanted behaviour in an application even if that behaviour does not manifest itself when the application is run.
However, static analysis tools must overcome many challenges when analysing Android applications:
/ the Java object-oriented paradigm: A call to a method can in fact correspond to a call to any method overriding the original method in subclasses.
/ the multiplicity of entry points: Each component of an application can be an entry point for the application.
/ the event-driven architecture: Methods of the application can be called when events occur, in an unknown order.
/ the interleaving of native code and bytecode: Native code can be called from bytecode and vice versa, but tools often handle only one of those formats.
/ the potential dynamic code loading: An application can run code that was not originally in the application.
/ the use of reflection: Methods can be called from their name as a string object, which is difficult to identify statically.
/ the continual evolution of Android: Each new version of Android brings new features that analysis tools must be aware of.
  For instance, the multi-dex feature presented in @sec:bg-android-code-format was introduced in Android #SDK 21.
  Tools unaware of this feature only analyse the `classes.dex` file and will ignore all other `classes<n>.dex` files.
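
Two of the challenges listed above, method overriding and reflection, can be illustrated with a few lines of plain Java.
The classes `Base` and `Child` and the helper methods below are hypothetical and only serve as a sketch; they use the standard `java.lang.reflect` API and nothing specific to Android.

```java
import java.lang.reflect.Method;

// Hypothetical classes for illustration only.
class DispatchSketch {
    public static class Base {
        public String run() { return "Base.run"; }
    }

    public static class Child extends Base {
        @Override
        public String run() { return "Child.run"; }
    }

    // Virtual dispatch: the call site only names Base.run(),
    // but the method actually executed depends on the runtime type of `receiver`.
    static String virtualCall(Base receiver) {
        return receiver.run();
    }

    // Reflection: the method name only exists as a string assembled at run time,
    // so the call site itself only shows a call to Method.invoke().
    static String reflectiveCall(String namePrefix, String nameSuffix) {
        try {
            Method method = Base.class.getMethod(namePrefix + nameSuffix);
            return (String) method.invoke(new Child());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflective call failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(virtualCall(new Child()));
        System.out.println(reflectiveCall("r", "un"));
    }
}
```

In both cases the target `Child.run()` is not named at the call site: a call graph built only from the method references in the bytecode would contain an edge to `Base.run()` in the first case and to `Method.invoke()` in the second.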

#todo[It would be good to emphasise dynamic code loading and reflection]