I declare this manuscript finished
All checks were successful
/ test_checkout (push) Successful in 1m48s

This commit is contained in:
Jean-Marie Mineau 2025-10-07 17:16:32 +02:00
parent 9f39ded209
commit 5c3a6955bd
Signed by: histausse
GPG key ID: B66AEEDA9B645AD2
14 changed files with 162 additions and 131 deletions

View file

@ -4,23 +4,6 @@
=== Static Analysis <sec:bg-static>
A static analysis program examines an #APK file without executing it to extract information from it.
Basic static analysis can include extracting information from the `AndroidManifest.xml` file or decompiling bytecode to Java code with tools like Apktool or Jadx.
Unfortunately, simply reading the bytecode does not scale.
To do so, a human analyst is needed, making it complicated to analyse a large number of applications, and even for single applications, the size and complexity of some applications can quickly overwhelm the reverse engineer.
Control flow analysis is often used to mitigate this issue.
The idea is to extract the behaviour, the flow, of the application from the bytecode, and to represent it as a graph.
A graph representation is easier to work with than a list of instructions and can be used for further analysis.
Depending on the level of precision required, different types of graphs can be computed.
The most basic of those graphs is the call graph.
A call graph is a graph where the nodes represent the methods in the application, and the edges represent calls from one method to another.
@fig:bg-fizzbuzz-cg-cfg b) show the call graph of the code in @fig:bg-fizzbuzz-cg-cfg a).
A more advanced control-flow analysis consists of building the control-flow graph.
This time, instead of methods, the nodes represent instructions, and the edges indicate which instruction can follow which instruction.
@fig:bg-fizzbuzz-cg-cfg c) represents the control-flow graph of @fig:bg-fizzbuzz-cg-cfg a), with code statements instead of bytecode instructions.
#figure({
set align(center)
stack(dir: ttb,[
@ -119,6 +102,22 @@ This time, instead of methods, the nodes represent instructions, and the edges i
caption: [Source code for a simple Java method and its Call and Control Flow Graphs],
)<fig:bg-fizzbuzz-cg-cfg>
A static analysis program examines an #APK file without executing it to extract information from it.
Basic static analysis can include extracting information from the `AndroidManifest.xml` file or decompiling bytecode to Java code with tools like Apktool or Jadx.
Unfortunately, simply reading the bytecode does not scale.
To do so, a human analyst is needed, making it complicated to analyse a large number of applications, and even for single applications, the size and complexity of some applications can quickly overwhelm the reverse engineer.
Control flow analysis is often used to mitigate this issue.
The idea is to extract the behaviour, the flow, of the application from the bytecode, and to represent it as a graph.
A graph representation is easier to work with than a list of instructions and can be used for further analysis.
Depending on the level of precision required, different types of graphs can be computed.
The most basic of those graphs is the call graph.
A call graph is a graph where the nodes represent the methods in the application, and the edges represent calls from one method to another.
@fig:bg-fizzbuzz-cg-cfg b) show the call graph of the code in @fig:bg-fizzbuzz-cg-cfg a).
A more advanced control-flow analysis consists of building the control-flow graph.
This time, instead of methods, the nodes represent instructions, and the edges indicate which instruction can follow which instruction.
@fig:bg-fizzbuzz-cg-cfg c) represents the control-flow graph of @fig:bg-fizzbuzz-cg-cfg a), with code statements instead of bytecode instructions.
Once the control-flow graph is computed, it can be used to compute data-flows.
Data-flow analysis, also called taint-tracking, is used to follow the flow of information in the application.
By defining a list of methods and fields that can generate critical information (taint sources) and a list of methods that can consume information (taint sinks), taint-tracking detects potential data leaks (if a data flow links a taint source and a taint sink).