wip

Jean-Marie Mineau 2025-07-21 22:00:29 +02:00
parent fd4d6fa239
commit ea82a3ca8b
Signed by: histausse
GPG key ID: B66AEEDA9B645AD2
10 changed files with 119 additions and 98 deletions


@ -57,12 +57,10 @@ In addition to decompilling #DEX files, Jadx can also decode Android manifests a
=== Soot <sec:bg-soot>
#todo[soot ref]
Soot#footnote[https://github.com/soot-oss/soot] is a Java optimization framework.
Soot#footnote[https://github.com/soot-oss/soot] @Arzt2013 is a Java optimization framework.
It can lift Java bytecode to other intermediate representations, which can be used to perform optimizations and then be converted back to bytecode.
Because Dalvik bytecode and Java bytecode are equivalent, support for Android was added to Soot, and Soot features are now leveraged to analyse Android applications.
One of the best-known examples of Soot usage for Android analysis is FlowDroid #todo[ref], a tool that computes data flows in an application.
One of the best-known examples of Soot usage for Android analysis is FlowDroid@Arzt2014a, a tool that computes data flows in an application.
A new version of Soot, SootUp#footnote[https://github.com/soot-oss/SootUp], is currently being developed.
Compared to Soot, it has a modernized interface and architecture, but it is not yet feature-complete, and some tools like FlowDroid are still using Soot.


@ -0,0 +1,41 @@
#import "../lib.typ": todo, APK
== Android Reverse Engineering Techniques <sec:bg-techniques>
#todo[swap with tool section ?]
Over the past fifteen years, the research community has released many tools to detect or analyze malicious behaviors in applications.
Two main approaches can be distinguished: static and dynamic analysis@Li2017.
Dynamic analysis requires running the application in a controlled environment to observe runtime values and/or interactions with the operating system.
For example, an Android emulator with a patched kernel can capture these interactions, but applying the required modifications is not a trivial task.
Such an approach is limited by the time required to execute even a small part of the application, with no guarantee on the code coverage obtained.
For malware, dynamic analysis is also limited by evasion techniques that may prevent the execution of the malicious parts of the code.
//As a consequence, a lot of efforts have been put in static approaches, which is the focus of this paper.
=== Static Analysis <sec:bg-static>
Static analysis tools are used to perform operations on an #APK file, like extracting its bytecode or information from the `AndroidManifest.xml` file.
#todo[Explain control flow graph, data flow graph, and link to tools?]
A classic goal of a static analysis is to compute data flows to detect potential information leaks@weiAmandroidPreciseGeneral2014 @titzeAppareciumRevealingData2015 @bosuCollusiveDataLeak2017 @klieberAndroidTaintFlow2014 @DBLPconfndssGordonKPGNR15 @octeauCompositeConstantPropagation2015 @liIccTADetectingInterComponent2015 by analyzing the bytecode of an Android application.
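As a minimal illustration of what computing a data flow means, the sketch below propagates a taint mark through a toy straight-line program. It is not the algorithm of any of the cited tools: the statement format, the function `find_leaks` and the choice of `getDeviceId` as source and `sendTextMessage` as sink are assumptions made for this example, and real tools additionally handle branches, method calls, fields and aliasing.

```python
# Toy taint propagation over a straight-line program (illustrative only).
# A statement is a tuple (target, operation, arguments); this format is
# hypothetical and far simpler than real Dalvik bytecode.
SOURCES = {"getDeviceId"}      # assumed source: returns sensitive data
SINKS = {"sendTextMessage"}    # assumed sink: sends its arguments outside


def find_leaks(statements):
    """Return (index, sink, tainted arguments) for each sink reached by tainted data."""
    tainted = set()
    leaks = []
    for index, (target, operation, arguments) in enumerate(statements):
        if operation in SINKS:
            leaked = [arg for arg in arguments if arg in tainted]
            if leaked:
                leaks.append((index, operation, leaked))
        elif operation in SOURCES:
            tainted.add(target)
        elif any(arg in tainted for arg in arguments):
            # Any operation on tainted data yields tainted data.
            tainted.add(target)
        else:
            # The variable is overwritten with untainted data.
            tainted.discard(target)
    return leaks


program = [
    ("imei", "getDeviceId", []),
    ("msg", "concat", ["prefix", "imei"]),
    ("ok", "constant", []),
    (None, "sendTextMessage", ["msg"]),
    (None, "sendTextMessage", ["ok"]),
]
leaks = find_leaks(program)
```

Only the first call to the sink is reported, because `msg` was derived from the source while `ok` was not.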
Static analysis tools for Android applications must overcome many difficulties:
/ the multiplicity of entry points: Each component of an application can be an entry point for the application
/ the event-driven architecture: Methods in the application can be called in many different orders depending on external events
/ the interleaving of native code and bytecode: Native code can be called from bytecode and vice versa, but tools often only handle one of those formats
/ the potential dynamic code loading: An application can run code that was not originally in the application
/ the use of reflection: Methods can be called from their name as a string object, which is not necessarily known statically
/ the continual evolution of Android: Each new version brings new features that analysis tools must be aware of
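The reflection difficulty can be illustrated with a short sketch. It is written in Python as an analogy to reflective calls in Java; the class `Telephony` and the helper `call_by_name` are hypothetical names introduced for this example. The method name only exists as a string assembled at runtime, so a tool that only looks for direct calls in the code would not see which method is invoked.

```python
class Telephony:
    """Hypothetical class standing in for a sensitive platform API."""

    def get_device_id(self):
        return "device-id"

    def get_network(self):
        return "network"


def call_by_name(obj, fragments):
    """Call a method whose name is assembled at runtime from string fragments."""
    name = "".join(fragments)      # the name is not a literal in the code
    method = getattr(obj, name)    # reflective lookup by name
    return method()


result = call_by_name(Telephony(), ["get_", "device", "_id"])
```

Resolving such a call statically requires reconstructing the string value, which may come from decryption, a file or the network.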
The tools can share the backend used to interact with the bytecode.
For example, Apktool is often called in a subprocess to extract the bytecode.
Another example is Soot@Arzt2013, a Java framework that allows manipulating the bytecode through an object representation of instructions.
The best-known tool built on top of Soot is FlowDroid@Arzt2014a, which statically computes information flows in the code.
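Calling Apktool in a subprocess, as mentioned above, can be sketched as follows. This is a minimal example assuming that an `apktool` executable is available in the `PATH`; the helper names `apktool_command` and `decode_apk` are hypothetical, and individual tools may invoke Apktool with different options.

```python
import shutil
import subprocess
from pathlib import Path


def apktool_command(apk_path, out_dir):
    """Build the command line: decode the APK into out_dir, overwriting it (-f)."""
    return ["apktool", "d", str(apk_path), "-o", str(out_dir), "-f"]


def decode_apk(apk_path, out_dir):
    """Run Apktool and return the smali files it produced (sketch, minimal error handling)."""
    if shutil.which("apktool") is None:
        raise FileNotFoundError("apktool was not found in PATH")
    subprocess.run(apktool_command(apk_path, out_dir), check=True)
    return sorted(Path(out_dir).rglob("*.smali"))
```

A tool would then parse the decoded smali files and the decoded manifest found in the output directory.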
=== Dynamic Analysis <sec:bg-dynamic>
#todo[there is still work to do here]
=== Hybrid Analysis <sec:bg-hybrid>


@ -0,0 +1,23 @@
#import "../lib.typ": todo, etal, APK
== Application Datasets <sec:bg-datasets>
Determining whether an application contains a possible information flow is an example of a static analysis goal.
Some datasets have been built especially for evaluating tools that compute information flows inside Android applications.
One of the first well-known datasets is DroidBench, which was released with the tool FlowDroid@Arzt2014a.
Later, the dataset ICC-Bench was introduced with the tool Amandroid@weiAmandroidPreciseGeneral2014 to complement DroidBench by introducing applications using Inter-Component data flows.
These datasets contain carefully crafted applications containing flows that the tools should be able to detect.
These hand-crafted applications can also be used for testing purposes or to detect any regression when the software code evolves.
Contrary to real-world applications, the behavior of these hand-crafted applications is known in advance, thus providing the ground truth that the tools try to compute.
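With a known ground truth, the output of a tool can be scored automatically. The sketch below is one possible way to do so, assuming that a flow is represented as a (source, sink) pair; the function `score` and the example flows are hypothetical and do not come from DroidBench or ICC-Bench.

```python
def score(expected_flows, reported_flows):
    """Return (precision, recall) of the reported flows against the ground truth."""
    expected = set(expected_flows)
    reported = set(reported_flows)
    true_positives = expected & reported
    # Convention chosen for this sketch: an empty set gives a score of 1.0.
    precision = len(true_positives) / len(reported) if reported else 1.0
    recall = len(true_positives) / len(expected) if expected else 1.0
    return precision, recall


# Hypothetical ground truth and tool output for one benchmark application.
expected = {("getDeviceId", "sendTextMessage"), ("getLastKnownLocation", "Log.i")}
reported = {("getDeviceId", "sendTextMessage"), ("getSimSerialNumber", "Log.i")}
precision, recall = score(expected, reported)
```

Here one of the two reported flows is correct and one of the two expected flows is found, so both precision and recall are 0.5.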
However, these datasets are not representative of real-world applications@Pendlebury2018 and the obtained results can be misleading.
Contrary to DroidBench and ICC-Bench, some approaches use real-world applications.
Bosu #etal@bosuCollusiveDataLeak2017 used DIALDroid to perform a threat analysis of Inter-Application communication and published DIALDroid-Bench, an associated dataset.
Similarly, Luo #etal released TaintBench@luoTaintBenchAutomaticRealworld2022, a real-world dataset, along with recommendations for building such a dataset.
These datasets are useful for carefully spotting missing taint flows, but contain only a few dozen applications.
In addition to those datasets, AndroZoo@allixAndroZooCollectingMillions2016 collects applications from several application marketplaces, including the Google Play store (the official Google application store), Anzhi and AppChina (two Chinese stores), or F-Droid (a store dedicated to free and open source applications).
Currently, AndroZoo contains more than 25 million applications, which researchers can download using the SHA256 hash of the application.
AndroZoo provides additional information about the applications, like the date the application was detected for the first time by AndroZoo or the number of antivirus engines from VirusTotal that flagged the application as malicious.
In addition to providing researchers with easy access to real-world applications, AndroZoo makes it a lot easier to share datasets for reproducibility: instead of sharing hundreds of #APK files, the list of SHA256 hashes is enough.
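Sharing a dataset as a list of hashes can be sketched as follows. The hashing part only relies on the Python standard library; the download endpoint and its `apikey` and `sha256` parameters are assumptions about the AndroZoo API that should be checked against its documentation, and the helper names are hypothetical.

```python
import hashlib
from urllib.parse import urlencode

# Assumed endpoint of the AndroZoo download API (to be checked against its documentation).
ANDROZOO_DOWNLOAD = "https://androzoo.uni.lu/api/download"


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hash of a file, reading it by chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as apk:
        for chunk in iter(lambda: apk.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_url(sha256, api_key):
    """Build the URL used to retrieve an application from its hash (assumed parameters)."""
    return ANDROZOO_DOWNLOAD + "?" + urlencode({"apikey": api_key, "sha256": sha256})
```

Publishing the output of `sha256_of` for each application of a dataset lets other researchers rebuild the same dataset with their own API key.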


@ -4,10 +4,10 @@
#epigraph("Alexis \"Lex\" Murphy, Jurassic Park")[This is a Unix system. I know this.]
#todo[Present field background and related work]
#include("X_android.typ")
#include("X_tools.typ")
#include("1_android.typ")
#include("2_tools.typ")
#include("3_analysis_techniques.typ")
#include("4_datasets.typ")
/*
* Generic course on Android