Merge branch 'main' of git.mineau.eu:these-android-re/thesis

Jean-Marie Mineau 2025-10-01 15:52:11 +02:00
commit 4b0855b80e
Signed by: histausse
GPG key ID: B66AEEDA9B645AD2
4 changed files with 29 additions and 24 deletions


@@ -14,28 +14,26 @@ They analysed 92 publications and classified them by goal, method used to solve
In particular, they listed 27 approaches with an open-source implementation available.
Interestingly, many of the listed tools rely on common support tools to interact with Android applications and #DEX bytecode.
Recurring examples of such support tools are Apktool (#eg Amandroid~@weiAmandroidPreciseGeneral2014, Blueseal~@shenInformationFlowsPermission2014, SAAF~@hoffmannSlicingDroidsProgram2013), Androguard (#eg Adagio~@gasconStructuralDetectionAndroid2013, Apparecium~@titzeAppareciumRevealingData2015, Mallodroid~@fahlWhyEveMallory2012) or Soot (#eg Blueseal~@shenInformationFlowsPermission2014, DroidSafe~@DBLPconfndssGordonKPGNR15, Flowdroid~@Arzt2014a): those tools are built incrementally, on top of each other.
This strengthens our idea that being able to reuse previous tools is important.
Nevertheless, Li #etal focus more on the techniques and features described in the reviewed publications, and no experiments were performed to evaluate whether the software they point out is still usable.
#jfl-note[We believe that the effort of reviewing the literature to make a comprehensive overview of available approaches should be pushed further: a published approach whose software cannot be used for technical reasons endangers both the reproducibility and reusability of research.][Highlight this?]
//Data-flow analysis is the subject of many contribution~@weiAmandroidPreciseGeneral2014 @titzeAppareciumRevealingData2015 @bosuCollusiveDataLeak2017 @klieberAndroidTaintFlow2014 @DBLPconfndssGordonKPGNR15 @octeauCompositeConstantPropagation2015 @liIccTADetectingInterComponent2015, the most notable tool being Flowdroid~@Arzt2014a.
We will now explore this direction further by looking at the work that has been done to evaluate analysis tools.
Those evaluations often take the form of benchmarks and follow a similar method (we will look at the different contributions in more detail in @sec:bg-bench).
They start by selecting a set of tools with similar goals to compare.
Usually, those contributions compare existing tools to their own, but some contributions do not introduce a new tool and instead focus on surveying the state of the art for a given technique.
They then select a dataset of applications to analyse.
We will see in @sec:bg-datasets that those datasets are often hand-crafted, except for some studies that select a few real-world applications and manually reverse-engineer them to obtain a ground truth against which the tools' results can be compared.
Once the tools and test dataset are selected, the tools are run on the application dataset, and the results of the tools are compared to the expected results (ground truth) to determine the accuracy of each tool.
Additional factors are sometimes compared as well: the number of false positives, false negatives, or even the time it took to finish the analysis.
Occasionally, the number of applications a tool simply failed to analyse is also compared.
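For illustration, this comparison step usually boils down to computing confusion-matrix metrics. Below is a minimal Python sketch with hypothetical inputs; it is not taken from any of the cited benchmarks:

```python
# Score a tool's findings (e.g. reported taint flows) against a ground truth.
# `universe` is the set of all checkable facts, needed to count true negatives.
def score(ground_truth: set[str], reported: set[str], universe: set[str]) -> dict:
    tp = len(ground_truth & reported)   # real flows the tool found
    fp = len(reported - ground_truth)   # flows reported but not real
    fn = len(ground_truth - reported)   # real flows the tool missed
    tn = len(universe - ground_truth - reported)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}
```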
In @sec:bg-datasets, we will look at the datasets used by the community to compare analysis tools.
Then, in @sec:bg-bench, we will go through the contributions that benchmarked those tools #jm-note[to see if they can serve as an indication of which tools can still be used today.][Highlight]
==== Application Datasets <sec:bg-datasets>
@@ -57,12 +55,16 @@ These datasets are useful for carefully spotting missing taint flows, but contai
In addition to those datasets, AndroZoo~@allixAndroZooCollectingMillions2016 collects applications from several application marketplaces, including the Google Play store (the official Google application store), Anzhi and AppChina (two Chinese stores), and FDroid (a store dedicated to free and open source applications).
Currently, AndroZoo contains more than 25 million applications that researchers can download using the SHA256 hash of the application.
AndroZoo also provides additional information about the applications, like the date an application was first detected by AndroZoo or the number of VirusTotal antivirus engines that flagged it as malicious.
This will allow us to sample a dataset of applications evenly distributed over the years.
In addition to providing researchers with easy access to real-world applications, AndroZoo makes it a lot easier to share datasets for reproducibility: instead of sharing hundreds of #APK files, a list of SHA256 hashes is enough.
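For instance, a dataset shared as a list of hashes can be re-downloaded through the AndroZoo web API. A minimal sketch (a personal API key from androzoo.uni.lu is required; `hashes.txt` is a hypothetical file name):

```python
# Download every APK listed (one SHA256 per line) in hashes.txt.
import requests

API_KEY = "..."  # personal AndroZoo API key

with open("hashes.txt") as f:
    for sha256 in (line.strip() for line in f if line.strip()):
        r = requests.get(
            "https://androzoo.uni.lu/api/download",
            params={"apikey": API_KEY, "sha256": sha256},
            timeout=600,
        )
        r.raise_for_status()
        with open(f"{sha256}.apk", "wb") as apk:
            apk.write(r.content)
```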
==== Benchmarking <sec:bg-bench>
We will now go through the contributions that evaluated static analysis tools, to see if they can give us some insight into the current usability of those tools.
The few experiments with datasets composed of real-world applications confirmed that some tools, such as Amandroid~@weiAmandroidPreciseGeneral2014 and Flowdroid~@Arzt2014a, are less efficient on real-world applications~@bosuCollusiveDataLeak2017 @luoTaintBenchAutomaticRealworld2022.
Unfortunately, those real-world application datasets are rather small, and a larger number of applications would be more suitable for our goal, #ie evaluating the reusability of a variety of static analysis tools.
Pauck #etal~@pauckAndroidTaintAnalysis2018 used DroidBench~@Arzt2014a, ICC-Bench~@weiAmandroidPreciseGeneral2014 and DIALDroid-Bench~@bosuCollusiveDataLeak2017 to compare Amandroid~@weiAmandroidPreciseGeneral2014, DIAL-Droid~@bosuCollusiveDataLeak2017, DidFail~@klieberAndroidTaintFlow2014, DroidSafe~@DBLPconfndssGordonKPGNR15, FlowDroid~@Arzt2014a and IccTA~@liIccTADetectingInterComponent2015. //-- all these tools will also be compared in this chapter.


@@ -1,4 +1,4 @@
#import "../lib.typ": SDK, API, API, DEX, pb2, pb2-text, etal, APIs
#import "../lib.typ": SDK, API, API, DEX, pb2, pb2-text, etal, APIs, ie
#import "../lib.typ": todo
=== Android Class Loading <sec:bg-soa-cl>
@@ -6,9 +6,9 @@
#pb2-text
This subsection is mainly dedicated to class loading in Java and Android.
Because we focus on the _default_ class loading algorithm, we will not discuss dynamic code loading (#ie the loading of additional bytecode while the application is already running).
However, even without dynamic code loading, class loading is used to load classes other than the ones in the application: the platform classes.
In the second part of this subsection, we will look at the work that has been done related to those platform classes.
==== Class Loading <sec:bg-cl>
@@ -45,15 +45,17 @@ Platform classes are divided between #SDK classes that are documented, and the o
#SDK classes are clearly listed and documented by Google, so they do not require as much attention as hidden #APIs.
As we said earlier, hidden #APIs are undocumented methods that can nonetheless be used by an application, making them a potential blind spot during analysis.
However, not a lot of research has been done on the subject.
Li #etal did an empirical study of the usage and evolution of hidden #APIs~@li_accessing_2016.
They found that hidden #APIs are added and removed in every release of Android, and that they are used by both benign and malicious applications.
More recently, He #etal~@he_systematic_2023 did a systematic study of the hidden service #APIs related to security.
They studied how hidden #APIs can be used to bypass Android security restrictions and found that although Google's countermeasures are effective, they must be implemented inside the system services rather than in the hidden #APIs, due to the lack of in-app privilege isolation: the framework code runs in the same process as the user code, so any restriction enforced in the framework can be bypassed by the user.
Unfortunately, those two contributions do not explore further the consequences of the use of hidden #APIs for a reverse engineer.
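To make this blind spot more concrete, here is a minimal sketch of how one could heuristically look for hidden #API usage with Androguard (assuming the androguard 3.x analysis interface; `sdk_methods.txt`, a list of documented #SDK method signatures, is a hypothetical input):

```python
# Flag calls that leave the application towards undocumented platform methods.
from androguard.misc import AnalyzeAPK

a, d, dx = AnalyzeAPK("app.apk")  # a: APK, d: DEX files, dx: Analysis

# Hypothetical ground truth: one documented SDK signature per line.
with open("sdk_methods.txt") as f:
    documented = {line.strip() for line in f}

for meth in dx.get_methods():
    # Only external methods (not defined in the APK) can target the platform.
    if not meth.is_external() or not meth.is_android_api():
        continue
    sig = f"{meth.class_name}->{meth.name}{meth.descriptor}"
    if sig not in documented:
        # Candidate hidden API call: a heuristic, not a definitive detection.
        print("possible hidden API call:", sig)
```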
#v(2em)
In conclusion, class loading mechanisms have been studied carefully in the context of the Java language.
However, the same cannot be said about Android, whose implementation diverges significantly from classic Java Virtual Machines.
Most work done on Android focuses on extending Android capabilities using class loading, or on dynamically analysing the code loading operations of an application.


@@ -13,8 +13,9 @@ After that, we will also look at contributions that sought to encode results ins
Some situations, like reflection or dynamic code loading, are difficult to solve with static analysis and require a different approach: dynamic analysis.
With dynamic analysis, the application is actually executed, and the reverse engineer observes its behaviour.
Monitoring the behaviour can be achieved by various strategies: observing the filesystem, the display screen, the process memory, the kernel, etc.
Depending on the chosen level of observation, dynamic analysis can become a serious technical challenge.
A basic example of dynamic analysis is presented by Bernardi #etal~@bernardi_dynamic_2019: the logs generated by `strace` are used to list the system calls triggered in response to an event, in order to determine whether an application is malicious.
More advanced methods are more intrusive and require modifying the #APK, the Android framework, the runtime, or the kernel.
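Returning to the log-based end of that spectrum, here is a minimal sketch, in the spirit of (but not taken from) Bernardi #etal, of turning an `strace` log into a syscall-frequency vector (`strace.log` is a hypothetical capture, #eg obtained with `strace -f -p <pid>`):

```python
# Count syscall occurrences in an strace log to build a feature vector.
import re
from collections import Counter

# Matches the syscall name at the start of a line, with an optional
# "[pid NNN]" prefix produced when strace follows child processes (-f).
SYSCALL = re.compile(r"^(?:\[pid\s+\d+\]\s+)?(\w+)\(")

counts = Counter()
with open("strace.log") as log:
    for line in log:
        match = SYSCALL.match(line)
        if match:
            counts[match.group(1)] += 1

print(counts.most_common(10))  # the feature vector fed to a classifier
```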
@@ -49,7 +50,7 @@ For instance, StaDynA only provide the call graph, and cannot be used as is to i
This is unfortunate: the reverse engineer's next step will depend on the context.
Not being able to reuse the result of a previous analysis with any ad hoc tools greatly limits their options.
AppSpear has an interesting solution to this issue: the code it intercepts is repackaged inside a new #APK file that Android analysis tools should be able to analyse.
We will now explore further the contributions that take this approach to encode results inside applications.
//#todo[RealDroid sandbox bases on modified ART?]
//#todo[force execution?]


@@ -609,10 +609,10 @@ Inspection of the content shows that these files are mainly libraries
Of the #nb_bytecode_collected collected files, only #num(nb_bytecode_collected - nb_google - nb_appsflyer - nb_facebook) come neither from Google, nor Facebook, nor AppsFlyer.
These remaining files contain code specific to the applications that use them, mainly applications requiring a high level of security, such as banking or health insurance applications.
@tab:th-comparaison-graph-appel shows the number of edges in the function call graphs of these few applications that dynamically load code specific to their use case.
The "Réflection Ajoutées" (added reflection) column corresponds to the number of reflective calls added to the application.
The other added edges are either "glue" functions that we added to the application to select the right method to call reflectively, or methods called by dynamically loaded code that Androguard could not access before the instrumentation.
We can see that our method allows Androguard to compute a larger graph.
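For reference, a minimal sketch of how such edge counts can be obtained (assuming the androguard 3.x analysis interface; the APK file names are placeholders):

```python
# Compare call graph sizes before and after instrumentation.
from androguard.misc import AnalyzeAPK

for apk in ("app_original.apk", "app_instrumented.apk"):
    a, d, dx = AnalyzeAPK(apk)
    callgraph = dx.get_call_graph()  # a networkx DiGraph
    print(apk, "call graph edges:", callgraph.number_of_edges())
```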
#figure({
let nb_col = 5