Compare commits

2 commits

Author SHA1 Message Date
d7df45b206
pass chapter 5
Some checks failed
/ test_checkout (push) Failing after 21s
2025-09-30 03:05:07 +02:00
f309dd55b8
small update of the french summary 2025-09-29 23:00:42 +02:00
9 changed files with 104 additions and 67 deletions

@ -29,13 +29,13 @@ Par exemple, Flowdroid a pour objectif de détecter les fuites d'informations: l
Flowdroid then computes whether there are paths in the application connecting methods of the first category to methods of the second.
Unfortunately, these tools are difficult to use, and even though they work on simple applications built for the purpose of testing tools, it is not uncommon for them to fail on real applications.
This raises the research problem #pb1-text-fr
This raises the following research problem: #pb1-text-fr
There are two families of analysis: static analysis and dynamic analysis.
Static analysis examines the application without running it, whereas dynamic analysis examines the behaviour of the application during its execution.
Each has its strengths and weaknesses, and some analysis problems are traditionally assigned to one or the other.
One of these problems is dynamic code loading.
Android applications were initially meant to be coded in Java, and so Android inherited a lot of features from Java.
Android applications were initially meant to be coded in Java; Android therefore inherited many features from Java.
In particular, Android has a class loader system similar to Java's, which can be used to load, at runtime, code external to the application.
Since this dynamically loaded code is not necessarily available in the application initially, this problem is relegated to dynamic analysis.
However, it seems that a hasty generalisation is often made, and that the class loading system as a whole is relegated to dynamic analysis.
@ -51,9 +51,6 @@ Il n'existe pas de solution standard pour transmettre ces données aux outils d'
Some reverse engineering contributions have already proposed instrumenting (modifying) the application to add the results of their analysis before analysing it with other tools.
This promising approach motivates our third research problem: #pb3-text-fr
#todo[Move the French summary to the end?]
#[
== Evaluating the Reusability of Static Analysis Tools for Android
@ -67,7 +64,7 @@ Au contraire, dans ce chapitre nous allons considérer comme correct tout résul
The questions we want to answer are:
/ RQ1: Which static analysis tools for Android that are more than 5 years old can still be used today with reasonable effort?
/ RQ2: How does the reusability of tools evolve over time, in particular when analysing applications published more than 5 years apart from the tool?
/ RQ3: Does the reusability of tools change when analysing a benign application compared to malware?
@ -152,7 +149,7 @@ Nous avons éliminé les outils utilisant de l'analyse dynamique en plus de l'an
We then selected the version of the tools to use.
Some tools have evolved since their publication, either by being maintained by their authors or following a fork by another developer.
We decided to use the latest stable version as of 2023 (the date of the study).
The only interesting fork we found is that of IC3, which we decided to include in addition to IC3.
The only interesting fork we found is that of IC3, which we decided to include in addition to the original version.
@tab:rasta-choix-sources summarises this step.
#figure({
@ -334,7 +331,7 @@ Nous avons donc une réponse à notre *RQ3*: Les maliciels causent moins d'erreu
Finally, we have an answer to our first research problem:
More than half of the selected tools are no longer usable.
In some cases, this is due to our inability to install them correctly, but mostly it is due to the low finishing rate of the tools when analysing applications.
Our results show that applications with a lot of code are harder to analyse and that, to a lesser extent, the Android version and the maliciousness of the application can have an impact.
] #[
@ -346,7 +343,7 @@ Nos résultats montrent que les applications avec beaucoup de code sont plus dif
In this chapter, we study how Android handles class loading in the presence of multiple versions of the same class.
We model Android's class loading algorithm and use it as the basis for a new family of code obfuscation that we call _class shadowing_.
We then audit applications published in 2023 to determine whether this obfuscation technique is currently used.
We then analyse applications published in 2023 to determine whether this obfuscation technique is currently used.
Class loading is a Java feature that Android inherited.
Developers most often interact with it through classes inheriting from `ClassLoader` to load code dynamically.
@ -440,7 +437,7 @@ Nous proposons trois techniques dans cette catégorie:
/ Hidden API Shadowing: The idea is the same as for the previous technique, but this time for a class of the hidden API.
We distinguish SDK shadowing from hidden API shadowing because, since hidden APIs are not documented, some tools may be able to handle the first technique but not the second.
We checked the effect of these techniques on 4 common Android analysis tools: Jadx, Apktool, Androguard and Flowdroid.
We checked the effect of these techniques on 4 common Android reverse engineering tools: Jadx, Apktool, Androguard and Flowdroid.
@tab:cl-resultats summarises our findings.
Jadx is an application decompiler.
When it is used to decompile an application that uses self-shadowing, it selects the wrong class, but lists in a comment the bytecode files containing an implementation of the class.
@ -579,7 +576,7 @@ Les objects `Class`, `Constructor` ou `Method` utilisés pour appeler ces métho
We will therefore not try to modify the code that obtains these objects.
Instead, we will focus on the method calls.
At different points of the execution, the same call site can call different methods.
Moreover, the collection of reflection information will always be best effort: there are situations where one can never be certain of having the complete list of called methods.
Moreover, the collection of reflection information will never be perfect: there are situations where one can never be certain of having the complete list of called methods.
For example, one can imagine an application that calls by reflection a method whose name is obtained from a remote server.
In this case, without access to the server code, it is impossible to have the exhaustive list of methods that can be used.
To account for these two cases, we will replace the calls with conditional blocks.
@ -612,6 +609,38 @@ L'inspection du contenu montre que ces fichiers sont principalement des librairi
Only #num(nb_bytecode_collected - nb_google - nb_appsflyer - nb_facebook) files among the #nb_bytecode_collected collected come from neither Google, Facebook, nor AppsFlyer.
These remaining files contain code specific to the applications using them, mainly applications requiring a high level of security such as banking or health insurance applications.
@tab:th-comparaison-graph-appel shows the number of edges in the call graph of the few applications that dynamically load code specific to their use.
The "Reflection Added" column corresponds to the number of reflective call edges added to the application.
The other added edges are either "glue" functions that we added to the application to choose the right method to call reflectively, or methods called by dynamically loaded code to which Androguard did not have access before the instrumentation.
We can see that our method does allow Androguard to compute a larger graph.
#figure({
let nb_col = 5
table(
columns: (2fr, 1fr, 1fr, 1fr, 2fr),
align: center+horizon,
stroke: none,
table.hline(),
table.header(
//[SHA 256], [Original CG edges], [New CG edges], [Edges added], [Reflection edges added],
table.cell(rowspan: 2)[#APK SHA 256], table.cell(colspan: nb_col - 1)[Number of call graph edges], [Before], [After], [Added], [Reflection Added],
),
table.hline(),
..compared_callgraph.map(
//(e) => ([#lower(e.sha256).slice(0, 10)...], num(e.edges_before), num(e.edges_after), num(e.added), num(e.added_ref_only))
(e) => (
[#lower(e.sha256).slice(0, 10)...],
text(fill: luma(75), num(e.edges_before)),
text(fill: luma(75), num(e.edges_after)),
num(e.added),
num(e.added_ref_only)
)).flatten(),
//[#lower("5D2CD1D10ABE9B1E8D93C4C339A6B4E3D75895DE1FC49E248248B5F0B05EF1CE").slice(0, 10)...], table.cell(colspan: nb_col - 1)[_Instrumentation Crashed_],
table.hline(),
)},
caption: [Edges added to the call graphs of the instrumented applications, computed by Androguard]
) <tab:th-comparaison-graph-appel>
We then modified the applications as described previously, then reran the tools from our first contribution on the modified applications to compare their finishing rate to the rate on the original applications.
Depending on the tool, the finishing rate is either unchanged or slightly lower for the modified applications.

@ -2,12 +2,10 @@
== Introduction <sec:th-intro>
#todo[Reflectif call? Reflection call?]
In the previous chapter, we studied the static impact of class loaders.
Doing so, we ignored the main usage of class loaders by developers: dynamic code loading.
In this chapter, we tackle this issue, as well as the issue of reflection that often comes with dynamic code loading.
Dynamic code loading is the practice of loading at runtime bytecode that was not already part of the original bytecode of the association.
However, as we focused on the default behaviour of Android, we ignored the main use of class loaders for developers: dynamic code loading.
In this chapter, we address this issue, as well as the issue of reflection that often accompanies dynamic code loading.
Dynamic code loading is the practice of loading at runtime bytecode that was not already part of the original bytecode of the application.
This bytecode can be stored as assets of the application, downloaded from a remote server, or even generated algorithmically by the application.
This is a problem for analysis: when the bytecode is not already visible in the application, it cannot be analysed statically.
Meanwhile, reflection is the action of using code to manipulate objects representing structures of the code itself, like classes or methods.
@ -19,8 +17,10 @@ For such cases, dynamic analysis is a more appropriate approach.
It can be used to collect the missing information while the application is running.
However, having this information does not mean that the application can now be analysed in its entirety.
Generic analysis tools rarely have an easy way to read additional information about an application before analysing, and when they do, it is not standard.
The usual approach for hybrid analysis, analyses that mix static and dynamic analysis, is to select one specific static tool and modify its code to take into account the additional data collected by dynamic analysis.
The usual approach for hybrid analysis (analyses that mix static and dynamic analysis) is to select one specific static tool and modify its code to take into account the additional data collected by dynamic analysis.
This limits the reverse engineer to a few tools that they took the time to study and modify for the task.
In this chapter, we propose to modify the code of the application to add the information needed for analysis in a format that any analysis tool can use.
This way, the analyst is no longer limited in their choice of tool and can focus on the actual analysis of the application.
We structured this chapter as follows: We first present an overview of our method in @sec:th-overview.
We then present the transformations we apply to the application in @sec:th-trans and the dynamic analysis we perform in @sec:th-dyn.

@ -4,10 +4,11 @@
== Overview <sec:th-overview>
Our objective is to make some dynamic information available to any analysis tool able to analyse an Android #APK.
To do so, we elected to follow the path of a few contributions we presented in @sec:bg, such as DroidRA~@li_droidra_2016, and use instrumentation.
Contrary to DroidRA, which uses static analysis to compute the values of strings and, from that, the methods used by reflection, we chose to use dynamic analysis.
To do so, we elected to follow the same approach as a few contributions we presented in @sec:bg, such as DroidRA~@li_droidra_2016, and use instrumentation.
As a reminder, DroidRA is a tool that uses COAL to compute reflection data statically, then instruments the application to directly call the methods.
Contrary to DroidRA, we chose to use dynamic analysis.
This allows us to collect information that is simply not available statically (#eg a string sent from a remote command and control server).
The tradeoff being the lack of exhaustiveness: dynamic analysis is known to have code coverage issues.
The tradeoff here is the lack of exhaustiveness: dynamic analysis is known to have code coverage issues.
#figure(
raw-render(

@ -6,7 +6,6 @@ In this section, we will see how we can transform the application code to make d
=== Transforming Reflection <sec:th-trans-ref>
In Android, reflection allows applications to instantiate a class or call a method without having this class or method appear in the bytecode.
Instead, the bytecode uses the generic classes `Class`, `Method` and `Constructor`, which represent any existing class, method or constructor.
Reflection often starts by retrieving the `Class` object representing the class to use.
@ -17,6 +16,7 @@ Similarly, to call a method, the `Method` object must be retrieved, then called
Although the process seems to differ between class instantiation and method call from the Java standpoint, the runtime operations are very similar.
When instantiating an object with `Object obj = cst.newInstance("Hello Void")`, the constructor method `<init>(Ljava/lang/String;)V`, represented by the `Constructor` `cst`, is called on the object `obj`.
Thus, even for instantiation, a method is called at some point.
#figure(
```java
@ -44,11 +44,10 @@ When instantiating an object with `Object obj = cst.newInstance("Hello Void")`,
caption: [Calling a method using reflection]
) <lst:-th-expl-cl-call>
One of the main reasons to use reflection is to access classes that are not present in the application bytecode, nor are platform classes.
Indeed, the application will crash if the #ART encounters references to a class that cannot be found by the current class loader.
This is often the case when dealing with classes from bytecode loaded dynamically.
One of the main reasons to use reflection is to access classes that are neither platform classes nor in the application bytecode, as is often the case when dealing with classes from dynamically loaded bytecode.
Indeed, if the #ART were to encounter an instruction referencing a class that cannot be loaded by the current class loader, it would crash the application.
To allow static analysis tools to analyse an application that uses reflection, we want to replace the reflection call with the bytecode that actually calls the method.
To allow static analysis tools to analyse an application that uses reflection, we want to replace the reflection call with a bytecode chunk that actually calls the method and can be analysed by any static analysis tool.
In @sec:th-trans-cl, we deal with the issue of dynamic code loading so that the classes used are, in fact, present in the application.
A notable issue is that a specific reflection call can call different methods.
@ -65,11 +64,12 @@ In addition, the method we propose in @sec:th-dyn is a best effort approach to c
caption: [A reflection call that can call any method]
) <lst:th-worst-case-ref>
To handle those situations, instead of entirely removing the reflection call, we can modify the application code to test if the `Method` (or `Constructor`) object matches any expected method, and if yes, directly call the method.
To handle those situations, instead of entirely removing the reflection call, we can modify the application code to test if the `Method` (or `Constructor`) object matches any of the methods observed dynamically, and if so, directly call the method.
If the object does not match any expected method, the code can fall back to the original reflection call.
DroidRA~@li_droidra_2016 has a similar solution, except that reflective calls are always evaluated, and the static equivalent follows just after, guarded behind an opaque predicate that is always false at runtime.
@lst:-th-expl-cl-call-trans demonstrate this transformation on @lst:-th-expl-cl-call:
at line 25, the `Method` object `mth` is checked using a method we generated and injected in the application (defined at line 2 in the listing).
@lst:-th-expl-cl-call-trans demonstrates this transformation for the code originally in @lst:-th-expl-cl-call.
Suppose that, when monitoring the execution of the code of @lst:-th-expl-cl-call, we dynamically observed a call to the method `Reflectee.myMethod(String)` at line 3.
In @lst:-th-expl-cl-call-trans, at line 25, the `Method` object `mth` is checked using a method we generated and injected in the application (defined at line 2 in the listing).
This method checks if the method name (line 5), its parameters (lines 6-9), its return type (lines 10-11) and its declaring class (lines 13-14) match the expected method.
If so, the method is called directly (line 26) after casting the arguments and the associated object into the types/classes we just checked.
If the check at line 25 does not pass, the original reflective call is made (line 28).
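The guarded replacement described above can be sketched in plain Java as follows.
This is only an illustration under stated assumptions, not the bytecode we generate: the names `DeReflectionSketch`, `isReflecteeMyMethod` and `callSite` are hypothetical, and `Reflectee.myMethod(String)` is assumed to return a `String`.

```java
import java.lang.reflect.Method;

// Hypothetical target class, standing in for a method observed during dynamic analysis.
class Reflectee {
    public String myMethod(String arg) {
        return "Hello " + arg;
    }
}

class DeReflectionSketch {
    // Injected helper: checks the name, parameters, return type and declaring class.
    static boolean isReflecteeMyMethod(Method mth) {
        Class<?>[] params = mth.getParameterTypes();
        return mth.getName().equals("myMethod")
            && params.length == 1
            && params[0] == String.class
            && mth.getReturnType() == String.class
            && mth.getDeclaringClass() == Reflectee.class;
    }

    // Stands in for an original call site `mth.invoke(obj, args)`.
    static Object callSite(Method mth, Object obj, Object... args) throws Exception {
        if (isReflecteeMyMethod(mth)) {
            // Direct call, visible to static analysis; operands are cast to the checked types.
            return ((Reflectee) obj).myMethod((String) args[0]);
        }
        // Unknown target: fall back to the original reflective call.
        return mth.invoke(obj, args);
    }

    static String demo() {
        try {
            Method known = Reflectee.class.getMethod("myMethod", String.class);
            Method other = String.class.getMethod("length");
            Object first = callSite(known, new Reflectee(), "Void"); // direct branch
            Object second = callSite(other, "abcd");                 // fallback branch
            return first + "/" + second;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the fallback branch keeps the original `invoke` call, a target that was never observed dynamically (here `String.length()`) still behaves as before the transformation.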
@ -82,12 +82,12 @@ If we were to expect other possible methods to be called in addition to `myMetho
] #todo[Ref to list of common tools?] reformated for readability.
*/
The method check is done in a separate method injected inside the application to avoid cluttering the application too much.
The check of the `Method` value is done in a separate method injected inside the application to avoid cluttering the application too much.
Because Java (and thus Android) uses polymorphic methods, we cannot just check the method name and its class; we must also check the whole method signature.
We chose to limit the transformation to the specific instruction that calls `Method.invoke(..)`.
This drastically reduces the risks of breaking the application, but leads to a lot of type casting.
Indeed, the reflection call uses the generic `Object` class, but actual methods usually use specific classes (#eg `String`, `Context`, `Reflectee`) or scalar types (#eg `int`, `long`, `boolean`).
This means that the method parameters and object on which the method is called must be downcast to their actual type before calling the method, then the returned value must be upcast back to an `Object`.
This means that the method parameters and the object on which the method is called must be downcast to their actual type before calling the method, then the returned value must be upcast back to an `Object`.
Scalar types especially require special attention.
Java (and Android) distinguishes between scalar types and classes, and they cannot be mixed: a scalar cannot be cast into an `Object`.
However, each scalar type has an associated class that can be used when doing reflection.
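As an illustration of the casts involved, the sketch below compares a reflective call on a method with scalar parameters to its direct equivalent.
The class `Adder` and the helper names are hypothetical; the only point is to show where unboxing and boxing have to be inserted.

```java
import java.lang.reflect.Method;

// Hypothetical target with scalar parameters and a scalar return type.
class Adder {
    public int add(int a, long b) {
        return a + (int) b;
    }
}

class ScalarCastSketch {
    // Reflective style: everything goes through Object, so scalars travel boxed.
    static Object reflective(Object obj, Object[] args) throws Exception {
        // Scalar parameter types are designated by int.class and long.class.
        Method m = Adder.class.getMethod("add", int.class, long.class);
        return m.invoke(obj, args);
    }

    // Direct style: unbox each argument, call the method, box the result.
    static Object direct(Object obj, Object[] args) {
        int a = ((Integer) args[0]).intValue();
        long b = ((Long) args[1]).longValue();
        int res = ((Adder) obj).add(a, b);
        return Integer.valueOf(res);
    }

    static boolean demo() {
        try {
            Object[] args = { Integer.valueOf(2), Long.valueOf(3L) };
            Object viaReflection = reflective(new Adder(), args);
            Object viaDirectCall = direct(new Adder(), args);
            return viaReflection.equals(viaDirectCall)
                && viaDirectCall.equals(Integer.valueOf(5));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```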
@ -137,7 +137,6 @@ In those cases, the parameters could be used directly without the detour inside
caption: [@lst:-th-expl-cl-call after the de-reflection transformation]
) <lst:-th-expl-cl-call-trans>
=== Transforming Code Loading (or Not) <sec:th-trans-cl>
#jfl-note[Here I expected to read how we transform the code that loads code, but I am being told about multi dex]
@ -150,7 +149,7 @@ This means that we only need to find a way to integrate #DEX files into the appl
We saw in @sec:cl the class loading model of Android.
When doing dynamic code loading, an application defines a new `ClassLoader` that handles the new bytecode, and starts accessing its classes using reflection.
We also saw in @sec:cl that Android now uses the multi-dex format, allowing it to handle any number of #DEX files in one class loader.
Therefore, the simpler way to give access to the dynamically loaded code to static analysis tools is to add the dex files to the application.
Therefore, the simplest way to give static analysis tools access to the dynamically loaded code is to add the dex files to the application as additional multi-dex bytecode files.
This should not impact the class loading model as long as there is no class collision (we will explore this in @sec:th-class-collision) and as long as the original application did not try to access inaccessible classes (we will develop this issue in @sec:th-limits).
#figure(
@ -162,12 +161,15 @@ This should not impact the class loading model as long as there is no class coll
caption: [Inserting #DEX files inside an #APK]
) <fig:th-inserting-dex>
In the end, we decided to *not* modify the original code that loads the bytecode.
We already added the bytecode loaded dynamically, and most tools already ignore dynamic code loading.
In the end, we decided *not* to modify the original code that loads the bytecode.
Most tools already ignore dynamic code loading, and, with the dynamically loaded bytecode added using the multi-dex format, they already have access to it.
At runtime, although the bytecode is already present in the application, the application will still dynamically load the code.
This ensures that the application keeps working as intended, even if the transformation we applied is incomplete.
Specifically, to call dynamically loaded code, an application needs to use reflection.
We saw in @sec:th-trans-ref that we need to keep reflection calls, and to keep them, we need the class loader created when loading the bytecode.
To summarise, we do not modify the existing bytecode.
Instead, we add the intercepted bytecode to the application as additional #DEX files using the multi-dex format, as represented in @fig:th-inserting-dex.
=== Class Collisions <sec:th-class-collision>
We saw in @sec:cl/*-obfuscation*/ that having several classes with the same name in the same application can be problematic.
@ -177,10 +179,11 @@ The developer may have reused a helper class in both the dynamically loaded byte
When loaded dynamically, the classes are in a different class loader, and class resolution is performed at runtime, as we saw in @sec:cl-loading.
We decided to restrict our scope to the use of class loaders from the Android #SDK.
In the absence of class collision, those class loaders behave seamlessly and adding the classes to the application maintains the behaviour.
#jfl-note[An example would help understanding \ jm: I do not have one that does not take 3 pages of listing]
When we detect a collision, we rename one of the colliding classes in order to be able to differentiate between classes.
To avoid breaking the application, we then need to rename all references to this specific class and be careful not to modify references to the other class.
To do so, we regroup each class by the class loaders defining them.
To do so, we group the classes by the class loader that defines them.
Then, for each colliding class name and each class loader, we check the actual class used by the class loader.
If the class has been renamed, we rename all references to this class in the classes defined by this class loader.
To find the class used by a class loader, we reproduce the behaviour of the different class loaders of the Android #SDK.
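A deliberately simplified model of this resolution step is sketched below.
It is not our implementation: `LoaderModel` and `CollisionSketch` are hypothetical names, the boot class path is ignored, and only two delegation orders are modelled (parent first, and own files first as `DelegateLastClassLoader` does).

```java
import java.util.Set;

// Hypothetical model of a class loader: a parent and the classes defined by its own DEX files.
class LoaderModel {
    final String id;
    final LoaderModel parent;    // null means "no parent in this model"
    final Set<String> defined;
    final boolean delegateLast;  // true: own files are searched before the parent

    LoaderModel(String id, LoaderModel parent, Set<String> defined, boolean delegateLast) {
        this.id = id;
        this.parent = parent;
        this.defined = defined;
        this.delegateLast = delegateLast;
    }

    // Returns the id of the loader whose definition is used, or null if the class is not found.
    String resolve(String name) {
        if (delegateLast && defined.contains(name)) {
            return id;
        }
        if (parent != null) {
            String fromParent = parent.resolve(name);
            if (fromParent != null) {
                return fromParent;
            }
        }
        return defined.contains(name) ? id : null;
    }
}

class CollisionSketch {
    static String demo() {
        Set<String> appClasses = Set.of("com.example.Utils", "com.example.Main");
        Set<String> dynClasses = Set.of("com.example.Utils", "com.example.Plugin");
        LoaderModel app = new LoaderModel("app", null, appClasses, false);
        LoaderModel parentFirst = new LoaderModel("dyn", app, dynClasses, false);
        LoaderModel ownFirst = new LoaderModel("dyn", app, dynClasses, true);
        return parentFirst.resolve("com.example.Utils") + ","
            + ownFirst.resolve("com.example.Utils") + ","
            + parentFirst.resolve("com.example.Plugin");
    }
}
```

In this model, the loader id returned for a colliding name indicates which copy the references found in the classes of a given loader should point to once one of the copies has been renamed.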

@ -5,14 +5,14 @@
To perform the transformations described in @sec:th-trans, we need information like the name and signature of the method called with reflection, or the actual bytecode loaded dynamically.
We decided to collect that information through dynamic analysis.
We saw in @sec:bg different contributions that collect this kind of information.
In the end, we decided to keep the analysis as simple as possible, so we avoided using a custom Android build like DexHunter, and instead used Frida to instrument the application and intercept calls of the methods of interest.
@sec:th-fr-dcl present our approach to collect dynamically loaded bytecode, and @sec:th-fr-ref present our approach to collect the reflection data.
Because using dynamic analysis raises the concern of coverage, we also need some interaction with the application during the analysis.
In the end, we decided to keep the analysis as simple as possible, so we avoided using a custom Android build like DexHunter and instead used Frida to instrument the application and intercept calls to the methods of interest.
@sec:th-fr-dcl presents our approach to collect dynamically loaded bytecode, and @sec:th-fr-ref presents our approach to collect the reflection data.
Because using dynamic analysis raises the concern of coverage, we also need some interaction with the graphical user interface of the application during the analysis.
Ideally, a reverse engineer would do the interaction.
Because we wanted to analyse many applications in a reasonable time, we replaced this engineer with an automated runner that simulates the interactions.
We discuss this option in @sec:th-grod.
=== Collecting Bytecode Dynamically Loaded <sec:th-fr-dcl>
=== Collecting the Dynamically Loaded Bytecode <sec:th-fr-dcl>
Initially, we considered instrumenting the constructor methods of the class loaders of the Android #SDK.
However, this is a significant number of methods to instrument, and looking at older applications, we realised that we missed the `DexFile` class.
@ -23,7 +23,7 @@ We found that all those calls are from under either `DexFile.openInMemoryDexFile
As a reference, in 2015, DexHunter~@zhang2015dexhunter already noticed `DexFile.openDexFileNative(..)` (although in the end DexHunter instruments another function, `DefineClass(..)`).
`DefineClass(..)` is still a good function to instrument, but it is a C++ native method that does not have a Java interface, making it harder to work with using Frida, and we want to avoid patching the source code of the #ART like DexHunter did.
For this reason, we decided to hook `DexFile.openInMemoryDexFilesNative(..)` and `DexFile.openDexFileNative(..)` instead.
Those methods take as argument a list of Android code files, either in the form of in-memory byte arrays or file paths, and a reference to the classloader associated with the code.
Those methods take a list of Android code files as an argument, either in the form of in-memory byte arrays or file paths, and a reference to the class loader associated with the code.
Instrumenting those methods allows us to collect all the code files loaded by the #ART and associate them with their class loaders.
=== Collecting Reflection Data <sec:th-fr-ref>
@ -39,10 +39,10 @@ This information is more difficult to collect than one would expect.
It is stored in the stack, but before the #SDK 34, the stack was not directly accessible programmatically.
Historically, when a reverse engineer needed to access the stack, they would trigger and catch an exception and get the stack from that exception.
The issue with this approach is that data stored in exceptions is meant for debugging.
In particullar, the location of the call in the bytecode has a different meaning depending on the debug information encoded in the bytecode.
In particular, the location of the call in the bytecode has a different meaning depending on the debug information encoded in the bytecode.
It can either be the address of the bytecode instruction invoking the callee method in the instruction array of the caller method, or the line number of the original source code that calls the callee method.
Fortunately, in the #SDK 34, Android introduced the `StackWalker` #API.
This #API allow to programatically travel the current stack and retrieve information from it, including the bytecode address of the instruction calling the callee methods.
This #API allows us to programmatically walk the current stack and retrieve information from it, including the bytecode address of the instruction calling the callee methods.
Considering that the line number is not reliable information, we chose to use the new #API, despite the restrictions that come with choosing such a recent Android version (it was released in October 2023, around 2 years ago, and less than 50% of the current Android market share supports this #API today#footnote[https://gs.statcounter.com/android-version-market-share/mobile-tablet/worldwide/#monthly-202401-202508]).
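The difference between the two approaches can be illustrated with the `java.lang.StackWalker` class of Java 9, which we assume here to behave like the Android #API.
The sketch below runs on a standard JVM; the names `CallSiteSketch`, `describeCaller` and `hypotheticalCallSite` are only illustrative and are not part of our Frida instrumentation.

```java
import java.util.Optional;

class CallSiteSketch {
    // Describes the direct caller of this method as "<method name>@<bytecode index>".
    static String describeCaller() {
        Optional<String> caller = StackWalker.getInstance().walk(frames ->
            frames.skip(1) // frame 0 is describeCaller itself
                  .findFirst()
                  .map(f -> f.getMethodName() + "@" + f.getByteCodeIndex()));
        return caller.orElse("unknown");
    }

    // Older approach: read the stack from a Throwable. Only a line number is exposed,
    // and its value depends on the debug information present in the bytecode.
    static int callerLineViaThrowable() {
        StackTraceElement[] trace = new Throwable().getStackTrace();
        return trace.length > 1 ? trace[1].getLineNumber() : -1;
    }

    // Hypothetical call site whose position we want to identify.
    static String hypotheticalCallSite() {
        return describeCaller();
    }
}
```

The bytecode index returned by `getByteCodeIndex()` identifies the calling instruction inside the caller method, which is the information needed to associate a reflective call with its call site.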
=== Application Execution <sec:th-grod>
@ -50,9 +50,9 @@ Considering that the line number is not a reliable information, we chose to use
Dynamic analysis requires actually running the application.
In order to test multiple applications automatically, we needed to simulate human interactions with the applications.
In @sec:bg, we presented a few solutions to explore an application dynamically.
We first eliminated Sapienz, as it relies on an application instrumentation library called ELLA, which has not been updated for 9 years.
We first eliminated Sapienz~@mao_sapienz_2016, as it relies on an application instrumentation library called ELLA, which has not been updated for 9 years.
We also chose to avoid the Monkey because we noticed that it often triggers events that close the application (events like pressing the 'home' button, or opening the general settings drop-down menu at the top of the screen).
Stoat and GroddDroid use UI Automator to interact with the application.
Stoat~@su_guided_2017 and GroddDroid~@abraham_grodddroid_2015 use UI Automator to interact with the application.
UI Automator is a standard Android #API intended for automatic testing.
Both Stoat and GroddDroid perform additional analysis on the application to improve the exploration.
In the end, we elected to use the most basic execution mode of GroddDroid that does not need this additional analysis.
@ -72,7 +72,7 @@ Then we run the application for five minutes with GroddRunner, and at the end of
If at some point an emulator stops responding for too long, we terminate it and restart it.
As we will see in @sec:th-dyn-failure, our experimental setup is quite naive and still requires improvement. #todo(strike(stroke: green)[How do we properly say that it is all broken?])
For example, it does not implement any anti-evasion techniques, which can be a significant issue when analysing malware.
For example, we do not implement any anti-evasion techniques, which can be a significant issue when analysing malware.
Nonetheless, the benefit of our implementation is that it only requires an #ADB connection to a phone with a rooted Android system to work.
Of course, to analyse a specific application, a reverse engineer could use an actual smartphone and explore the application manually.
It would be a lot more stable than our automated batch analysis setup.


@ -15,7 +15,8 @@ This represents #num(5000) applications over the #NBTOTALSTRING total of the ini
Among them, we could not retrieve 43 from Androzoo, leaving us with #num(dyn_res.all.nb) applications to test.
We will first look at the results of the dynamic analysis and look at the bytecode we intercepted.
Then, we will study the impact the instrumentation has on static analysis tools, notably on their success rate, and we will finish with the analysis of a handcrafted application to check whether the instrumentation does, in fact, improve the results of analysis tools.
Then, we will study the impact the instrumentation has on static analysis tools, notably on their success rate.
Finally, we will analyse a handcrafted application to check whether the instrumentation does, in fact, improve the results of analysis tools.
=== Dynamic Analysis Results <sec:th-dyn-failure>
@ -32,7 +33,8 @@ We expected some issues related to the use of an emulator, like the lack of x86_
We manually looked at some applications, but did not find a notable pattern.
In some cases, the application was just broken -- for instance, an application was trying to load a native library that simply does not exist in the application.
In other cases, Frida is to blame: we found some cases where calling a method from Frida can confuse the #ART.
`protected` methods need to be called from the class that defined the method or one of its child classes, but Frida might be considered by the #ART as another class, leading to the #ART aborting the application.
`protected` methods cannot be called from a class other than the one that defined the method or one of its children.
The issue is that Frida might be considered by the #ART as another class, leading to the #ART aborting the application.
#todo[jfl was supposed to test a few other apps #emoji.eyes]
@tab:th-dyn-visited shows the number of applications that we analysed, whether we managed to start at least one activity, and whether we intercepted code loading or reflection.
It also shows the average number of activities visited (when at least one activity was started).
@ -53,7 +55,7 @@ As shown in the table, even if the application fails to start an activity, somet
table.cell(rowspan: 2)[nb apk],
table.vline(end: 3),
table.vline(start: 4),
table.cell(colspan: 2, inset: (bottom: 0pt))[nb failled],
table.cell(colspan: 2, inset: (bottom: 0pt))[nb failed],
table.vline(end: 3),
table.vline(start: 4),
table.cell(colspan: 2, inset: (bottom: 0pt))[activities visited],
@ -77,7 +79,7 @@ As shown in the table, even if the application fails to start an activity, somet
) <tab:th-dyn-visited>
The high number of applications that did not start an activity means that our results will be highly biased.
The code/method that might be loaded/called by reflection from inside activities is filtered out by the limit of or dynamic execution.
The code/method that might be loaded/called by reflection from inside activities is filtered out by the limit of our dynamic execution.
This bias must be kept in mind while reading the next subsection that studies the bytecode that we intercepted.
=== The Bytecode Loaded by Application <sec:th-code-collected>
@ -121,7 +123,7 @@ To estimate the scope of the code we made available, we use Androguard to genera
@tab:th-compare-cg shows the number of edges of those call graphs.
The columns before and after show the total number of edges of the graphs, and the diff column indicates the number of new edges detected (#ie the number of edges after instrumentation minus the number of edges before).
This number includes edges from the bytecode loaded dynamically, as well as the calls added to represent reflection calls, and calls to "glue" methods (methods like `Integer.intValue()` used to convert objects to scalar values, or calls to `T.check_is_Xxx_xxx(Method)` used to check if a `Method` object represents a known method).
The last column, "Added Reflection", is the list of non-glue method calls found in the call graph of the instrumented application but neither in call graph of the original #APK, nor in the call graphes of the added bytecode files that we computed separately.
The last column, "Added Reflection", is the list of non-glue method calls found in the call graph of the instrumented application but neither in the call graph of the original #APK, nor in the call graphs of the added bytecode files that we computed separately.
This corresponds to the calls we added to represent reflection calls.
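The "Added Reflection" column described above amounts to a set difference over call graph edges. The following is a minimal sketch of that computation, not the implementation used in the experiment: edges are modelled as plain `caller -> callee` strings, and both the class `AddedReflectionEdges` and the glue-detection rule in `isGlue` are hypothetical names introduced for illustration (they are not part of Androguard).

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: an edge is modelled as the string "caller -> callee".
final class AddedReflectionEdges {
    /** Assumption: glue methods are recognised by their callee name only. */
    static boolean isGlue(String edge) {
        int idx = edge.indexOf("->");
        String callee = idx < 0 ? edge : edge.substring(idx + 2).trim();
        return callee.equals("Integer.intValue()") || callee.startsWith("T.check_is_");
    }

    /** Edges of the instrumented APK found neither in the original APK nor in the loaded bytecode, glue excluded. */
    static Set<String> addedReflection(Set<String> instrumented,
                                       Set<String> original,
                                       Set<String> loadedBytecode) {
        Set<String> added = new TreeSet<>(instrumented);
        added.removeAll(original);        // edges already present before instrumentation
        added.removeAll(loadedBytecode);  // edges internal to the dynamically loaded code
        added.removeIf(AddedReflectionEdges::isGlue);
        return added;
    }

    public static void main(String[] args) {
        Set<String> instrumented = Set.of(
            "A.a() -> B.b()",
            "Main.main() -> Malicious.get_data()",
            "Main.main() -> T.check_is_get_data(Method)",
            "Malicious.get_data() -> Utils.source()");
        Set<String> original = Set.of("A.a() -> B.b()");
        Set<String> loaded = Set.of("Malicious.get_data() -> Utils.source()");
        System.out.println(addedReflection(instrumented, original, loaded));
    }
}
```

In this toy input, only the edge from `Main.main()` to `Malicious.get_data()` remains, which is the kind of call that was previously hidden behind `Method.invoke()`.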
The first application, #lower(compared_callgraph.at(0).sha256), is noticeable.
@ -155,14 +157,14 @@ This is consistent with the behaviour of a packer: the application loads the mai
caption: [Edges added to the call graphs computed by Androguard by instrumenting the applications]
) <tab:th-compare-cg>
Unfortunately, our implementation of the transformation is imperfect and does fails sometime, as illustrated by #lower("5D2CD1D10ABE9B1E8D93C4C339A6B4E3D75895DE1FC49E248248B5F0B05EF1CE") in @tab:th-compare-cg.
Unfortunately, our implementation of the transformation is imperfect and sometimes fails, as illustrated by #lower("5D2CD1D10ABE9B1E8D93C4C339A6B4E3D75895DE1FC49E248248B5F0B05EF1CE") in @tab:th-compare-cg.
However, over the #num(dyn_res.all.nb - dyn_res.all.nb_failed) applications whose dynamic analysis finished in our experiment, #num(nb_patched) were patched.
The remaining #mypercent(dyn_res.all.nb - dyn_res.all.nb_failed - nb_patched, dyn_res.all.nb - dyn_res.all.nb_failed) failed either due to some quirk in the zip format of the #APK file, because of a bug in our implementation when exceeding the method reference limit in a single #DEX file, or in the case of #lower("5D2CD1D10ABE9B1E8D93C4C339A6B4E3D75895DE1FC49E248248B5F0B05EF1CE"), because the application reused the original application class loader to load new code instead of instantiating a new class loader (a behaviour we did not expect, as it is not possible using only the #SDK, but is enabled by hidden #APIs).
Taking into account the failures from both the dynamic analysis and the instrumentation process, we have a #mypercent(dyn_res.all.nb - nb_patched, dyn_res.all.nb) failure rate.
This is a reasonable failure rate, but we should keep in mind that it compounds with the failure rate of the other tools we want to use on the patched application.
To check the impact of our instrumentation on the finishing rate, we then ran the same experiment as in @sec:rasta.
We run the tools on the #APK before and after instrumentation, and compared the finishing rates in @fig:th-status-npatched-vs-patched (without taking into account #APKs we failed to patch#footnote[Due to a handling error during the experiment, the figure shows the results for #nb_patched_rasta #APKs instead of #nb_patched.]).
We ran the tools on the #APKs before and after instrumentation, and compared the finishing rates in @fig:th-status-npatched-vs-patched (without taking into account the #APKs we failed to patch#footnote[Due to a handling error during the experiment, the figure shows the results for #nb_patched_rasta #APKs instead of #nb_patched. \ We also ignored the tool from Wognsen #etal due to the high number of timeouts.]).
The finishing rate comparison is shown in @fig:th-status-npatched-vs-patched.
We can see that in most cases, the finishing rate is either the same or slightly lower for the instrumented application.
@ -181,16 +183,16 @@ On the other hand, Saaf do not detect the issue with Apktool and pursues the ana
width: 100%,
alt: "",
)
place(center + horizon, rotate(24deg, text(red.transparentize(0%), size: 20pt, "PRELIMINARY RESULTS")))
//place(center + horizon, rotate(24deg, text(red.transparentize(0%), size: 20pt, "PRELIMINARY RESULTS")))
},
caption: [Exist status of static analysis tools on original #APKs (left) and patched #APKs (right)]
caption: [Exit status of static analysis tools on original #APKs (left) and patched #APKs (right)]
) <fig:th-status-npatched-vs-patched>
#todo[Flowdroid results are inconclusive: some APKs have more leaks after, and as many APKs have fewer? Also, running Flowdroid on the same APK can return a different number of leaks???]
=== Example
In this subsection, we use our approach on a small #APK to look in more detail into the analysis of the transformed application.
In this subsection, we use our approach on a single #APK to look in more detail into the analysis of the transformed application.
We handcrafted this application for the purpose of demonstrating how this can help a reverse engineer in their work.
Accordingly, this application is quite small and contains both dynamic code loading and reflection.
We defined two methods, `Utils.source()` and `Utils.sink()`, to model a method that collects sensitive data and a method that exfiltrates data, respectively.
@ -228,10 +230,10 @@ public class Main {
A first analysis of the content of the application shows that the application contains one `Activity` that instantiates the class `Main` and calls `Main.main()`.
@lst:th-demo-before shows most of the code of `Main` as returned by Jadx.
We can see that the class contains another #DEX file encoded in base 64 and loaded in the `InMemoryDexClassLoader` `cl`.
A class is then loaded from this class loader, and two methods from this class loader are called.
The names of this class and methods are not directly accessible as they have been chipĥered and are decoded just before being used at runtime.
Here, the encryption key is available statically, and in theory, a very good static analyser implementing Android `Cipher` #API could compute the actual methods called.
We can see that the class contains another #DEX file encoded in Base64 and loaded in the `InMemoryDexClassLoader` `cl` (line 7).
A class is then loaded from this class loader (line 11), and two methods from this class are called (line 14).
The names of this class and of these methods are not directly accessible, as they have been ciphered and are decoded just before being used at runtime.
Here, the encryption key is available statically (line 6), and in theory, a very good static analyser implementing Android `Cipher` #API could compute the actual methods called.
However, we could easily imagine an application that gets this key from a remote command and control server.
In this case, it would be impossible to compute those methods with static analysis alone.
When run on this application, Flowdroid computed a call graph of 43 edges and found no data leaks.
@ -240,7 +242,7 @@ This is not particularly surprising considering the obfuscation methods used.
Then we run the dynamic analysis we described in @sec:th-dyn on the application and apply the transformation described in @sec:th-trans to add the dynamic information to it.
This time, Flowdroid computes a larger call graph of 76 edges, and does find a data leak.
Indeed, when looking at the new application with Jadx, we notice a new class `Malicious`, and the code of `Main.main()` is now as shown in @lst:th-demo-after:
the method called in the loop is either `Malicious.get_data`, `Malicious.send_data()` or `Method.invoke()`.
the method called in the loop is either `Malicious.get_data()`, `Malicious.send_data()` or `Method.invoke()` (lines 9, 11 and 12).
Although the method names are self-explanatory, checking the code of those methods confirms that `get_data()` calls `Utils.source()` and `send_data()` calls `Utils.sink()`.
#figure(
@ -297,6 +299,6 @@ In red on the figure however, we have the calls that were hidded by reflection i
#v(2em)
To conclude, we showed that our approach indeed improves the results of analysis tools without impacting their finishing rates too much.
To conclude, we showed that our approach indeed improves the results of analysis tools without impacting their finishing rates much.
Unfortunately, we also noticed that our dynamic analysis is suboptimal, either due to our experimental setup or due to our solution to explore the applications.
In the next section, we will present in more detail the limitations of our solution, as well as future work that can be done to improve the contributions presented in this chapter.


@ -9,7 +9,9 @@ In this section, we will present those issues and potential avenues of improveme
=== Bytecode Transformation
#paragraph[Custom Class Loaders][
The first obvious limitation of our bytecode transformation is that we do not know what custom class loaders do, so we cannot accurately reproduce statically their behaviour.
The first obvious limitation of our bytecode transformation is that we do not know what custom class loaders (class loaders implemented by the application developer, as opposed to the class loaders in the #SDK) do, so we cannot accurately reproduce their behaviour statically.
For instance, we can imagine a class loader that loads all classes whose name starts with an `A` from one #DEX file, and all classes whose name starts with a `B` from another.
If both #DEX files have colliding classes, our implementation will not select the right classes.
We elected to fall back to the behaviour of the `BaseDexClassLoader`, which is the highest Android-specific class loader in the inheritance hierarchy, and whose behaviour is shared by all class loaders except `DelegateLastClassLoader`.
The current implementation of the #ART enforces some restrictions on the class loader's behaviour to optimise the runtime performance by caching classes.
This gives us some guarantees that custom class loaders will keep some coherence with the classic class loaders.
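The collision scenario above can be illustrated with a small model that runs on a plain JVM. This is only a sketch under simplifying assumptions: a #DEX file is reduced to a label and a set of class names, the fallback is approximated by a first-match lookup over an ordered list of files, and the names `CollidingDexSketch`, `Dex`, `firstMatch` and `prefixRouted` are hypothetical.

```java
import java.util.List;
import java.util.Set;

final class CollidingDexSketch {
    /** A DEX file reduced to a label and the class names it defines (modelling assumption). */
    static final class Dex {
        final String label;
        final Set<String> classes;
        Dex(String label, Set<String> classes) {
            this.label = label;
            this.classes = classes;
        }
    }

    /** Fallback modelled here: the first file in the list that defines the class wins. */
    static String firstMatch(List<Dex> dexFiles, String className) {
        for (Dex dex : dexFiles) {
            if (dex.classes.contains(className)) {
                return dex.label;
            }
        }
        return null; // class not found in any file
    }

    /** Hypothetical custom loader: names starting with 'A' come from one file, 'B' from another. */
    static String prefixRouted(Dex forA, Dex forB, String className) {
        Dex chosen = className.startsWith("A") ? forA
                   : className.startsWith("B") ? forB
                   : null;
        return chosen != null && chosen.classes.contains(className) ? chosen.label : null;
    }

    public static void main(String[] args) {
        // Both files define the same two classes, so the lookups can disagree.
        Dex first = new Dex("first.dex", Set.of("Alpha", "Beta"));
        Dex second = new Dex("second.dex", Set.of("Alpha", "Beta"));
        System.out.println("custom loader:  Beta from " + prefixRouted(first, second, "Beta"));
        System.out.println("first match:    Beta from " + firstMatch(List.of(first, second), "Beta"));
    }
}
```

With colliding class names, the two strategies pick `Beta` from different files, which is the situation where a static reproduction of the loading behaviour selects the wrong class.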
@ -42,7 +44,7 @@ This leads to potential invalid runtime behaviour, as the first method that matc
#paragraph[`ClassNotFoundException` may not be raised][
In the very specific situation where the original application tries to access a class from dynamically loaded bytecode without actually accessing this bytecode (#eg by using the wrong class loader), the patched application behaviour will differ.
The original application should raise a `ClassNotFoundException`, but in the patched application, the class will be accessible and the exception will not be raised.
The original application should raise a `ClassNotFoundException`, but in the patched application, the class will be accessible, and the exception will not be raised.
In practice, there are few reasons to do such a thing.
One could be to check if the #APK has been tampered with, but there are easier ways to do this, like checking the application signature.
Another would be to check if the class is already available, and if not, load it dynamically, in which case it does not matter, as code loaded dynamically is already present.
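The availability check mentioned above can be sketched in plain Java as follows. This is an illustrative example and not code taken from an analysed application: the class `ClassProbeSketch`, the method `isAvailable` and the class name `com.example.hypothetical.Malicious` are assumptions, and the example runs on a standard JVM rather than on Android.

```java
final class ClassProbeSketch {
    /** Returns true when `name` resolves through `loader`, without initialising the class. */
    static boolean isAvailable(String name, ClassLoader loader) {
        try {
            Class.forName(name, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            // In the original application, this branch is taken when the wrong
            // class loader is used; after patching, the class may resolve instead.
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = ClassProbeSketch.class.getClassLoader();
        System.out.println(isAvailable("java.lang.String", loader));
        // Hypothetical name standing in for a class that only exists in loaded bytecode.
        System.out.println(isAvailable("com.example.hypothetical.Malicious", loader));
    }
}
```

An application that branches on the result of such a probe would take the "not available" path before the transformation and possibly the "available" path after it, since the dynamically loaded classes are then part of the application itself.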
@ -55,7 +57,7 @@ In any case, statically, because we remove neither the calls to the function tha
#paragraph[Anti Evasion][
Our dynamic analysis does not perform any kind of anti-evasive technique.
Any application implementing even basic evasion will detect our environment and will probably not load malicious bytecode.
Running the dynamic analysis in an appropriate sandbox such as DroidDungeon should improve the results significantly.
Running the dynamic analysis in an appropriate sandbox, such as DroidDungeon~@ruggia_unmasking_2024, should improve the results significantly.
]
#paragraph[Code Coverage][


@ -6,7 +6,7 @@ In this chapter, we presented a set of transformations to apply to an applicatio
We also presented a dynamic analysis approach to collect the information needed to perform those transformations.
We then applied this method to a recent subset of applications of our dataset from @sec:rasta.
When comparing the success rate of the tools of @sec:rasta on the applications before and after the transformation, we found that, in general, the success rate of those tools slightly decreases, with a few exceptions.
When comparing the success rate of the tools of @sec:rasta on the applications before and after the transformation, we found that, in general, the success rate of those tools slightly decreases (except for a few tools).
We also showed that our transformation indeed allows static analysis tools to access and process that runtime information in their analysis.
However, a more in-depth look at the results of our dynamic analysis showed that our code coverage is lacking, and that the great majority of dynamically loaded code we intercepted is from generic advertisement and telemetry libraries.


@ -7,9 +7,9 @@
#align(center, highlight-block(inset: 15pt, width: 75%, block(align(left)[
Some applications use dynamic code loading and reflection calls that prevent static analysis tools from analysing the complete application.
Those behaviours can be analysed with dynamic analysis; however, the information collected is not enough to analyse the application: most tools do not have a way to process this additional data.
This can be detected with dynamic analysis; however, the collected data is not enough to analyse the application further: most tools do not have a way to process this additional data.
In this chapter, we propose to use dynamic analysis to collect information related to dynamic code loading and reflection, and to encode this information in the bytecode of the application to allow further analysis.
We compared the results of analysis on applications before and after the transformation, using tools like Flowdroid or Androguard, and found that the additional information is indeed processed by the tools.
We compared the results of analysis on applications before and after the transformation, using tools like Flowdroid or Androguard, and found that the additional information is indeed processed by the static analysis tools.
We also compared the finishing rate of the tools, using the same experiment as in @sec:rasta, and found that the finishing rate is generally only slightly negatively impacted by the transformation.
])))