From c3a27f87119a9dac1d22ce55598ae9fd97628ade Mon Sep 17 00:00:00 2001 From: Jean-Marie 'Histausse' Mineau Date: Mon, 29 Sep 2025 20:15:36 +0200 Subject: [PATCH] normalization and move french to the end --- 3_rasta/4_failures_analysis.typ | 12 ++++----- 3_rasta/6_recommendations.typ | 4 +-- 4_class_loader/2_classloading.typ | 2 +- 5_theseus/3_static_transformation.typ | 22 +++++++-------- 5_theseus/4_dynamic_data_collection.typ | 4 +-- 5_theseus/6_limits.typ | 30 ++++++++++----------- main.typ | 36 ++++++++++++++----------- 7 files changed, 57 insertions(+), 53 deletions(-) diff --git a/3_rasta/4_failures_analysis.typ b/3_rasta/4_failures_analysis.typ index c1c82f3..b45ff1b 100644 --- a/3_rasta/4_failures_analysis.typ +++ b/3_rasta/4_failures_analysis.typ @@ -111,12 +111,12 @@ android.jar en version 9 qui génère des erreurs During the running of our experiment, we parsed the standard output and error to capture: -- Java errors and stack traces -- Python errors and stack traces -- Ruby errors and stack traces -- Log4j messages with a "ERROR" or "FATAL" level -- XSB error messages -- Ocaml errors +- Java errors and stack traces. +- Python errors and stack traces. +- Ruby errors and stack traces. +- Log4j messages with a "ERROR" or "FATAL" level. +- XSB error messages. +- Ocaml errors. For example, Dialdroid reports an average of #num(55.9) errors for one successful analysis. On the contrary, some tools, such as Blueseal report very few errors at a time, making it easier to identify the cause of the failure. diff --git a/3_rasta/6_recommendations.typ b/3_rasta/6_recommendations.typ index a6b884f..192c5e8 100644 --- a/3_rasta/6_recommendations.typ +++ b/3_rasta/6_recommendations.typ @@ -13,8 +13,8 @@ During the packaging and testing of the tools we examined in our experiment, the To make a tool easy to reuse, it should have documentation with at least: - Instructions about how to install the dependencies. - Instructions about how to build the tool (if the tool needs to be built). -- Instructions about how to use the tool (#eg command line arguments) -- Instructions about how to interpret the results of the tools (we only checked for the existence of the results in our experiment, but we found that some results can be quite obscure) +- Instructions about how to use the tool (#eg command line arguments). +- Instructions about how to interpret the results of the tools (we only checked for the existence of the results in our experiment, but we found that some results can be quite obscure). In addition to the documentation, a minimum working example with the expected result of the tools allows a potential user to check if everything is working as intended. This #MWE have the additional benefit that it can serve as an example in the documentation. diff --git a/4_class_loader/2_classloading.typ b/4_class_loader/2_classloading.typ index 80e7c34..cc9be1d 100644 --- a/4_class_loader/2_classloading.typ +++ b/4_class_loader/2_classloading.typ @@ -114,7 +114,7 @@ With such a hypothesis, the delegation process can be modelled by the pseudo-cod In addition, it is important to distinguish the two types of #platc handled by `BootClassLoader` and that both have priority over classes from the application at runtime: -- the ones available in the *#Asdk* (normally visible in the documentation); +- the ones available in the *#Asdk* (normally visible in the documentation). - the ones that are internal and that should not be used by the developer. We call them *#hidec*~@he_systematic_2023 @li_accessing_2016 (not documented). As a preliminary conclusion, we observe that a priority exists in the class loading mechanism and that an attacker could use it to prioritise an implementation over another one. diff --git a/5_theseus/3_static_transformation.typ b/5_theseus/3_static_transformation.typ index 48a0731..fab189a 100644 --- a/5_theseus/3_static_transformation.typ +++ b/5_theseus/3_static_transformation.typ @@ -45,7 +45,7 @@ When instantiating an object with `Object obj = cst.newInstance("Hello Void")`, ) One of the main reasons to use reflection is to access classes that are not present in the application bytecode, nor are platform classes. -Indeed, the application will crash if the #ART encounters references to a class that cannot be found by the current classloader. +Indeed, the application will crash if the #ART encounters references to a class that cannot be found by the current class loader. This is often the case when dealing with classes from bytecode loaded dynamically. To allow static analysis tools to analyse an application that uses reflection, we want to replace the reflection call with the bytecode that actually calls the method. @@ -149,9 +149,9 @@ This means that we only need to find a way to integrate #DEX files into the appl We saw in @sec:cl the class loading model of Android. When doing dynamic code loading, an application defines a new `ClassLoader` that handles the new bytecode, and starts accessing its classes using reflection. -We also saw in @sec:cl that Android now use the multi-dex format, allowing it to handle any number of #DEX files in one classloader. +We also saw in @sec:cl that Android now use the multi-dex format, allowing it to handle any number of #DEX files in one class loader. Therefore, the simpler way to give access to the dynamically loaded code to static analysis tools is to add the dex files to the application. -This should not impact the classloading model as long as there is no class collision (we will explore this in @sec:th-class-collision) and as long as the original application did not try to access inaccessible classes (we will develop this issue in @sec:th-limits). +This should not impact the class loading model as long as there is no class collision (we will explore this in @sec:th-class-collision) and as long as the original application did not try to access inaccessible classes (we will develop this issue in @sec:th-limits). #figure( image( @@ -166,7 +166,7 @@ In the end, we decided to *not* modify the original code that loads the bytecode We already added the bytecode loaded dynamically, and most tools already ignore dynamic code loading. At runtime, although the bytecode is already present in the application, the application will still dynamically load the code. This ensures that the application keeps working as intended, even if the transformation we applied is incomplete. -Specifically, to call dynamically loaded code, an application needs to use reflection, and we saw in @sec:th-trans-ref that we need to keep reflection calls, and in order to keep reflection calls, we need the classloader created when loading bytecode. +Specifically, to call dynamically loaded code, an application needs to use reflection, and we saw in @sec:th-trans-ref that we need to keep reflection calls, and in order to keep reflection calls, we need the class loader created when loading bytecode. === Class Collisions @@ -174,17 +174,17 @@ We saw in @sec:cl/*-obfuscation*/ that having several classes with the same name In @sec:th-trans-cl, we are adding new code. By doing so, we increase the probability of having class collisions: The developer may have reused a helper class in both the dynamically loaded bytecode and the application, or an obfuscation process may have renamed classes without checking for intersection between the two sources of bytecode. -When loaded dynamically, the classes are in a different classloader, and the class resolution is resolved at runtime, like we saw in @sec:cl-loading. +When loaded dynamically, the classes are in a different class loader, and the class resolution is resolved at runtime, like we saw in @sec:cl-loading. We decided to restrain our scope to the use of class loaders from the Android #SDK. In the absence of class collision, those class loaders behave seamlessly and adding the classes to the application maintains the behaviour. When we detect a collision, we rename one of the colliding classes in order to be able to differentiate between classes. To avoid breaking the application, we then need to rename all references to this specific class and be careful not to modify references to the other class. -To do so, we regroup each class by the classloaders defining them. -Then, for each colliding class name and each classloader, we check the actual class used by the classloader. -If the class has been renamed, we rename all references to this class in the classes defined by this classloader. -To find the class used by a classloader, we reproduce the behaviour of the different classloaders of the Android #SDK. -This is an important step: remember that the delegation process can lead to situations where the class defined by a classloader is not the class that will be loaded when querying the classloader. +To do so, we regroup each class by the class loaders defining them. +Then, for each colliding class name and each class loader, we check the actual class used by the class loader. +If the class has been renamed, we rename all references to this class in the classes defined by this class loader. +To find the class used by a class loader, we reproduce the behaviour of the different class loaders of the Android #SDK. +This is an important step: remember that the delegation process can lead to situations where the class defined by a class loader is not the class that will be loaded when querying the class loader. The pseudo-code in @lst:renaming-algo shows the three steps of this algorithm: - First, we detect collisions and rename class definitions to remove the collisions. - Then we rename the reference to the colliding classes to make sure the right classes are called. @@ -208,7 +208,7 @@ The pseudo-code in @lst:renaming-algo shows the three steps of this algorithm: defining_cl = cl.resolve_class(clz).class_loader cl.rename_reference(clz, defining_cl.new_name(clz)) - # Merge the classloader into a flat APK + # Merge the class loader into a flat APK new_apk = Apk() for cl in class_loaders: for dex in cl.get_dex(): diff --git a/5_theseus/4_dynamic_data_collection.typ b/5_theseus/4_dynamic_data_collection.typ index 6c057f9..dfa7140 100644 --- a/5_theseus/4_dynamic_data_collection.typ +++ b/5_theseus/4_dynamic_data_collection.typ @@ -14,7 +14,7 @@ We discuss this option in @sec:th-grod. === Collecting Bytecode Dynamically Loaded -Initially, we considered instrumenting the constructor methods of the classloaders of the Android #SDK. +Initially, we considered instrumenting the constructor methods of the class loaders of the Android #SDK. However, this is a significant number of methods to instrument, and looking at older applications, we realised that we missed the `DexFile` class. `DexFile` is now deprecated but still usable class that can be used to load bytecode dynamically. We initially missed this class because it is neither a `ClassLoader` class nor an #SDK class (anymore). @@ -24,7 +24,7 @@ As a reference, in 2015, DexHunter~@zhang2015dexhunter already noticed `DexFile. `DefineClass(..)` is still a good function to instrument, but it is a C++ native method that does not have a Java interface, making it harder to work with using Frida, and we want to avoid patching the source code of the #ART like DexHunter did. For this reason, we decided to hook `DexFile.openInMemoryDexFilesNative(..)` and `DexFile.openDexFileNative(..)` instead. Those methods take as argument a list of Android code files, either in the form of in-memory byte arrays or file paths, and a reference to the classloader associated with the code. -Instrumenting those methods allows us to collect all the code files loaded by the #ART and associate them with their classloaders. +Instrumenting those methods allows us to collect all the code files loaded by the #ART and associate them with their class loaders. === Collecting Reflection Data diff --git a/5_theseus/6_limits.typ b/5_theseus/6_limits.typ index 71c4319..b5717ec 100644 --- a/5_theseus/6_limits.typ +++ b/5_theseus/6_limits.typ @@ -8,11 +8,11 @@ In this section, we will present those issues and potential avenues of improveme === Bytecode Transformation -#paragraph[Custom Classloaders][ -The first obvious limitation of our bytecode transformation is that we do not know what custom classloaders do, so we cannot accurately reproduce statically their behaviour. -We elected to fallback to the behaviour of the `BaseDexClassLoader`, which is the highest Android-specific classloader in the inheritance hierarchy, and whose behaviour is shared by all classloaders except `DelegateLastClassLoader`. +#paragraph[Custom Class Loaders][ +The first obvious limitation of our bytecode transformation is that we do not know what custom class loaders do, so we cannot accurately reproduce statically their behaviour. +We elected to fallback to the behaviour of the `BaseDexClassLoader`, which is the highest Android-specific class loader in the inheritance hierarchy, and whose behaviour is shared by all class loaders except `DelegateLastClassLoader`. The current implementation of the #ART enforces some restrictions on the class loader's behaviour to optimise the runtime performance by caching classes. -This gives us some guarantees that custom class loaders will keep some coherence with the classic classloaders. +This gives us some guarantees that custom class loaders will keep some coherence with the classic class loaders. For instance, a class loaded dynamically must have the same name as the name used in `ClassLoader.loadClass()`. This makes `BaseDexClassLoader` a good approximation for legitimate class loaders. However, an obfuscated application could use the techniques discussed in @sec:cl-cross-obf, in which case our model would be entirely wrong. @@ -22,21 +22,21 @@ A more reasonable approach would be to improve the static analysis to intercept This would allow us to collect a mapping $("class loader", "class name") -> "class"$ that can then be used when renaming colliding classes. ] -#paragraph[Multiple Classloaders for one `Method.invoke()`][ -Although we managed to handle calls to different methods from one `Method.invoke()` site, we do not handle calling methods from different classloaders with colliding class definitions. -The first reason is that it is quite challenging to compare classloaders statically. +#paragraph[Multiple Class Loaders for one `Method.invoke()`][ +Although we managed to handle calls to different methods from one `Method.invoke()` site, we do not handle calling methods from different class loaders with colliding class definitions. +The first reason is that it is quite challenging to compare class loaders statically. At runtime, each object has a unique identifier that can be used to compare them over the course of the same execution, but this identifier is reset each time the application starts. -This means we cannot use this identifier in an `if` condition to differentiate the classloaders. -Ideally, we would combine the hash of the loaded #DEX files, the classloader class and parent to make a unique, static identifier, but the #DEX files loaded by a classloader cannot be accessed at runtime without accessing the process memory at arbitrary locations. -For some classloaders, the string representation returned by `Object.toString()` lists the location of the loaded #DEX file on the file system. +This means we cannot use this identifier in an `if` condition to differentiate the class loaders. +Ideally, we would combine the hash of the loaded #DEX files, the class loader class and parent to make a unique, static identifier, but the #DEX files loaded by a class loader cannot be accessed at runtime without accessing the process memory at arbitrary locations. +For some class loaders, the string representation returned by `Object.toString()` lists the location of the loaded #DEX file on the file system. This is not the case for the commonly used `InMemoryClassLoader`. In addition, the #DEX files are often located in the application's private folder, whose name is derived from the hash of the #APK itself. -Because we modify the application, the path of the private folder also changes, and so will the string representation of the classloaders. -Checking the classloader of a class can also have side effects on classloaders that delegate to the main application classloader: -because we inject the classes in the #APK, the classes of the classloader are now already in the main application classloader, which in most cases will have priority over the other classloaders, and lead to the class being loaded by the application classloader instead of the original classloader. -If we check for the classloader, we would need to consider such cases and rename each class of each classloader before reinjecting them in the application. +Because we modify the application, the path of the private folder also changes, and so will the string representation of the class loaders. +Checking the class loader of a class can also have side effects on class loaders that delegate to the main application class loader: +because we inject the classes in the #APK, the classes of the class loader are now already in the main application class loader, which in most cases will have priority over the other class loaders, and lead to the class being loaded by the application class loader instead of the original class loader. +If we check for the class loader, we would need to consider such cases and rename each class of each class loader before reinjecting them in the application. This would greatly increase the risk of breaking the application during its transformation. -Instead, we elected to ignore the classloaders when selecting the method to invoke. +Instead, we elected to ignore the class loaders when selecting the method to invoke. This leads to potential invalid runtime behaviour, as the first method that matches the class name will be called, but the alternative methods from other class loaders still appear in the new application, albeit in a block that might be flagged as dead code by a sufficiently advanced static analyser. ] diff --git a/main.typ b/main.typ index 2bd512f..9dea217 100644 --- a/main.typ +++ b/main.typ @@ -95,19 +95,6 @@ include("0_preamble/acknowledgements.typ") - // https://ed-matisse.doctorat-bretagne.fr/fr/soutenance-de-these#p-151 - // > Le manuscrit est normalement rédigé en français (Loi relative à l'emploi de la langue française, 1994). - // > Toutefois, il est accepté de bâtir le manuscrit sur la base d'un résumé substantiel en français - // > (au moins 4 pages), le reste du manuscrit étant considéré comme des annexes et étant alors rédigé en - // > langue étrangère. - // > - // > Dans le cas d'une thèse qui ne serait pas rédigée en français, il est conseillé de bien distinguer le - // > résumé substantiel des chapitres de la thèse pour éviter d'essuyer un refus de la part de - // > l'administration de l'établissement d'inscription (par exemple en l'intitulant résumé en français et - // > en ne lui affectant aucun numéro de chapitre). - // - include("0_preamble/french_summary.typ") - outline(title: "Table of Contents", indent: auto) show outline.entry: it => { v(5mm, weak: true) @@ -137,9 +124,6 @@ // Keep interline in table #show table: set par(leading: 0.65em) if paper_draft -#todo[Normalize classloaders vs class loaders] -#todo[Normalize bullets/item: either end with a '.' or a ';'] - #include("1_introduction/main.typ") #include("2_background/main.typ") #include("3_rasta/main.typ") @@ -148,3 +132,23 @@ #include("6_conclusion/main.typ") #bibliography("bibliography.bib") + +#{ + set heading(numbering: none, outlined: false) + set figure(outlined: false) + set page(numbering: "i") + counter(page).update(0) + + // https://ed-matisse.doctorat-bretagne.fr/fr/soutenance-de-these#p-151 + // > Le manuscrit est normalement rédigé en français (Loi relative à l'emploi de la langue française, 1994). + // > Toutefois, il est accepté de bâtir le manuscrit sur la base d'un résumé substantiel en français + // > (au moins 4 pages), le reste du manuscrit étant considéré comme des annexes et étant alors rédigé en + // > langue étrangère. + // > + // > Dans le cas d'une thèse qui ne serait pas rédigée en français, il est conseillé de bien distinguer le + // > résumé substantiel des chapitres de la thèse pour éviter d'essuyer un refus de la part de + // > l'administration de l'établissement d'inscription (par exemple en l'intitulant résumé en français et + // > en ne lui affectant aucun numéro de chapitre). + // + include("0_preamble/french_summary.typ") +}