wip

update to last revision
2025-07-16 00:42:30 +02:00 · 2025-07-15 23:36:21 +02:00
6 changed files with 1203 additions and 14 deletions
--- a/4_class_loader/1_related_work.typ
+++ b/4_class_loader/1_related_work.typ
@ -1,4 +1,4 @@
-#import "../lib.typ": etal, paragraph
+#import "../lib.typ": etal, paragraph, DEX
 #import "X_var.typ": *
 == State of the art <sec:cl-soa>
@ -23,7 +23,14 @@ Dynamic hook mechanisms should be used to intercept the bytecode at load time.
 These techniques can help the reverser, but they require instrumenting the source code of AOSP or of the application itself.
 The engineering cost is high and anti-debugging techniques can slow down the process. 
 Thus, a reverser always starts by studying an application statically with static analysis tools@Li2017, and only moves on to dynamic analysis@Egele2012 if further, costlier analysis is needed (for example, if they spot the use of a custom class loader).
-In the first phase of an analysis where the used methods are static, the reverser can have the feeling that what he sees in the bytecode is what is loaded at runtime. 
+Performing a static analysis of an application can be time-consuming if the programmer uses obfuscation techniques such as native code, packing, value encryption, or reflection.
 Such techniques can partially hide the Java bytecode from a static analysis investigation as they modify it at runtime. 
 For example, packers exploit the class loading capability of Android to load new code.
 They also combine the loading with code generation from ciphered assets or code modification from native code calls@liao2016automated to make the code harder to recover.
 Because parts of the original code will be only available at runtime, deobfuscation approaches propose techniques that track #DEX structures when manipulated by the application@zhang2015dexhunter @xue2017adaptive @wong2018tackling. All those contributions are directly related to the class loading mechanism of Android.
 Deobfuscating an application is the first problem the reverse engineer has to solve. Nevertheless, even if all the classes of the code are recovered, determining which classes are really loaded by Android is an additional problem.
 The reverse engineer can have the feeling that what they see in the bytecode is what is loaded at runtime, whereas the system can choose alternative implementations of a class.
 Our goal is to show that tools mentioned in the literature@Li2017 can suffer from attacks exploiting confusion inside regular class loading mechanisms of Android.
 ]
--- a/4_class_loader/2_classloading.typ
+++ b/4_class_loader/2_classloading.typ
@ -122,7 +122,7 @@ We discuss in the next section how to obtain these classes from the emulator.
@fig:cl-archisdk shows how classes of Android are used in the development environment and at runtime. 
 In the development environment, Android Studio uses `android.jar` and the specific classes written by the developer. 
-After compilation, only the classes of the developer, and eventually extra classes computed by Android Studio are zipped in the APK file, using the multi-dex format. 
+After compilation, only the classes of the developer, and sometimes extra classes computed by Android Studio, are zipped in the APK file, using the multi-dex format.
 At runtime, the application uses `BootClassLoader` to load the #platc from Android. 
 Before our work, previous studies@he_systematic_2023 @li_accessing_2016 considered both #Asdk and #hidec to be in the file `/system/framework/framework.jar` found in the phone itself, but we found that the classes loaded by `BootClassLoader` are not all present in `framework.jar`.
 For example, He #etal @he_systematic_2023 counted 495 thousand APIs (fields and methods) in Android 12, based on Google documentation on restrictions for non-SDK interfaces#footnote[https://developer.android.com/guide/app-compatibility/restrictions-non-sdk-interfaces].
--- a/4_class_loader/3_obfuscation.typ
+++ b/4_class_loader/3_obfuscation.typ
@ -99,7 +99,7 @@ Again, the shadowing implementation discards the data.
 We found that these static analysis tools do not consider the class loading mechanism, either because the tools only look at the content of the application file (#eg a disassembler) or because they consider class loading to be a dynamic feature and thus out of their scope. 
 In @tab:cl-results, we report the types of shadowing that can trick each tool.
 A plain circle indicates a shadow attack that leads to a wrong result.
-A white circle indicates a tool emitting warnings or that eventually displays the two versions of the class. 
+A white circle indicates a tool that emits warnings or displays the two versions of the class.
 A cross indicates a tool that is not impacted by a shadow attack.
 We explain in more detail in the following the results for each considered tool.
@ -223,6 +223,61 @@ Flowdroid gives priority to the classes from the SDK over the classes implemente
 Unfortunately, `android.jar` only contains classes from the #Asdk, meaning that using #hidec breaks the flow tracking. 
 Solving this issue would require finding the bytecode of all the platform classes of the targeted Android version and, as we said previously, this requires extracting this information from the emulator.
 === Countermeasures <sec:cl-countermeasures>
 Countermeasures against shadow attacks depend on each tool and its objectives.
 The first important recommendation is to implement the class selection algorithm according to the algorithm described in Listing @lst:cl-loading-alg.
 It should solve any case of self-shadowing, except for tools like Apktool, which do not have to select a class for computing the result but show the whole application's content.
 For those tools, a clear warning should be added, pointing out that multiple implementations have been found and displaying the one that will be used at runtime.
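As a minimal sketch of such a warning, the Python snippet below lists the classes defined in more than one DEX file of an application. It assumes that the class names of each file have already been extracted, and it assumes, for illustration only, that the first definition in multi-dex order (`classes.dex`, `classes2.dex`, and so on) is the one used at runtime; a real tool should select the class with the algorithm of @lst:cl-loading-alg. The names `dex_order` and `find_self_shadowing` are hypothetical.

```python
import re


def dex_order(name: str) -> int:
    """Sort key for multi-dex names: classes.dex -> 1, classesN.dex -> N."""
    match = re.fullmatch(r"classes(\d*)\.dex", name)
    if match is None:
        raise ValueError(f"not a multi-dex file name: {name}")
    return int(match.group(1) or 1)


def find_self_shadowing(dex_classes: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each class defined several times to its defining files, in multi-dex order.

    Assumption: the first file of each list holds the definition used at runtime.
    """
    definitions: dict[str, list[str]] = {}
    for dex_name in sorted(dex_classes, key=dex_order):
        for class_name in dex_classes[dex_name]:
            definitions.setdefault(class_name, []).append(dex_name)
    return {name: files for name, files in definitions.items() if len(files) > 1}


# Hypothetical application content: Utils is defined three times.
apk = {
    "classes.dex": {"Lcom/example/Main;", "Lcom/example/Utils;"},
    "classes2.dex": {"Lcom/example/Utils;"},
    "classes10.dex": {"Lcom/example/Utils;"},
}
for name, files in find_self_shadowing(apk).items():
    print(f"warning: {name} is defined in {files}, displaying the one from {files[0]}")
```

The numeric sort key matters: a plain alphabetical sort would place `classes10.dex` before `classes2.dex`.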
 Countermeasures against SDK shadow and Hidden shadow attacks are more complex to handle: they require the list of platform classes on the target smartphone.
 The list of SDK classes can be extracted easily from `android.jar`, but hidden classes need to be obtained by other means.
 They could be listed directly from the AOSP tree of the Android source code, or obtained from Android documentation, or extracted from the phone itself.
 The first approach requires statically analyzing the source code, which can be difficult to achieve as several programming languages are used, and the code base is large and fragmented.
 As discussed earlier in the paper, the documentation can lack some classes.
 Consequently, the most reliable source is the smartphone itself.
 It should be noted that none of these methods can be generalized for all possible versions of Android, as the exact list will depend on the exact targeted device, possibly modified by the manufacturer.
 Thus, to counter shadow attacks, the static analysis tools that we evaluated need to embed multiple lists of platform classes, one for each Android version.
 Then, the best heuristic would be to use the list of platform classes that is closest to the target SDK of the analyzed application.
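This heuristic can be sketched in a few lines of Python. The sketch assumes that the tool embeds one class list per API level and that the target SDK of the application is known; the function name `closest_api_level` and the tie-breaking rule are our own illustrative choices.

```python
def closest_api_level(target_sdk: int, available_levels: list[int]) -> int:
    """Return the API level, among those with an embedded class list, closest to target_sdk.

    Ties are broken in favour of the higher level (an arbitrary choice).
    """
    if not available_levels:
        raise ValueError("no platform class list available")
    return min(available_levels, key=lambda level: (abs(level - target_sdk), -level))


# Hypothetical tool embedding lists for API levels 28, 30 and 33.
embedded_levels = [28, 30, 33]
print(closest_api_level(31, embedded_levels))
```

As noted above, the selected list remains an approximation, since the exact set of platform classes depends on the device.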
 Some tools like Flowdroid would require additional countermeasures: to compute the exact flow of data, Flowdroid also needs to analyze the code of platform classes.
 Flowdroid has already analyzed the SDK classes, but not the hidden classes.
 In addition to the data flow in hidden classes, Flowdroid needs a list of data sources and sinks from those classes.
 // Other analysis tools may require additional data from platform classes, which may be too difficult to obtain.
 We believe that analysis tools can handle shadow attacks to some degree.
 The implementation of the solution will differ depending on the nature of the tool and may not always require the same effort.
 === Relation with obfuscation techniques <sec:cl-cross-obf>
 As described in the state of the art, reverse engineers face other techniques of obfuscation such as packers or native code.
 These techniques rely on custom class loaders that load new parts of the application from ciphered assets or from the network.
 The reverse engineers have to study the application dynamically to recover new classes, and then go back to a static phase to understand the behavior of the application.
 In this section, we compare shadow attacks with these techniques and we discuss how they interact with them.
 Advanced obfuscation techniques relying on packers have a higher impact on the difficulty of performing a static analysis compared to shadow attacks.
 Most of the time, the reverse engineer cannot deobfuscate the application without performing a dynamic analysis.
 For this reason, approaches have been designed to assist the capture of the bytecode that is loaded dynamically, right after the deobfuscation methods have been executed@zhang2015dexhunter @xue2017adaptive @wong2018tackling.
 On the contrary, a shadow attack can be easily defeated by implementing our algorithm in the static analysis tool, as discussed earlier in @sec:cl-countermeasures.
 Nevertheless, shadow attacks are stealthier than packers or native code. 
 Packers can be easily spotted by artifacts left behind in the application or by detecting classes implementing a custom class loading mechanism.
 On the contrary, an extra class implementing a shadow attack, which would not be executed, could deliberately contain very little code compared to the executed class of Android.
 Such an attack would be more discreet than a packer, which adds a lot of code, possibly native, to the application.
 Combining regular obfuscation techniques with shadow attacks can be achieved in two ways.
 First, the attacker could hide the code of a packer or a native call by using a shadow attack.
 For example, by colliding with a class of the SDK, the attacker could cause the control flow analysis to be wrongly computed and to consider part of the code as dead, which would mislead the reverse engineer about the use of this part that contains a packer.
 At runtime, this code would be triggered, unpacking new code.
 Second, the attacker could use a packer to unpack code at runtime in a first phase.
 The reverse engineer would have to perform a dynamic analysis, for example using a tool such as Dexhunter@zhang2015dexhunter, to recover new DEX files that are loaded by a custom class loader.
 Then, the reverse engineer would go back to a new static analysis and could then face shadow attacks, for example, if a class is defined multiple times in the loaded DEX files.
 Because the interaction between shadow attacks and other obfuscation techniques often relies on a loading mechanism implemented by the developer, investigating these cases requires analyzing the Java bytecode that handles the loading.
 This problem is left as future work.
 //\medskip
 We have seen that tools can be impacted by shadow attacks. In the next section, we will investigate if these attacks are used in the wild.
--- a/5_theseus/1_static_transformation.typ
+++ b/5_theseus/1_static_transformation.typ
@ -1,4 +1,4 @@
-#import "../lib.typ": todo, APK, DEX, JAR, OAT, eg
+#import "../lib.typ": todo, APK, DEX, JAR, OAT, eg, ART, paragraph
 /*
 * Parler de dex lego et du papier qui encode les resultats d'anger en jimple
@ -140,10 +140,17 @@ Because it is an internal, platform dependant format, we elected to ignore the #
 Practically, #JAR and #APK files are zip files containing #DEX files.
 This means that we only need to find a way to integrate #DEX files to the application.
-We elected to simply add the dex files to the application, using the multi-dex feature introduced by the SDK 21 now used by all applications.
+We elected to simply add the dex files to the application, using the multi-dex feature introduced by the SDK 21 and now used by all applications, as shown in @fig:th-inserting-dex.
 This gives static analysis tools access to the dynamically loaded code.
-#todo[add drawing of dex insertion]
+#figure(
  image(
    "figs/dex_insertion.svg",
    width: 80%,
    alt: "A diagram showing a box labelled 'app.apk', a box labelled 'lib.jar', and a single file outside the boxes labelled 'lib.dex'. The lib.jar box contains the files classes.dex and classes2.dex. Inside the app.apk box, the files AndroidManifest.xml, resources.arsc, classes.dex, classes2.dex, classes3.dex and the folders lib, res and assets are circled by dashes and labelled 'original files', and, still inside app.apk, the files classes4.dex, classes5.dex and classes6.dex are circled by dashes and labelled 'Added Files'. Arrows go from lib.dex to classes4.dex, from the classes.dex inside lib.jar to classes5.dex inside app.apk and from classes2.dex inside lib.jar to classes6.dex inside app.apk"
  ),
  caption: [Inserting #DEX files inside an #APK]
 ) <fig:th-inserting-dex>
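The insertion itself can be sketched with the `zipfile` module of the Python standard library. This is only an illustration of the naming scheme shown in the figure, not our implementation: the helper names `next_dex_index` and `insert_dex_files` are hypothetical, the DEX contents are placeholders, and a real APK would have to be aligned and signed again after being modified.

```python
import io
import re
import zipfile


def next_dex_index(names: list[str]) -> int:
    """Return the index N of the first unused classesN.dex name."""
    used = [
        int(m.group(1) or 1)
        for m in (re.fullmatch(r"classes(\d*)\.dex", n) for n in names)
        if m is not None
    ]
    return max(used, default=0) + 1


def insert_dex_files(apk: zipfile.ZipFile, dex_blobs: list[bytes]) -> list[str]:
    """Append raw DEX files to an APK opened in append mode and return the new names."""
    index = next_dex_index(apk.namelist())
    added = []
    for blob in dex_blobs:
        name = "classes.dex" if index == 1 else f"classes{index}.dex"
        apk.writestr(name, blob)
        added.append(name)
        index += 1
    return added


# Build a placeholder APK in memory, then add three dynamically loaded DEX files.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as apk:
    apk.writestr("AndroidManifest.xml", b"")
    for original in ("classes.dex", "classes2.dex", "classes3.dex"):
        apk.writestr(original, b"original bytecode")
with zipfile.ZipFile(buffer, "a") as apk:
    added = insert_dex_files(apk, [b"lib.dex", b"jar classes.dex", b"jar classes2.dex"])
print(added)
```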
 We decided to leave untouched the original code that loads the bytecode.
 At runtime, although the bytecode is already present in the application, the application will still dynamically load the code.
@ -159,17 +166,76 @@ When loaded dynamically, the classes are in a different classloader, and the cla
 We decided to restrain our scope to the use of class loader from the Android SDK.
 In the absence of class collision, those class loaders behave seamlessly and adding the classes to the application maintains the behavior.
-When we detect a collision, we rename one of the classes colliding before injecting it to the application.
+#todo[this is redundant and messy:]
 When we detect a collision, we rename one of the colliding classes in order to be able to differentiate both classes.
 To avoid breaking the application, we then need to rename all references to this specific class, and be careful not to modify references to the other class.
 To do so, we group the classes by the classloader defining them, then, for each colliding class name and each classloader, we check the actual class used by the classloader.
 If the class has been renamed, we rename all references to this class in the classes defined by this classloader.
 To find the class used by a classloader, we reproduce the behavior of the different classloaders of the Android SDK.
 This is an important step: remember that the delegation process can lead to situations where the class defined by a classloader is not the class that will be loaded when querying the classloader.
 The pseudo-code in @lst:renaming-algo shows the three steps of this algorithm: 
 - first, we detect collisions and rename class definitions to remove the collisions
 - then, we rename the references to the colliding classes to make sure the right classes are called
 - finally, we merge the modified dex files of each class loader into one Android application
-#todo[renamin algo]
+#figure(
  ```python
  defined_classes = set()
  redefined_classes = set()
-=== Pitfalls
+  # Rename the definitions of redefined classes
  for cl in class_loaders:
    for clz in defined_classes.intersection(cl.defined_classes):
      cl.rename_definition(clz)
      redefined_classes.add(clz)
    defined_classes.update(cl.defined_classes)
-#todo[interupting try blocks: catch block might expect temporary registers to still stored the saved value]
+  # Rename the references to redefined classes
-#todo[diferenciating the classloaders]
+  for cl in class_loaders:
-#todo[changing classloader with class collision]
+    for clz in redefined_classes:
      defining_cl = cl.resolve_class(clz).class_loader
      cl.rename_reference(clz, defining_cl.new_name(clz))
  # Merge the class loaders into a flat APK
  new_apk = Apk()
  for cl in class_loaders:
    for dex in cl.get_dex():
      new_apk.add_dex(dex)
  ```,
  caption: [Pseudo-code of the renaming algorithm]
 ) <lst:renaming-algo>
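The pseudo-code of @lst:renaming-algo can be turned into a small runnable model. The sketch below is a simplification under stated assumptions: `ToyClassLoader` is a hypothetical stand-in that only models parent-first delegation, a class is reduced to the set of class names it references, and the renaming scheme (appending the loader name) is arbitrary.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyClassLoader:
    """Toy parent-first class loader; `classes` maps a class name to the names it references."""

    name: str
    classes: dict[str, set[str]]
    parent: ToyClassLoader | None = None
    renamed: dict[str, str] = field(default_factory=dict)  # original name -> new name

    def resolve(self, clz: str) -> ToyClassLoader | None:
        """Return the loader whose definition of clz is actually used."""
        if self.parent is not None:
            found = self.parent.resolve(clz)
            if found is not None:
                return found
        return self if clz in self.classes else None


def name_seen_by(cl: ToyClassLoader, clz: str, redefined: set[str]) -> str:
    """Name, after renaming, of the class that cl gets when it asks for clz."""
    if clz not in redefined:
        return clz
    owner = cl.resolve(clz)
    return owner.renamed.get(clz, clz) if owner is not None else clz


def rename_collisions(loaders: list[ToyClassLoader]) -> dict[str, set[str]]:
    """Return the merged classes (name -> references) once collisions are renamed."""
    defined: set[str] = set()
    redefined: set[str] = set()
    # Step 1: rename the definitions that collide with an earlier loader.
    for cl in loaders:
        for clz in defined.intersection(cl.classes):
            cl.renamed[clz] = f"{clz}${cl.name}"
            redefined.add(clz)
        defined.update(cl.classes)
    # Steps 2 and 3: rewrite the references of each loader, then merge everything.
    merged: dict[str, set[str]] = {}
    for cl in loaders:
        for clz, refs in cl.classes.items():
            merged[cl.renamed.get(clz, clz)] = {name_seen_by(cl, r, redefined) for r in refs}
    return merged


app = ToyClassLoader("app", {"Main": {"Utils"}, "Utils": set()})
dyn1 = ToyClassLoader("dyn1", {"Plugin": {"Impl", "Utils"}, "Impl": set()}, parent=app)
dyn2 = ToyClassLoader(
    "dyn2", {"Other": {"Impl", "Utils"}, "Impl": set(), "Utils": set()}, parent=app
)
merged = rename_collisions([app, dyn1, dyn2])
```

In this toy example, `Other` ends up referencing the renamed `Impl` of `dyn2`, but still references the `Utils` of `app`: because of the delegation, the `Utils` defined by `dyn2` is never the one that `dyn2` returns.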
 /*
 * Although we limited ourselves to replacing one specific bytecode instruction, we encountered many technical challenges
 * #todo[interrupting try blocks: catch block might expect temporary registers to still store the saved value] ?
 */
 === Limitations
 #paragraph()[Custom Classloaders][
 The first obvious limitation is that we do not know what custom classloaders do, so we cannot accurately emulate their behavior.
 We elected to fall back to the behavior of the `BaseDexClassLoader`, which is the highest Android-specific classloader in the inheritance hierarchy, and whose behavior is shared by all classloaders except `DelegateLastClassLoader`.
 The current implementation of the #ART enforces some restrictions on the behavior of classloaders to optimize the runtime performance by caching classes.
 This gives us some guarantees that custom classloaders will remain somewhat consistent with the classic classloaders.
 For instance, a class loaded dynamically must have the same name as the name used in `ClassLoader.loadClass()`.
 This makes `BaseDexClassLoader` a good estimation for legitimate classloaders. However, an obfuscated application could use the techniques discussed in @sec:cl-cross-obf, in which case our model would be entirely wrong.
 ]
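The fallback can be illustrated with the following Python sketch. It is a simplified model, not our implementation: the `Loader` type and its `label` field are hypothetical, the boot class loader is reduced to a set of class names, and the lookup orders are our reading of the documented behavior (parent first by default; boot classes, then own #DEX files, then parent for `DelegateLastClassLoader`).

```python
from __future__ import annotations

from dataclasses import dataclass

DELEGATE_LAST = "dalvik.system.DelegateLastClassLoader"


@dataclass
class Loader:
    label: str  # name used only to report the result
    kind: str  # class of the loader, e.g. "dalvik.system.PathClassLoader"
    own_classes: frozenset[str]
    parent: Loader | None = None


def resolve(loader: Loader | None, clz: str, boot_classes: frozenset[str]) -> str | None:
    """Return the label of the loader that defines clz, "boot", or None if not found."""
    if loader is None:  # end of the chain: the boot class loader
        return "boot" if clz in boot_classes else None
    if loader.kind == DELEGATE_LAST:
        if clz in boot_classes:
            return "boot"
        if clz in loader.own_classes:
            return loader.label
        return resolve(loader.parent, clz, boot_classes)
    # Fallback used for every other loader, custom ones included: parent first.
    found = resolve(loader.parent, clz, boot_classes)
    if found is not None:
        return found
    return loader.label if clz in loader.own_classes else None


boot = frozenset({"java.lang.String"})
app = Loader("app", "dalvik.system.PathClassLoader", frozenset({"A"}))
last = Loader("last", DELEGATE_LAST, frozenset({"A"}), parent=app)
custom = Loader("custom", "com.example.MyLoader", frozenset({"A"}), parent=app)
print(resolve(last, "A", boot), resolve(custom, "A", boot))
```

Here the custom loader is modelled as parent-first, so its own definition of `A` is ignored in favour of the one from `app`; this is precisely the estimation that an obfuscated application could invalidate.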
 #paragraph()[Multiple Classloaders for one `Method.invoke()`][
 #todo[explain the problem arises each time a class is compared to another]
 Although we managed to handle calls to different methods from one `Method.invoke()` site, we do not handle calling methods from different classloaders with colliding class definitions.
 The first reason is that it is quite challenging to compare classloaders statically.
 At runtime, each object has a unique identifier that can be used to compare them over the course of the same execution, but this identifier is reset each time the application starts.
 This means we cannot use this identifier in an `if` condition to differentiate the classloaders.
 Ideally, we would combine the hash of the loaded #DEX files, the classloader class and parent to make a unique, static identifier, but the #DEX files loaded by a classloader cannot be accessed at runtime without accessing the process memory at arbitrary locations.
 For some classloaders, the string representation returned by `Object.toString()` lists the location of the loaded #DEX file on the file system.
 This is not the case for the commonly used `InMemoryClassLoader`.
 In addition, the #DEX files are often located in the application private folder, whose name is derived from the hash of the #APK itself.
 Because we modify the application, the path of the private folder also changes, and so will the string representation of the classloaders.
 Checking the classloader of a class can also have side effects on classloaders that delegate to the main application classloader:
 because we inject the classes in the #APK, the classes of the classloader are now already in the main application classloader, which in most cases will have priority over the other classloaders, and lead to the class being loaded by the application classloader instead of the original classloader.
 If we checked for the classloader, we would need to consider such cases and rename each class of each classloader before reinjecting them into the application.
 This would greatly increase the risk of breaking the application during its transformation.
 Instead, we elected to ignore the classloaders when selecting the method to invoke.
 This leads to potentially invalid runtime behavior, as the first method matching the class name will be called, but the alternative methods from other classloaders still appear in the new application, albeit in a block that might be flagged as dead code by a sufficiently advanced static analyser.
 ]
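For reference, the ideal static identifier mentioned above could be computed as in the sketch below on the analysis side, where the #DEX files are available. This is purely illustrative: the function name `static_loader_id` and the encoding are assumptions, and, as explained, the instrumented application cannot recompute this value at runtime.

```python
import hashlib


def static_loader_id(loader_class: str, dex_files: list[bytes], parent_id: str = "") -> str:
    """Hash the loader class, the identifier of its parent and its DEX files, in order."""
    parts = [loader_class.encode(), parent_id.encode()]
    parts.extend(hashlib.sha256(dex).digest() for dex in dex_files)
    digest = hashlib.sha256()
    for part in parts:
        # Length-prefix each part so that different inputs cannot concatenate to the same bytes.
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return digest.hexdigest()


# Placeholder contents: two loaders of the same class, with the same parent, but different DEX files.
app_id = static_loader_id("dalvik.system.PathClassLoader", [b"app dex"])
dyn1_id = static_loader_id("dalvik.system.InMemoryDexClassLoader", [b"dex one"], app_id)
dyn2_id = static_loader_id("dalvik.system.InMemoryDexClassLoader", [b"dex two"], app_id)
```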
--- a/5_theseus/figs/dex_insertion.svg
+++ b/5_theseus/figs/dex_insertion.svg
--- a/bibliography.bib
+++ b/bibliography.bib
@ -916,4 +916,3 @@
 	pages = {423--426},
 	file = {IEEE Xplore Abstract Record:/home/histausse/Zotero/storage/QEQLZHMD/7129009.html:text/html;Kriz and Maly - 2015 - Provisioning of application modules to Android dev.pdf:/home/histausse/Zotero/storage/8GRUYQLQ/Kriz and Maly - 2015 - Provisioning of application modules to Android dev.pdf:application/pdf},
 }
Author	SHA1	Message	Date
Jean-Marie 'Histausse' Mineau	655bff8de2	wip	2025-07-16 00:42:30 +02:00
Jean-Marie 'Histausse' Mineau	c64bff722b	update to last revision	2025-07-15 23:36:21 +02:00