From c64bff722b386b1fc8e3bc9053dcfd170d178334 Mon Sep 17 00:00:00 2001 From: Jean-Marie 'Histausse' Mineau Date: Tue, 15 Jul 2025 23:36:21 +0200 Subject: [PATCH] update to last revision --- 4_class_loader/1_related_work.typ | 11 ++++-- 4_class_loader/2_classloading.typ | 2 +- 4_class_loader/3_obfuscation.typ | 57 ++++++++++++++++++++++++++++++- 3 files changed, 66 insertions(+), 4 deletions(-) diff --git a/4_class_loader/1_related_work.typ b/4_class_loader/1_related_work.typ index bb9709c..d973b03 100644 --- a/4_class_loader/1_related_work.typ +++ b/4_class_loader/1_related_work.typ @@ -1,4 +1,4 @@ -#import "../lib.typ": etal, paragraph +#import "../lib.typ": etal, paragraph, DEX #import "X_var.typ": * == State of the art @@ -23,7 +23,14 @@ Dynamic hook mechanisms should be used to intercept the bytecode at load time. These techniques can be of some help for the reverser, but they require to instrument the source code of AOSP or the application itself. The engineering cost is high and anti-debugging techniques can slow down the process. Thus, a reverser always starts by studying statically an application using static analysis tools@Li2017, and will eventually go to dynamic analysis@Egele2012 if further costly extra analysis is needed (for example, if they spot the use of a custom class loader). -In the first phase of an analysis where the used methods are static, the reverser can have the feeling that what he sees in the bytecode is what is loaded at runtime. +Performing a static analysis of an application can be time-consuming if the programmer uses obfuscation techniques such as native code, packing techniques, value encryption, or reflection. +Such techniques can partially hide the Java bytecode from a static analysis investigation as they modify it at runtime. +For example, packers exploit the class loading capability of Android to load new code.
+They also combine the loading with code generation from ciphered assets or code modification from native code calls@liao2016automated to make the code harder to recover. +Because parts of the original code will only be available at runtime, deobfuscation approaches propose techniques that track #DEX structures as the application manipulates them@zhang2015dexhunter @xue2017adaptive @wong2018tackling. All those contributions are directly related to the class loading mechanism of Android. + +Deobfuscating an application is the first problem the reverse engineer has to solve. Nevertheless, even if all classes of the code are recovered, understanding which classes are actually loaded by Android is an additional problem. +The reverse engineer can have the feeling that what they see in the bytecode is what is loaded at runtime, whereas the system can choose alternative implementations of a class. Our goal is to show that tools mentioned in the literature@Li2017 can suffer from attacks exploiting confusion inside regular class loading mechanisms of Android. ] diff --git a/4_class_loader/2_classloading.typ b/4_class_loader/2_classloading.typ index f761024..77d2124 100644 --- a/4_class_loader/2_classloading.typ +++ b/4_class_loader/2_classloading.typ @@ -122,7 +122,7 @@ We discuss in the next section how to obtain these classes from the emulator. @fig:cl-archisdk shows how classes of Android are used in the development environment and at runtime. In the development environment, Android Studio uses `android.jar` and the specific classes written by the developer. -After compilation, only the classes of the developer, and eventually extra classes computed by Android Studio are zipped in the APK file, using the multi-dex format. +After compilation, only the classes of the developer, and sometimes extra classes computed by Android Studio, are zipped in the APK file, using the multi-dex format.
At runtime, the application uses `BootClassLoader` to load the #platc from Android. Until our work, previous works@he_systematic_2023 @li_accessing_2016 considered both #Asdk and #hidec to be in the file `/system/framework/framework.jar` found in the phone itself, but we found that the classes loaded by `bootClassLoader` are not all present in `framework.jar`. For example, He #etal @he_systematic_2023 counted 495 thousand APIs (fields and methods) in Android 12, based on Google documentation on restriction for non SDK interfaces#footnote[https://developer.android.com/guide/app-compatibility/restrictions-non-sdk-interfaces]. diff --git a/4_class_loader/3_obfuscation.typ b/4_class_loader/3_obfuscation.typ index c321bff..9c9ae45 100644 --- a/4_class_loader/3_obfuscation.typ +++ b/4_class_loader/3_obfuscation.typ @@ -99,7 +99,7 @@ Again, the shadowing implementation discards the data. We found that these static analysis tools do not consider the class loading mechanism, either because the tools only look at the content of the application file (#eg a disassembler) or because they consider class loading to be a dynamic feature and thus out of their scope. In @tab:cl-results, we report on the types of shadowing that can be tricked each tool. A plain circle is a shadow attack that leads to a wrong result. -A white circle indicates a tool emitting warnings or that eventually displays the two versions of the class. +A white circle indicates a tool that emits warnings or displays the two versions of the class. A cross is a tool not impacted by a shadow attack. We explain in more detail in the following the results for each considered tool. @@ -223,6 +223,61 @@ Flowdroid gives priority to the classes from the SDK over the classes implemente Unfortunately, `android.jar` only contains classes from the #Asdk, meaning that using #hidec breaks the flow tracking.
Solving this issue would require finding the bytecode of all the platform classes of the Android version targeted and as we said previously it requires extracting this information from the emulator. +=== Countermeasures + +Countermeasures against shadow attacks depend on each tool and its objectives. +The first important recommendation is to implement the class selection algorithm as described in Listing @lst:cl-loading-alg. +It should solve any case of self-shadowing, except for tools like Apktool, which do not have to select a class for computing the result but show the whole application's content. +For those tools, a clear warning should be added, pointing out that multiple implementations have been found and displaying the one that will be used at runtime. + +Countermeasures against SDK shadow and Hidden shadow attacks are more complex to handle: they require the list of platform classes on the target smartphone. +The list of SDK classes can be extracted easily from `android.jar`, but hidden classes need to be obtained by another means. +They could be listed directly from the AOSP tree of the Android source code, or obtained from Android documentation, or extracted from the phone itself. +The first approach requires statically analyzing the source code, which can be difficult to achieve as several programming languages are used, and the code base is large and fragmented. +As discussed earlier in the paper, the documentation can lack some classes. +Consequently, the most reliable source is the smartphone itself. +It should be noted that none of these methods can be generalized for all possible versions of Android, as the exact list will depend on the exact targeted device, possibly modified by the manufacturer. +Thus, to counter shadow attacks, the static analysis tools that we evaluated need to embed multiple lists of platform classes, one for each Android version.
+Then, the best heuristic would be to use the list of platform classes that is closest to the target SDK of the analyzed application. + +Some tools like Flowdroid would require additional countermeasures: to compute the exact flow of data, Flowdroid also needs to analyze the code of platform classes. +Flowdroid has already analyzed the SDK classes, but not the hidden classes. +In addition to the data flow in hidden classes, Flowdroid needs a list of data sources and sinks from those classes. +//Other analysis tools may require additional data from platform classes, which may be too difficult to obtain. + +We believe that analysis tools can handle shadow attacks to some degree. +The implementation of the solution will differ depending on the nature of the tool and may not always require the same implementation effort. + +=== Relation with obfuscation techniques + +As described in the state of the art, reverse engineers face other obfuscation techniques such as packers or native code. +These techniques rely on custom class loaders that load new parts of the application from ciphered assets or from the network. +The reverse engineers have to study the application dynamically, to recover new classes, and eventually go back to a static phase to understand the behavior of the application. +In this section, we compare shadow attacks with these techniques and we discuss how the two interact. + +Advanced obfuscation techniques relying on packers have a higher impact on the difficulty of performing a static analysis compared to shadow attacks. +Most of the time, the reverse engineer cannot deobfuscate the application without performing a dynamic analysis. +For this reason, approaches have been designed to assist the capture of the bytecode that is loaded dynamically, after the precise moment when the deobfuscation methods have been executed@zhang2015dexhunter @xue2017adaptive @wong2018tackling.
+In contrast, a shadow attack can be easily defeated by implementing our algorithm in the static analysis tool, as discussed earlier in @sec:cl-countermeasures. +Nevertheless, shadow attacks are stealthier than packers or native code. +Packers can be easily spotted by artifacts left behind in the application or by detecting classes implementing a custom class loading mechanism. +In contrast, an extra class implementing a shadow attack, which would not be executed, could deliberately contain very little code compared to the executed class of Android. +Such an attack would be more discreet than a packer, which adds a lot of code, possibly native, to the application. + +Combining regular obfuscation techniques with shadow attacks can be achieved in two ways. + +First, the attacker could hide the code of a packer or a native call by using a shadow attack. +For example, by colliding with a class of the SDK, a control flow analysis could be wrongly computed, leading the analysis to consider part of the code dead, which would mislead the reverse engineer about the use of this part, which contains a packer. +At runtime, this code would be triggered, unpacking new code. + +Second, the attacker could use a packer to unpack code at runtime in a first phase. +The reverse engineer would have to perform a dynamic analysis, for example using a tool such as Dexhunter@zhang2015dexhunter, to recover new DEX files that are loaded by a custom class loader. +Then, the reverse engineer would go back to a new static analysis and could face the problem of resolving shadow attacks, for example, if a class is defined multiple times in the loaded DEX files. + +Because the interaction between shadow attacks and other obfuscation techniques often relies on a loading mechanism implemented by the developer, investigating these cases requires analyzing the Java bytecode that handles the loading. +This problem is left as future work. + + //\medskip We have seen that tools can be impacted by shadow attacks.
In the next section, we will investigate if these attacks are used in the wild.