This commit is contained in:
Jean-Marie 'Histausse' Mineau 2025-07-29 16:23:42 +02:00
parent 243b9df134
commit c060e88996
Signed by: histausse
GPG key ID: B66AEEDA9B645AD2
17 changed files with 264 additions and 96 deletions

View file

@ -17,7 +17,7 @@ For such a task, some automated analysis is performed, but sometimes, a manual i
A reverser is in charge of studying the application: they usually perform a static analysis and a dynamic analysis.
The reverser uses in the first phase static analysis tools in order to access and review the code of the application.
If this first phase is not accurately driven, for example if they fail to access a critical class, they may decide that a malicious application is safe.
Additionally, as stated by Li #etal@Li2017 in their conclusions, such a task is complexified by dynamic code loading, reflective calls, native code, and multi-threading which cannot be easily handled statically.
Additionally, as stated by Li #etal~@Li2017 in their conclusions, such a task is complexified by dynamic code loading, reflective calls, native code, and multi-threading which cannot be easily handled statically.
Nevertheless, even if we do not consider these aspects, determining statically how the regular class loading system of Android is working is a difficult task.
Class loading occurs at runtime and is handled by the components of #ART, even when the application is partially or fully compiled ahead of time.
@ -32,8 +32,8 @@ As a consequence, it is frequent to find inside applications some classes that c
At runtime each smartphone runs a unique version of Android, but, as the application is deployed on multiple versions of Android, it is difficult to predict which classes will be loaded from the #Asdkc or from the APK file itself.
This complexity increases with the multi-#DEX format of recent #APK files that can contain several bytecode files.
Going back to the problem of a reverser studying a suspicious application statically, the reverser uses tools to disassemble the application@mauthe_large-scale_2021 and track the flows of data in the bytecode.
As an example, for a spyware potentially leaking personal information, the reverser can unpack the application with Apktool and, after manually locating a method that they suspect to read sensitive data (by reading the unpacked bytecode), they can compute with FlowDroid@Arzt2014a if there is a flow from this method to methods performing HTTP requests.
Going back to the problem of a reverser studying a suspicious application statically, the reverser uses tools to disassemble the application~@mauthe_large-scale_2021 and track the flows of data in the bytecode.
As an example, for a spyware potentially leaking personal information, the reverser can unpack the application with Apktool and, after manually locating a method that they suspect to read sensitive data (by reading the unpacked bytecode), they can compute with FlowDroid~@Arzt2014a if there is a flow from this method to methods performing HTTP requests.
During these steps, the reverser faces the problem of resolving statically, which class is loaded from the APK file and the #Asdkc.
If they, or the tools they use, choose the wrong version of the class, they may obtain wrong conclusions about the code.
Thus, the possibility of shadowing classes could be exploited by an attacker in order to obfuscate the code.
@ -41,12 +41,12 @@ Thus, the possibility of shadowing classes could be exploited by an attacker in
In this paper, we study how Android handles the loading of classes in the case of multiple versions of the same class.
Such collision can exist inside the APK file or between the APK file and #Asdkc.
We intend to understand if a reverser would be impacted during a static analysis when dealing with such an obfuscated code.
Because this problem is already enough complex with the current operations performed by Android, we exclude the case where a developer recodes a specific class loader or replace a class loader by another one, as it is often the case for example in packed applications@Duan2018.
Because this problem is already enough complex with the current operations performed by Android, we exclude the case where a developer recodes a specific class loader or replace a class loader by another one, as it is often the case for example in packed applications~@Duan2018.
We present a new technique that "shadows" a class #ie embeds a class in the APK file and "presents" it to the reverser instead of the legitimate version.
The goal of such an attack is to confuse them during the reversing process: at runtime the real class will be loaded from another location of the APK file or from the #Asdk, instead of the shadow version.
This attack can be applied to regular classes of the #Asdk or to hidden classes of Android@he_systematic_2023 @li_accessing_2016.
This attack can be applied to regular classes of the #Asdk or to hidden classes of Android~@he_systematic_2023 @li_accessing_2016.
We show how these attacks can confuse the tools of the reverser when he performs a static analysis.
In order to evaluate if such attacks are already used in the wild, we analyzed #nbapk applications from 2023 that we extracted randomly from AndroZoo@allixAndroZooCollectingMillions2016.
In order to evaluate if such attacks are already used in the wild, we analyzed #nbapk applications from 2023 that we extracted randomly from AndroZoo~@allixAndroZooCollectingMillions2016.
Our main result is that #shadowsdk of these applications contain shadow collisions against the #SDK and #shadowhidden against hidden classes.
Our investigations conclude that most of these collisions are not voluntary attacks, but we highlight one specific malware sample performing strong obfuscation revealed by our detection of one shadow attack.

View file

@ -5,38 +5,38 @@
#paragraph([Class loading])[
Class loading mechanisms have been studied in the general context of the Java language.
Gong@gong_secure_1998 describes the JDK 1.2 class loading architecture and capabilities.
Gong~@gong_secure_1998 describes the JDK 1.2 class loading architecture and capabilities.
One of the main advantages of class loading is the type safety property that prevents type spoofing.
As explained by Liang and Bracha@liang_dynamic_1998, by capturing events at runtime (new loaders, new class) and maintaining constraints on the multiple loaders and their delegation hierarchy, authors can avoid confusion when loading a spoofed class.
As explained by Liang and Bracha~@liang_dynamic_1998, by capturing events at runtime (new loaders, new class) and maintaining constraints on the multiple loaders and their delegation hierarchy, authors can avoid confusion when loading a spoofed class.
This behavior is now implemented in modern Java virtual machines.
Later Tazawa and Hagiya@tozawa_formalization_2002 proposed a formalization of the Java Virtual Machine supporting dynamic class loading in order to ensure type safety.
Later Tazawa and Hagiya~@tozawa_formalization_2002 proposed a formalization of the Java Virtual Machine supporting dynamic class loading in order to ensure type safety.
Those works ensure strong safety for the Java Virtual Machine, in particular when linking new classes at runtime.
Although Android has a similar mechanism, the implementation is not shared with the JVM of Oracle.
Additionally, in this paper, we do not focus on spoofing classes at runtime, but on confusion that occurs when using a static analyzer used by a reverser that tries to understand the code loading process offline.
Contributions about Android class loading focus on using the capabilities of class loading to extend Android features or to prevent reverse engineering of Android applications.
For instance, Zhou #etal@zhou_dynamic_2022 extend the class loading mechanism of Android to support regular Java bytecode and Kritz and Maly@kriz_provisioning_2015 propose a new class loader to automatically load modules of an application without user interactions.
For instance, Zhou #etal~@zhou_dynamic_2022 extend the class loading mechanism of Android to support regular Java bytecode and Kritz and Maly~@kriz_provisioning_2015 propose a new class loader to automatically load modules of an application without user interactions.
Regarding reverse engineering, class loading mechanisms are frequently used by packers for hiding all or parts of the code of an application@Duan2018.
Regarding reverse engineering, class loading mechanisms are frequently used by packers for hiding all or parts of the code of an application~@Duan2018.
The problem to be solved consists in locating secondary #dexfiles that can be unciphered just before being loaded.
Dynamic hook mechanisms should be used to intercept the bytecode at load time.
These techniques can be of some help for the reverser, but they require to instrument the source code of AOSP or the application itself.
The engineering cost is high and anti-debugging techniques can slow down the process.
Thus, a reverser always starts by studying statically an application using static analysis tools@Li2017, and will eventually go to dynamic analysis@Egele2012 if further costly extra analysis is needed (for example, if they spot the use of a custom class loader).
Thus, a reverser always starts by studying statically an application using static analysis tools~@Li2017, and will eventually go to dynamic analysis~@Egele2012 if further costly extra analysis is needed (for example, if they spot the use of a custom class loader).
Performing a static analysis of an application can be time consuming if the programmer uses obfuscation techniques such as native code, packing techniques, value encryption, or reflection.
Such techniques can partially hide the Java bytecode from a static analysis investigation as they modify it at runtime.
For example, packers exploits the class loading capability of Android to load new code.
They also combine the loading with code generation from ciphered assets or code modification from native code calls@liao2016automated to increase the difficulty of recovery of the code.
Because parts of the original code will be only available at runtime, deobfuscation approaches propose techniques that track #DEX structures when manipulated by the application@zhang2015dexhunter @xue2017adaptive @wong2018tackling. All those contributions are directly related to the class loading mechanism of Android.
They also combine the loading with code generation from ciphered assets or code modification from native code calls~@liao2016automated to increase the difficulty of recovery of the code.
Because parts of the original code will be only available at runtime, deobfuscation approaches propose techniques that track #DEX structures when manipulated by the application~@zhang2015dexhunter @xue2017adaptive @wong2018tackling. All those contributions are directly related to the class loading mechanism of Android.
Deobfuscating an application is the first problem the reverse engineer has to solve. Nevertheless, even, if all classes of the code are recovered by the reverse engineer, understanding what are the classes that are really loaded by Android brings an additional problem.
The reverse engineer can have the feeling that what he sees in the bytecode is what is loaded at runtime, whereas the system can choose alternative implementations of a class.
Our goal is to show that tools mentioned in the literature@Li2017 can suffer from attacks exploiting confusion inside regular class loading mechanisms of Android.
Our goal is to show that tools mentioned in the literature~@Li2017 can suffer from attacks exploiting confusion inside regular class loading mechanisms of Android.
]
#paragraph([Hidden APIs])[
Li #etal did an empirical study of the usage and evolution of hidden APIs@li_accessing_2016.
Li #etal did an empirical study of the usage and evolution of hidden APIs~@li_accessing_2016.
They found that hidden APIs are added and removed in every release of Android, and that they are used both by benign and malicious applications.
More recently, He #etal @he_systematic_2023 did a systematic study of hidden service API related to security.
More recently, He #etal~@he_systematic_2023 did a systematic study of hidden service API related to security.
They studied how the hidden API can be used to bypass Android security restrictions and found that although Google countermeasures are effective, they need to be implemented inside the system services and not the hidden API due to the lack of in-app privilege isolation: the framework code is in the same process as the user code, meaning any restriction in the framework can be bypassed by the user.
]

View file

@ -51,7 +51,7 @@ When used directly by ART, the classes are usually stored in an application file
The order in which classes are loaded at runtime requires special attention.
All the specific Android class loaders (`DexClassLoader`, `InMemoryClassLoader`, etc.) have the same behavior (except `DelegateLastClassLoader`) but they handle specificities for the input format.
Each class loader has a delegate class loader, represented in the right part of @fig:cl-class_loading_classes by black plain arrows for an instance of `PathClassLoader` and an instance of `DelegateLastClassLoader` (the other class loaders also have this delegate).
Each class loader has a delegate class loader, represented in the right part of @fig:cl-class_loading_classes by black plain arrows for an instance of `PathClassLoader` and an instance of `DelegateLastClassLoader` (the other class loaders also have this delegate).
This delegate is a concept specific to class loaders and has nothing to do with class inheritance.
By default, class loaders will delegate to the singleton class `BootClassLoader`, except if a specific class loader is provided when instantiating the new class loader.
When a class loader needs to load a class, except for `DelegateLastClassLoader`, it will first ask the delegate, i.e. `BootClassLoader`, and if the delegate does not find the class, the class loader will try to load the class on its own.
@ -102,7 +102,7 @@ With such a hypothesis, the delegation process can be modeled by the pseudo-code
In addition, it is important to distinguish the two types of #platc handled by `BootClassLoader` and that both have priority over classes from the application at runtime:
- the ones available in the *#Asdk* (normally visible in the documentation);
- the ones that are internal and that should not be used by the developer. We call them *#hidec*@he_systematic_2023 @li_accessing_2016 (not documented).
- the ones that are internal and that should not be used by the developer. We call them *#hidec*~@he_systematic_2023 @li_accessing_2016 (not documented).
As a preliminary conclusion, we observe that a priority exists in the class loading mechanism and that an attacker could use it to prioritize an implementation over another one.
This could mislead the reverser if they use the one that has the lowest priority.
@ -124,8 +124,8 @@ We discuss in the next section how to obtain these classes from the emulator.
In the development environment, Android Studio uses `android.jar` and the specific classes written by the developer.
After compilation, only the classes of the developer, and sometimes extra classes computed by Android Studio are zipped in the APK file, using the multi-dex format.
At runtime, the application uses `BootClassLoader` to load the #platc from Android.
Until our work, previous works@he_systematic_2023 @li_accessing_2016 considered both #Asdk and #hidec to be in the file `/system/framework/framework.jar` found in the phone itself, but we found that the classes loaded by `bootClassLoader` are not all present in `framework.jar`.
For example, He #etal @he_systematic_2023 counted 495 thousand APIs (fields and methods) in Android 12, based on Google documentation on restriction for non SDK interfaces#footnote[https://developer.android.com/guide/app-compatibility/restrictions-non-sdk-interfaces].
Until our work, previous works~@he_systematic_2023 @li_accessing_2016 considered both #Asdk and #hidec to be in the file `/system/framework/framework.jar` found in the phone itself, but we found that the classes loaded by `bootClassLoader` are not all present in `framework.jar`.
For example, He #etal~@he_systematic_2023 counted 495 thousand APIs (fields and methods) in Android 12, based on Google documentation on restriction for non SDK interfaces#footnote[https://developer.android.com/guide/app-compatibility/restrictions-non-sdk-interfaces].
However, when looking at the content of `framework.jar`, we only found #num(333) thousand APIs.
Indeed, classes such as `com.android.okhttp.OkHttpClient` are loaded by `bootClassLoader`, listed by Google, but not in `framework.jar`.

View file

@ -8,7 +8,7 @@ Then, in order to evaluate their efficiency, we reviewed some common Android rev
We call this collision "*class shadowing*", because the attacker version of the class shadows the one that will be used at runtime.
To evaluate if such shadow attacks are working, we handcrafted three applications implementing shadowing techniques to test their impact on static analysis tools.
Then, we manually inspected the output of the tools in order to check its consistency with what Android is really doing at runtime.
For example, for Apktool, we look at the output disassembled code, and for Flowdroid@Arzt2014a, we check that a flow between `Taint.source()` and `Taint.sink()` is correctly computed.
For example, for Apktool, we look at the output disassembled code, and for Flowdroid~@Arzt2014a, we check that a flow between `Taint.source()` and `Taint.sink()` is correctly computed.
/*
@ -78,7 +78,7 @@ Such shadow attacks are more difficult to detect by a reverser, that may not kno
)<lst:cl-testapp>
We selected tools that are commonly used to unpack and reverse Android applications: Jadx#footnote[https://github.com/skylot/jadx], a decompiler for Android applications, Apktool#footnote[https://apktool.org/], a disassembler/repackager of applications, Androguard#footnote[https://github.com/androguard/androguard], one of the oldest Python package for manipulating Android applications, and Flowdroid@Arzt2014a that performs taint flow analysis.
We selected tools that are commonly used to unpack and reverse Android applications: Jadx#footnote[https://github.com/skylot/jadx], a decompiler for Android applications, Apktool#footnote[https://apktool.org/], a disassembler/repackager of applications, Androguard#footnote[https://github.com/androguard/androguard], one of the oldest Python package for manipulating Android applications, and Flowdroid~@Arzt2014a that performs taint flow analysis.
For evaluating the tools, we designed a single application that we can customize for different tests.
@lst:cl-testapp shows the main body implementing:
@ -204,11 +204,11 @@ Because of that, like for Apktool and Jadx, Androguard has no way to warn the re
==== Flowdroid
Flowdroid@Arzt2014a is used to detect if an application can leak sensitive information.
Flowdroid~@Arzt2014a is used to detect if an application can leak sensitive information.
To do so, the analyst provides a list of source and sink methods.
The return value of a method marked as source is considered sensitive and the argument of a method marked as sink is considered to be leaked.
By analyzing the bytecode of an application, Flowdroid can detect if data emitted by source methods can be exfiltrated by a sink method.
Flowdroid is built on top of the Soot@Arzt2013 framework that handles, among other things, the class selection process.
Flowdroid is built on top of the Soot~@Arzt2013 framework that handles, among other things, the class selection process.
We found that when selecting the classes implementation in a multi-dex APK, Soot uses an algorithm close to what ART is performing:
Soot sorts the `.dex` bytecode file with a specified `prioritizer` (a comparison function that defines an order for #dexfiles) and selects the first implementation found when iterating over the sorted files.
@ -257,7 +257,7 @@ In this section, we compare shadow attacks with these techniques and we discuss
Advanced obfuscation techniques relying on packers have a higher impact on the difficulty of performing a static analysis compared to shadow attacks.
Most of the time, the reverse engineer cannot deobfuscate the application without performing a dynamic analysis.
For this reasons, approaches have been designed to assist the capture of the bytecode that is loaded dynamically, after the precise time where the deobfuscation methods have been executed@zhang2015dexhunter @xue2017adaptive @wong2018tackling.
For this reasons, approaches have been designed to assist the capture of the bytecode that is loaded dynamically, after the precise time where the deobfuscation methods have been executed~@zhang2015dexhunter @xue2017adaptive @wong2018tackling.
On the contrary, a shadow attack can be easily defeated by implementing our algorithm in the static analysis tool, as discussed earlier in @sec:cl-countermeasures.
Nevertheless, shadow attacks are stealthier than packers or native code.
Packers can be easily spotted by artifacts left behind in the application or by detecting classes implementing a custom class loading mechanism.
@ -271,7 +271,7 @@ For example, by colliding a class of the SDK, a control flow analysis could be w
At runtime, this code would be triggered, unpacking new code.
Second, the attacker could use a packer to unpack code at runtime in a first phase.
The reverse engineer would have to perform a dynamic analysis, for example uising a tool such as Dexhunter@zhang2015dexhunter, to recover new DEX files that are loaded by a custom class loader.
The reverse engineer would have to perform a dynamic analysis, for example uising a tool such as Dexhunter~@zhang2015dexhunter, to recover new DEX files that are loaded by a custom class loader.
Then, the reverse engineer would go back to a new static analysis and could have the problem of solving shadow attacks, for example, if a class is defined multiple times in the loaded DEX files.
Because the interaction between shadow attacks and other obfuscations techniques often rely on a loading mechanism implemented by the developer, investigating these cases require to analyze the Java bytecode that is handling the loading.

View file

@ -5,7 +5,7 @@
In this section, we evaluate in the wild if applications that can be found in the Play store or other markets use one of the shadow techniques.
Our goal is to explore the usage of shadow techniques in real applications.
Because we want to include malicious applications (in case such techniques would be used to hide malicious code), we selected #num(50000) applications randomly from AndroZoo@allixAndroZooCollectingMillions2016 that appeared in 2023.
Because we want to include malicious applications (in case such techniques would be used to hide malicious code), we selected #num(50000) applications randomly from AndroZoo~@allixAndroZooCollectingMillions2016 that appeared in 2023.
Malicious applications are spot in our dataset by using a threshold of 3 over the number of antivirus reporting an application as a malware.
Some few applications over the total cannot be retrieved or parsed leading to a final dataset of #nbapk applications.
We automatically disassembled the applications to obtain the list of included classes.