#import "../lib.typ": etal, ie, ART, DEX, APK, SDK
#import "X_var.typ": *

== Introduction

/*
When building an application with Android Studio, the source code of the application is compiled to Java bytecode, which is then converted to Dalvik bytecode.
The Dalvik bytecode is then put in a zip archive with other resources, such as the application manifest, and the archive is signed.
Android Studio handles this whole process behind the scenes.
At runtime, the Dalvik bytecode is either interpreted by the Dalvik virtual machine or compiled to native code by #ART, and it is up to these components to handle the loading of classes.
Both behaviors can coexist within a single application, and it is up to Android to choose which parts of the application are compiled to native code.
*/

Android applications are distributed through application markets.
Market maintainers have the difficult task of discovering suspicious applications and removing those that are actually malicious.
This task relies on automated analysis, but a manual investigation is sometimes required.
A reverser is then in charge of studying the application: they usually perform a static analysis and a dynamic analysis.
In the first phase, the reverser uses static analysis tools to access and review the code of the application.
If this first phase is not carried out accurately, for example if they fail to access a critical class, they may conclude that a malicious application is safe.
Additionally, as stated by Li #etal~@Li2017 in their conclusions, this task is complicated by dynamic code loading, reflective calls, native code, and multi-threading, which cannot be easily handled statically.
Nevertheless, even without considering these aspects, statically determining how the regular class loading system of Android works is a difficult task.

Class loading occurs at runtime and is handled by the components of #ART, even when the application is partially or fully compiled ahead of time.
Nevertheless, at the development stage, Android Studio handles the resolution of the different classes, some of which are internal to the application.
When building, the code is linked against the standard library, #ie the code contained in `android.jar`.
In this paper, we call these classes "Development #SDK classes", or Dev #SDK classes for short.
`android.jar` is not added to the application because its classes will be available at runtime in other `.jar` files.
To distinguish the classes found at runtime from Dev #SDK classes, we call the former #Asdkc.
When releasing the application, the build process of Android Studio can manage different versions of the #Asdk, reported in the Manifest as the "#SDK versions".
Indeed, some parts of the core #Asdkc can be embedded in the application for backward compatibility: depending on the specified minimum #SDK version and the target #SDK version, the code of extra #Asdkc is stored in the APK file.
As a consequence, it is common to find inside applications some classes that come from the `com.android` packages.
At runtime, each smartphone runs a single version of Android but, as the application is deployed on multiple versions of Android, it is difficult to predict which classes will be loaded from the #Asdkc and which from the APK file itself.
This complexity increases with the multi-#DEX format of recent #APK files, which can contain several bytecode files.
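As a minimal sketch of this layout, the following Python snippet lists the bytecode files found at the root of an #APK. It assumes the conventional naming scheme (`classes.dex`, `classes2.dex`, and so on); the helper names `list_dex_files` and `make_demo_apk` and the in-memory demo archive are hypothetical and only serve the illustration.

```python
import io
import re
import zipfile

# Assumption: bytecode files sit at the root of the archive and follow the
# conventional names classes.dex, classes2.dex, classes3.dex, ...
DEX_NAME = re.compile(r"^classes([0-9]*)\.dex$")


def list_dex_files(apk):
    """Return the root-level DEX entries of an APK, sorted by their number.

    `apk` is a path or a binary file-like object accepted by zipfile.ZipFile.
    """
    with zipfile.ZipFile(apk) as archive:
        names = archive.namelist()
    numbered = []
    for name in names:
        match = DEX_NAME.match(name)
        if match:
            # classes.dex carries no number and is treated as the first file.
            numbered.append((int(match.group(1) or 1), name))
    return [name for _, name in sorted(numbered)]


def make_demo_apk():
    """Build a tiny in-memory zip that mimics the layout of a multi-DEX APK."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in ("classes2.dex", "AndroidManifest.xml", "classes.dex", "res/a.xml"):
            archive.writestr(name, b"")
    buffer.seek(0)
    return buffer


print(list_dex_files(make_demo_apk()))
```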
Let us return to the reverser who studies a suspicious application statically: they use tools to disassemble the application~@mauthe_large-scale_2021 and track the flows of data in the bytecode.
For example, for spyware potentially leaking personal information, the reverser can unpack the application with Apktool and, after manually locating a method that they suspect of reading sensitive data (by reading the unpacked bytecode), they can use FlowDroid~@Arzt2014a to compute whether there is a flow from this method to methods performing HTTP requests.
During these steps, the reverser faces the problem of statically resolving which classes are loaded from the APK file and which from the #Asdkc.
If they, or the tools they use, choose the wrong version of a class, they may draw wrong conclusions about the code.
Thus, an attacker could exploit the possibility of shadowing classes in order to obfuscate the code.
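The risk described above can be illustrated with a toy model. The sketch below rests on two simplifying assumptions made only for the sake of the example: the runtime searches platform classes before the #APK, and inside the #APK it keeps the first definition found in the order of the bytecode files, whereas a naive static analysis tool only reads the #APK and keeps the last definition it encounters. All function names and inputs are hypothetical, and the model does not claim to describe the exact behavior of #ART or of any existing tool.

```python
def runtime_resolve(class_name, platform_classes, dex_files):
    """Return the source providing `class_name` in this toy runtime model.

    Assumption: platform classes take precedence, then DEX files in order.
    """
    if class_name in platform_classes:
        return "platform"
    for dex_name, classes in dex_files:
        if class_name in classes:
            return dex_name
    return None


def naive_tool_resolve(class_name, dex_files):
    """Model a tool that ignores the platform and keeps the last definition read."""
    found = None
    for dex_name, classes in dex_files:
        if class_name in classes:
            found = dex_name
    return found


def find_shadow_candidates(platform_classes, dex_files):
    """List the APK classes on which the naive tool and the runtime model disagree."""
    names = set()
    for _, classes in dex_files:
        names |= classes
    return sorted(
        name
        for name in names
        if naive_tool_resolve(name, dex_files)
        != runtime_resolve(name, platform_classes, dex_files)
    )


# Hypothetical inputs: the contents of each source are made up for the example.
platform = {"android.app.Activity", "android.util.Log"}
apk = [
    ("classes.dex", {"com.example.Main", "com.example.Net"}),
    ("classes2.dex", {"com.example.Net", "android.util.Log"}),
]

print(find_shadow_candidates(platform, apk))
```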
In this paper, we study how Android handles the loading of classes when multiple versions of the same class exist.
Such collisions can exist inside the APK file or between the APK file and #Asdkc.
We intend to understand whether a reverser would be impacted during a static analysis when dealing with such obfuscated code.
Because this problem is already complex enough with the operations currently performed by Android, we exclude the case where a developer reimplements a specific class loader or replaces a class loader with another one, as is often the case, for example, in packed applications~@Duan2018.
We present a new technique that "shadows" a class, #ie embeds a class in the APK file and "presents" it to the reverser instead of the legitimate version.
The goal of such an attack is to confuse the reverser during the reversing process: at runtime, the real class is loaded from another location of the APK file or from the #Asdk, instead of the shadow version.
This attack can be applied to regular classes of the #Asdk or to hidden classes of Android~@he_systematic_2023 @li_accessing_2016.
We show how these attacks can confuse the tools of the reverser when they perform a static analysis.
In order to evaluate whether such attacks are already used in the wild, we analyzed #nbapk applications from 2023 that we randomly selected from AndroZoo~@allixAndroZooCollectingMillions2016.
Our main result is that #shadowsdk of these applications contain shadow collisions against the #SDK and #shadowhidden against hidden classes.
Our investigation concludes that most of these collisions are not deliberate attacks, but we highlight one specific malware sample that performs strong obfuscation, which was revealed by our detection of a shadow attack.

The paper is structured as follows.
@sec:cl-soa reviews the state of the art on the loading of Android classes and on the tools used to reverse engineer applications.
Then, @sec:cl-loading investigates the internal mechanisms of class loading and presents how a reverser can be confused by these mechanisms.
In @sec:cl-obfuscation, we design obfuscation techniques and show their effect on static analysis tools.
Next, @sec:cl-wild evaluates whether these obfuscation techniques are used in the wild by searching for them inside #nbapk APKs.
@sec:cl-ttv discusses the limits of this work and @sec:cl-conclusion concludes the paper.

// In addition to the public #Asdk of `android.jar`, other internal classes are also available for the Android Runtime.
// Those classes are called hidden #Asdkc@li_accessing_2016, and are not supposed to be used by applications.
// In reality, their use is tolerated and many applications use them to access some of Android's features.
// This tolerance is one of the key points that lead to the confusion attacks that we describe later in the paper.