wip classloader paper
parent 6d9096e314
commit c5e119e877
13 changed files with 3138 additions and 8 deletions

4_class_loader/0_intro.typ
@ -0,0 +1,67 @
#import "../lib.typ": etal, ie
|
||||
#import "X_var.typ": *
|
||||
|
||||
== Introduction
|
||||
|
||||
/*
|
||||
When an application is built with Android Studio, its source code is compiled to Java bytecode, which is then converted to Dalvik bytecode.
The Dalvik bytecode is then placed in a zip archive with other resources, such as the application manifest, and the archive is signed.
Android Studio handles this whole process behind the scenes.
At runtime, the Dalvik bytecode is either interpreted or compiled to native code by the Android Runtime (ART), which is also responsible for loading the classes.
Both behaviors can coexist within a single application, and Android chooses which parts of an application are compiled to native code.
|
||||
*/
|
||||
|
||||
Android applications are distributed through application markets.
Market maintainers have the difficult task of discovering suspicious applications and removing those that are actually malicious.
Part of this task is automated, but a manual investigation is sometimes required.
A reverser is then in charge of studying the application, usually through both a static analysis and a dynamic analysis.
In the first phase, the reverser uses static analysis tools to access and review the code of the application.
If this first phase is not carried out accurately, for example if the reverser fails to access a critical class, they may conclude that a malicious application is safe.
Additionally, as stated by Li #etal@Li2017 in their conclusions, this task is complicated by dynamic code loading, reflective calls, native code, and multi-threading, which cannot easily be handled statically.
Nevertheless, even when these aspects are set aside, statically determining how the regular class loading system of Android behaves is a difficult task.
|
||||
|
||||
Class loading occurs at runtime and is handled by the components of the Android Runtime (ART), even when the application is partially or fully compiled ahead of time.
At the development stage, however, Android Studio resolves the different classes, some of which are internal to the application.
When building, the code is linked against the standard library, #ie the code contained in `android.jar`.
In this paper, we call these classes "Development SDK classes" (Dev SDK classes).
`android.jar` is not added to the application because its classes will be available at runtime in other `.jar` files.
To distinguish those classes found at runtime from Dev SDK classes, we call them #Asdkc.
When releasing the application, the build process of Android Studio can manage different versions of the #Asdk, reported in the Manifest as the "SDK versions".
Indeed, some parts of the core #Asdkc can be embedded in the application for backward compatibility: by comparing the specified minimum SDK version and the target SDK version, the build process stores the code of extra #Asdkc in the APK file.
As a consequence, applications frequently contain classes that come from the `com.android` packages.
At runtime, each smartphone runs a single version of Android, but, as the application is deployed on multiple versions of Android, it is difficult to predict which classes will be loaded from the #Asdkc and which from the APK file itself.
This complexity increases with the multi-DEX format of recent APK files, which can contain several bytecode files.
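
As an illustration of the multi-DEX layout, the sketch below lists the bytecode files found at the root of an APK, which is a zip archive. It assumes the conventional naming scheme (`classes.dex`, `classes2.dex`, and so on). The helper names `list_dex_files` and `build_demo_apk` are hypothetical, and the demo archive is built in memory only to make the example self-contained.

```python
import io
import re
import zipfile

# Assumption: bytecode files sit at the archive root and follow the
# conventional names classes.dex, classes2.dex, classes3.dex, ...
DEX_NAME = re.compile(r"^classes(\d*)\.dex$")


def list_dex_files(apk):
    """Return the root-level DEX entries of an APK, sorted by numeric suffix."""
    found = []
    with zipfile.ZipFile(apk) as archive:
        for name in archive.namelist():
            match = DEX_NAME.match(name)
            if match:
                # classes.dex has no suffix; treat it as index 1.
                found.append((int(match.group(1) or 1), name))
    return [name for _, name in sorted(found)]


def build_demo_apk():
    """Build an in-memory archive that mimics the layout of a multi-DEX APK."""
    buffer = io.BytesIO()
    entries = (
        "classes2.dex",
        "AndroidManifest.xml",
        "classes.dex",
        "classes10.dex",
        "lib/classes.dex",  # not at the root, so it is ignored
    )
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in entries:
            archive.writestr(name, b"")
    buffer.seek(0)
    return buffer


print(list_dex_files(build_demo_apk()))
```

Sorting on the numeric suffix rather than on the raw name keeps `classes10.dex` after `classes2.dex`. Whether Android itself considers the files in this order is a separate question that this sketch does not answer.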
|
||||
|
||||
Coming back to the reverser who studies a suspicious application statically: they use tools to disassemble the application@mauthe_large-scale_2021 and to track data flows in the bytecode.
For example, when facing spyware that potentially leaks personal information, the reverser can unpack the application with Apktool and read the unpacked bytecode to manually locate a method suspected of reading sensitive data.
They can then use FlowDroid@Arzt2014a to compute whether there is a flow from this method to methods performing HTTP requests.
During these steps, the reverser must statically resolve whether each class is loaded from the APK file or from the #Asdkc.
If they, or the tools they use, choose the wrong version of a class, they may draw wrong conclusions about the code.
Thus, an attacker could exploit the possibility of shadowing classes in order to obfuscate the code.
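
To make this resolution problem concrete, the sketch below models a class lookup under two assumptions of ours that the text above does not establish. First, the class loader delegates to the platform before looking at the APK, as Java-style class loaders do by default. Second, the DEX files of the APK are then searched in order. The names `resolve`, `DEMO_PLATFORM` and `DEMO_DEX_FILES` are hypothetical, and the actual behavior of ART may differ from this model.

```python
def resolve(class_name, platform_classes, dex_files):
    """Return where class_name would be loaded from in this simplified model.

    platform_classes: set of class names provided by the platform.
    dex_files: ordered list of (dex_name, set_of_class_names) pairs.
    """
    # Assumption 1: platform classes take priority over APK classes.
    if class_name in platform_classes:
        return "platform"
    # Assumption 2: the first DEX file defining the class wins.
    for dex_name, class_names in dex_files:
        if class_name in class_names:
            return dex_name
    return None  # class not found anywhere


# Hypothetical inputs, for illustration only.
DEMO_PLATFORM = {"android.app.Activity", "java.lang.String"}
DEMO_DEX_FILES = [
    ("classes.dex", {"com.example.Main", "android.app.Activity"}),
    ("classes2.dex", {"com.example.Main", "com.example.Util"}),
]

for name in ("android.app.Activity", "com.example.Main", "com.example.Util"):
    print(name, "->", resolve(name, DEMO_PLATFORM, DEMO_DEX_FILES))
```

In this model, the copy of `android.app.Activity` embedded in `classes.dex` is never loaded, and neither is the copy of `com.example.Main` in `classes2.dex`. A tool that picks either of those copies would analyze code that does not run.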
|
||||
|
||||
In this paper, we study how Android handles the loading of classes when multiple versions of the same class exist.
Such collisions can exist inside the APK file or between the APK file and #Asdkc.
We aim to understand whether a reverser would be impacted during a static analysis when dealing with such obfuscated code.
Because this problem is already complex enough with the default operations performed by Android, we exclude the case where a developer implements a custom class loader or replaces a class loader with another one, as is often the case, for example, in packed applications@Duan2018.
We present a new technique that "shadows" a class, #ie embeds a class in the APK file and "presents" it to the reverser instead of the legitimate version.
The goal of such an attack is to confuse the reverser during the reversing process: at runtime, the real class is loaded from another location in the APK file or from the #Asdk, instead of the shadow version.
This attack can be applied to regular classes of the #Asdk or to hidden classes of Android@he_systematic_2023 @li_accessing_2016.
We show how these attacks can confuse the tools of the reverser when they perform a static analysis.
To evaluate whether such attacks are already used in the wild, we analyzed #nbapk applications from 2023 that we randomly sampled from AndroZoo@allixAndroZooCollectingMillions2016.
Our main result is that #shadowsdk of these applications contain shadow collisions against the SDK and #shadowhidden against hidden classes.
Our investigation concludes that most of these collisions are not deliberate attacks, but we highlight one specific malware sample whose strong obfuscation was revealed by our detection of a shadow attack.
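
The kinds of collisions discussed above can be illustrated with a minimal sketch. It assumes that the class names defined by each DEX file, by the public SDK, and by the hidden platform classes have already been extracted into plain sets. The function `find_collisions` and all demo inputs are hypothetical. This is not the detection tool used in this work.

```python
from collections import defaultdict


def find_collisions(dex_files, sdk_classes, hidden_classes):
    """Report classes defined several times in the APK or also by the platform.

    dex_files maps a DEX file name to the set of class names it defines.
    """
    definitions = defaultdict(list)
    for dex_name, class_names in dex_files.items():
        for class_name in class_names:
            definitions[class_name].append(dex_name)
    return {
        # Same class defined in more than one DEX file of the APK.
        "internal": sorted(c for c, d in definitions.items() if len(d) > 1),
        # Class of the APK that also exists in the public SDK.
        "sdk": sorted(c for c in definitions if c in sdk_classes),
        # Class of the APK that also exists among hidden platform classes.
        "hidden": sorted(c for c in definitions if c in hidden_classes),
    }


# Hypothetical inputs, for illustration only; the class names are made up.
DEMO_DEX_FILES = {
    "classes.dex": {"com.example.Main", "android.util.Log"},
    "classes2.dex": {"com.example.Main", "com.android.internal.FakeHidden"},
}
DEMO_SDK = {"android.util.Log", "android.app.Activity"}
DEMO_HIDDEN = {"com.android.internal.FakeHidden"}

report = find_collisions(DEMO_DEX_FILES, DEMO_SDK, DEMO_HIDDEN)
print(report)
```

A collision reported this way only shows that two definitions share a name. Deciding whether it is a deliberate attack still requires comparing the colliding definitions.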
|
||||
|
||||
The paper is structured as follows.
@sec:cl-soa reviews the state of the art on Android class loading and on the tools used to reverse engineer applications.
Then, @sec:cl-loading investigates the internal mechanisms of class loading and presents how a reverser can be confused by these mechanisms.
In @sec:cl-obfuscation, we design obfuscation techniques and show their effect on static analysis tools.
@sec:cl-wild then evaluates whether these obfuscation techniques are used in the wild by searching for them in #nbapk APKs.
Finally, @sec:cl-ttv discusses the limits of this work and @sec:cl-conclusion concludes the paper.
|
||||
|
||||
// In addition to the public #Asdk of `android.jar`, other internal classes are also available to the Android Runtime.
// Those classes are called hidden #Asdkc@li_accessing_2016 and are not supposed to be used by applications.
// In practice, their use is tolerated and many applications use them to access some Android features.
// This tolerance is one of the key points that lead to the confusion attacks that we describe later in the paper.
|
||||
|
||||
== TODO <sec:cl-wild>
|
||||
== TODO <sec:cl-ttv>
|
||||
== TODO <sec:cl-conclusion>