wip classloader paper
parent
6d9096e314
commit
c5e119e877
13 changed files with 3138 additions and 8 deletions
205
4_class_loader/2_classloading.typ
Normal file
|
@ -0,0 +1,205 @@
|
|||
#import "../lib.typ": todo, ie, etal, num
|
||||
#import "X_var.typ": *
|
||||
|
||||
== Analyzing the class loading process <sec:cl-loading>
|
||||
|
||||
To build obfuscation techniques that rely on confusing analysis tools through class loaders, we manually studied the Android code that handles class loading.
|
||||
In this section, we report the inner workings of ART and focus on the particularities of class loading that can cause confusion.
|
||||
Because the class loading implementation has evolved over time during the multiple iterations of the Android operating system, we mainly describe the behavior of ART from Android version 14 (SDK 34).
|
||||
|
||||
=== Class loaders
|
||||
|
||||
When ART needs to access a class, it queries a `ClassLoader` to retrieve its implementation.
|
||||
Each class has a reference to the `ClassLoader` that loaded it, and this class loader is the one that will be used to load supplementary classes used by the original class.
|
||||
For example, in @lst:cl-expl-cl-loading, when `A.f()` is called, ART loads `B` with the class loader that was used to load `A`.
|
||||
|
||||
#figure(
|
||||
```java
class A {
    public static void f() {
        B b = new B();
        b.do_something();
    }
}
```,
|
||||
caption: [Class instantiation],
|
||||
) <lst:cl-expl-cl-loading>
|
||||
|
||||
This behavior has been inherited from Java and most of the core classes regarding class loaders have been kept in Android.
|
||||
Nevertheless, the Android implementation has slight differences and new class loaders have been added.
|
||||
For example, the Java class loader `URLClassLoader` is still present in Android, but contrary to the official documentation, most of its methods have been removed or replaced by a stub that just raises an exception.
|
||||
Moreover, rather than using the Java class loaders `SecureClassLoader` or `URLClassLoader`, Android has several new class loaders that inherit from `ClassLoader` and override the appropriate methods.
|
||||
|
||||
The left part of @fig:cl-class_loading_classes shows the different class loaders specific to Android in white and the stubs of the original Java class loaders in grey.
|
||||
The main difference between the original Java class loaders and the ones used by Android is that the latter do not support the Java bytecode format.
|
||||
Instead, the Android-specific class loaders load their classes from (many) different file formats specific to Android.
|
||||
Usually, when used by a programmer, the classes are loaded from memory or from a file using the DEX format (`.dex`).
|
||||
When used directly by ART, the classes are usually stored in an application file (`.apk`) or in an optimized format (`OAT/ODEX`).
|
||||
|
||||
#todo[Alt text for cl-class_loading_classes]
|
||||
#figure([
|
||||
#image(
|
||||
"figs/classloaders-crop.svg",
|
||||
width: 80%,
|
||||
alt: ""
|
||||
)
|
||||
gray -- Java-based, white -- Android-based
|
||||
],
|
||||
caption: [The class loading hierarchy of Android]
|
||||
) <fig:cl-class_loading_classes>
|
||||
|
||||
=== Delegation <sec:cl-delegation>
|
||||
|
||||
The order in which classes are loaded at runtime requires special attention.
|
||||
All the Android-specific class loaders (`DexClassLoader`, `InMemoryClassLoader`, etc.) share the same behavior, with the exception of `DelegateLastClassLoader`; they differ only in the input format they handle.
|
||||
Each class loader has a delegate class loader, represented in the right part of @fig:cl-class_loading_classes by black plain arrows for an instance of `PathClassLoader` and an instance of `DelegateLastClassLoader` (the other class loaders also have this delegate).
|
||||
This delegate is a concept specific to class loaders and has nothing to do with class inheritance.
|
||||
By default, class loaders will delegate to the singleton class `BootClassLoader`, except if a specific class loader is provided when instantiating the new class loader.
|
||||
When a class loader other than `DelegateLastClassLoader` needs to load a class, it first asks its delegate (by default `BootClassLoader`), and only if the delegate does not find the class does it try to load the class on its own.
|
||||
This behavior implements a priority and prevents a core class of the system from being redefined by mistake: for example, a redefinition of `java.lang.String` in a child class loader is never loaded, because its delegates are queried first.
|
||||
`DelegateLastClassLoader` behaves slightly differently: it first delegates to `BootClassLoader`, then checks its own files, and finally delegates to its actual delegate (given when instantiating the `DelegateLastClassLoader`).
|
||||
This behavior is useful for overriding specific classes of a class loader while keeping the other classes.
|
||||
A normal class loader would prioritize the classes of its delegate over its own.
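The two lookup orders described above can be summarized with a small Python model. This is only an illustrative sketch and not ART code: the `Loader` class, its `classes` dictionary and its `boot` parameter are hypothetical names introduced here, and a loader is treated as delegate-last when a `boot` loader is given.

```python
from typing import Dict, Optional


class Loader:
    """Toy model of a class loader (hypothetical, not the ART implementation)."""

    def __init__(self, classes: Dict[str, str],
                 delegate: Optional["Loader"] = None,
                 boot: Optional["Loader"] = None):
        self.classes = classes    # class name -> label of the implementation
        self.delegate = delegate  # None models the end of the delegation chain
        self.boot = boot          # set only to model DelegateLastClassLoader

    def load(self, name: str) -> Optional[str]:
        if self.boot is not None:
            # Delegate-last order: boot loader, own files, then the delegate.
            found = self.boot.load(name)
            if found is None:
                found = self.classes.get(name)
            if found is None and self.delegate is not None:
                found = self.delegate.load(name)
            return found
        # Default order: the delegate first, own files only as a fallback.
        if self.delegate is not None:
            found = self.delegate.load(name)
            if found is not None:
                return found
        return self.classes.get(name)


boot = Loader({"java.lang.String": "platform"})
parent = Loader({"com.example.Util": "parent",
                 "java.lang.String": "fake"}, delegate=boot)
normal = Loader({"com.example.Util": "child"}, delegate=parent)
last = Loader({"com.example.Util": "child"}, delegate=parent, boot=boot)
```

In this model, `normal.load("com.example.Util")` resolves to the implementation of the delegate, whereas `last.load("com.example.Util")` resolves to the loader's own implementation; in both cases `java.lang.String` comes from the boot loader.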
|
||||
|
||||
#figure(
|
||||
```python
def get_multi_dex_classes_dex_name(index: int):
    # Mirrors DexFileLoader::GetMultiDexClassesDexName
    if index == 0:
        return "classes.dex"
    else:
        return f"classes{index+1}.dex"

def load_class(class_name: str):
    # Platform classes always have priority (delegation to BootClassLoader)
    if is_platform_class(class_name):
        return load_from_boot_class_loader(class_name)
    else:
        index = 0
        dex_file = get_multi_dex_classes_dex_name(index)
        # Scan classes.dex, classes2.dex, ... and stop at the first match
        while file_exists_in_apk(dex_file) and \
                not class_found_in_dex_file(class_name, dex_file):
            index += 1
            dex_file = get_multi_dex_classes_dex_name(index)
        if file_exists_in_apk(dex_file):
            return load_from_file(dex_file, class_name)
        else:
            raise ClassNotFoundError()
```,
|
||||
caption: [Default Class Loading Algorithm for Android Applications],
|
||||
) <lst:cl-loading-alg>
|
||||
|
||||
At runtime, Android instantiates for each application three instances of class loaders described previously: `bootClassLoader`, the unique instance of `BootClassLoader`, and two instances of `PathClassLoader`: `systemClassLoader` and `appClassLoader`.
|
||||
`bootClassLoader` is responsible for loading Android *#platc*.
|
||||
It is the direct delegate of the two other class loaders instantiated by Android.
|
||||
`appClassLoader` points to the application `.apk` file, and is used to load the classes inside the application.
|
||||
`systemClassLoader` is a `PathClassLoader` pointing to `'.'`, the working directory of the application, which is `'/'` by default.
|
||||
The documentation of `ClassLoader.getSystemClassLoader` reports that this class loader is the default context class loader for the main application thread.
|
||||
In reality, the #platc are loaded by `bootClassLoader` and the classes from the application are loaded from `appClassLoader`.
|
||||
`systemClassLoader` is never used.
|
||||
|
||||
In addition to the class loaders instantiated by ART when starting an application, the developer of an application can use class loaders explicitly, either by calling the ones from the #Asdk or by writing custom class loaders that inherit from the `ClassLoader` class.
|
||||
At this point, modeling accurately the complete class loading algorithm becomes impossible: the developer can program any algorithm of their choice.
|
||||
For this reason, this case is excluded from this paper and we focus on the default behavior where the context class loader is the one pointing to the `.apk` file and where its delegate is `BootClassLoader`.
|
||||
Under this hypothesis, the delegation process can be modeled by the pseudo-code of the method `load_class` given in @lst:cl-loading-alg.
|
||||
|
||||
In addition, it is important to distinguish the two types of #platc handled by `BootClassLoader`, both of which have priority over classes from the application at runtime:
|
||||
|
||||
- the ones available in the *#Asdk* (normally visible in the documentation);
|
||||
- the ones that are internal and that should not be used by the developer. We call them *#hidec*@he_systematic_2023 @li_accessing_2016 (not documented).
|
||||
|
||||
As a preliminary conclusion, we observe that a priority exists in the class loading mechanism and that an attacker could use it to prioritize an implementation over another one.
|
||||
This could mislead the reverser if they use the one that has the lowest priority.
|
||||
To determine if a class is impacted by the priority given to `BootClassLoader`, we need to obtain the list of classes that are part of Android #ie the #platc.
|
||||
We discuss in the next section how to obtain these classes from the emulator.
|
||||
|
||||
=== Determining #platc
|
||||
|
||||
#figure(
|
||||
image(
|
||||
"figs/architecture_SDK-crop.svg",
|
||||
width: 80%,
|
||||
alt: ""
|
||||
),
|
||||
caption: [Location of SDK classes during development and at runtime]
|
||||
) <fig:cl-archisdk>
|
||||
|
||||
@fig:cl-archisdk shows how classes of Android are used in the development environment and at runtime.
|
||||
In the development environment, Android Studio uses `android.jar` and the specific classes written by the developer.
|
||||
After compilation, only the classes of the developer, and possibly extra classes generated by Android Studio, are zipped in the APK file, using the multi-dex format.
|
||||
At runtime, the application uses `BootClassLoader` to load the #platc from Android.
|
||||
Previous works@he_systematic_2023 @li_accessing_2016 considered both #Asdk and #hidec to be in the file `/system/framework/framework.jar` found on the phone itself, but we found that the classes loaded by `bootClassLoader` are not all present in `framework.jar`.
|
||||
For example, He #etal @he_systematic_2023 counted 495 thousand APIs (fields and methods) in Android 12, based on the Google documentation on restrictions on non-SDK interfaces#footnote[https://developer.android.com/guide/app-compatibility/restrictions-non-sdk-interfaces].
|
||||
However, when looking at the content of `framework.jar`, we only found #num(333) thousand APIs.
|
||||
Indeed, classes such as `com.android.okhttp.OkHttpClient` are loaded by `bootClassLoader`, listed by Google, but not in `framework.jar`.
|
||||
|
||||
For optimization purposes, classes are now loaded from `boot.art`.
|
||||
This file is used to speed up the start-up time of applications: it stores a dump of the C++ objects representing the *#platc* (#Asdk and #hidec) so that they do not need to be generated each time an application starts.
|
||||
Unfortunately, this format is undocumented and not backward compatible between Android versions, and is thus difficult to parse.
|
||||
An easier solution to investigate the #platc is to look at the `BOOTCLASSPATH` environment variable in an emulator.
|
||||
This variable is used to load the classes without the `boot.art` optimization.
|
||||
We found 25 `.jar` files, including `framework.jar`, in the `BOOTCLASSPATH` of the standard emulator for Android 12 (SDK 32), 31 for Android 13 (SDK 33), and 35 for Android 14 (SDK 34), containing respectively a total of #num(499837), #num(539236) and #num(605098) API methods and fields.
|
||||
@tab:cl-platform_apis summarizes the discrepancies we found between Google's list and the #platc we found in Android emulators.
|
||||
Note that some methods may also be found _only_ in the documentation.
|
||||
Our manual investigations suggest that the documentation is not well synchronized with the evolution of the #platc and that Google has almost solved this issue in API 34.
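As a minimal sketch of the first step of this investigation, the following Python snippet splits a `BOOTCLASSPATH` value into its ordered list of `.jar` entries. We assume here that the value has been copied from an emulator shell; the function name and the two-entry sample value are illustrative only, and a real emulator lists many more entries.

```python
import posixpath
from typing import List


def boot_class_path_jars(bootclasspath: str) -> List[str]:
    """Split a colon-separated BOOTCLASSPATH value, keeping the original order."""
    return [entry for entry in bootclasspath.split(":") if entry]


# Illustrative value only (assumption): two entries instead of several dozen.
SAMPLE = "/apex/com.android.art/javalib/core-oj.jar:/system/framework/framework.jar"

jars = boot_class_path_jars(SAMPLE)
names = [posixpath.basename(jar) for jar in jars]
```

Each entry can then be pulled from the emulator and inspected to enumerate the classes, methods and fields it defines.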
|
||||
|
||||
|
||||
#figure({
|
||||
show table: set text(size: 0.80em)
|
||||
table(
|
||||
columns: 5,
|
||||
//inset: (x: 0% + 5pt, y: 0% + 2pt),
|
||||
stroke: none,
|
||||
align: center+horizon,
|
||||
table.hline(),
|
||||
table.header(
|
||||
table.cell(colspan: 5, inset: 3pt)[],
|
||||
table.cell(rowspan: 2)[*SDK version*],
|
||||
table.vline(end: 3),
|
||||
table.vline(start: 4),
|
||||
table.cell(colspan: 4)[*Number of API methods*],
|
||||
[Documented], [In emulator], [Only documented], [Only in emulator],
|
||||
),
|
||||
table.cell(colspan: 5, inset: 3pt)[],
|
||||
table.hline(),
|
||||
table.cell(colspan: 5, inset: 3pt)[],
|
||||
|
||||
[32], num(495713), num(499837), num(1060), num(5184),
|
||||
[33], num(537427), num(539236), num(1258), num(3067),
|
||||
[34], num(605106), num(605098), num(26), num(18),
|
||||
|
||||
table.cell(colspan: 4, inset: 3pt)[],
|
||||
table.hline(),
|
||||
)},
|
||||
|
||||
  caption: [Comparison of API methods between documentation and emulators],
|
||||
)<tab:cl-platform_apis>
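The four columns of @tab:cl-platform_apis are linked by a simple identity: the number of APIs present in both sources equals the documented count minus the documented-only count, and also the emulator count minus the emulator-only count. The Python sketch below checks this identity on the reported numbers and shows how the two "only" columns can be derived with set differences. The function names and the API signatures in the small example are hypothetical and serve only as illustration.

```python
from typing import Dict, Set, Tuple

# SDK version -> (documented, in emulator, only documented, only in emulator)
ROWS: Dict[int, Tuple[int, int, int, int]] = {
    32: (495713, 499837, 1060, 5184),
    33: (537427, 539236, 1258, 3067),
    34: (605106, 605098, 26, 18),
}


def common_count(documented: int, in_emulator: int,
                 only_doc: int, only_emu: int) -> int:
    """Return the number of APIs present in both sources, checking consistency."""
    from_doc = documented - only_doc
    from_emu = in_emulator - only_emu
    if from_doc != from_emu:
        raise ValueError("inconsistent row")
    return from_doc


COMMON = {sdk: common_count(*row) for sdk, row in ROWS.items()}


def compare(documented: Set[str], in_emulator: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (only documented, only in emulator) for two sets of API signatures."""
    return documented - in_emulator, in_emulator - documented


# Hypothetical signatures, for illustration only.
only_doc, only_emu = compare({"LA;->f()V", "LA;->g()V"}, {"LA;->g()V", "LB;->h()V"})
```

All three rows satisfy the identity, so the reported counts are mutually consistent.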
|
||||
|
||||
We conclude that it can be dangerous to trust the documentation and that the emulator or phone is the only reliable source of information.
|
||||
Gathering the precise list of classes and the associated bytecode is not a trivial task.
|
||||
|
||||
=== Multiple DEX files <sec:cl-collision>
|
||||
|
||||
For the application class files, Android uses its specific format called DEX: all the classes of an application are loaded from the file `classes.dex`.
|
||||
With the increasing complexity of Android applications, the need arose to load more methods than the DEX format could support in one #dexfile.
|
||||
To solve this problem, Android started storing classes in multiple files named `classesX.dex`, as illustrated by @lst:cl-dexname, the method that generates the filenames read by class loaders.
|
||||
Android starts loading the file `GetMultiDexClassesDexName(0)` (`classes.dex`), then `GetMultiDexClassesDexName(1)` (`classes2.dex`), and continues until finding a value `n` for which `GetMultiDexClassesDexName(n)` does not exist.
|
||||
Even if Android emits a warning message when it finds more than 100 #dexfiles, it will still load any number of #dexfiles that way.
|
||||
This change had the unintended consequence of permitting two classes with the same name but different implementations to be stored in the same `.apk` file using two #dexfiles.
|
||||
|
||||
Android explicitly performs checks that prevent several classes from using the same name inside a #dexfile.
|
||||
However, this check does not apply to multiple #dexfiles in the same `.apk` file, and a `.dex` can contain a class with a name already used by another class in another #dexfile of the application.
|
||||
Of course, such a situation should not happen when the multiple #dexfiles have been properly generated by Android Studio.
|
||||
Nevertheless, for an attacker controlling the process, this issue raises the question of which class is selected when several classes sharing the same name are present in the same `.apk` file.
|
||||
|
||||
We found that Android loads the class whose implementation is found first when looking through the multiple #dexfiles in the order generated by the method `GetMultiDexClassesDexName`.
|
||||
We will show later in @sec:cl-evaltools that this choice is not the most intuitive one and can fool analysis tools when reversing an application.
|
||||
As a conclusion, we model both the multi-dex and delegation behaviors in the pseudo-code of @lst:cl-loading-alg.
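The first-match rule can be illustrated with a short Python sketch that, given the class names defined in each `.dex` file of an application, reports which file provides the implementation that is loaded and which definitions are shadowed. The input dictionary, the class names and the function names are hypothetical; extracting class names from real `.dex` files and the priority of platform classes are outside the scope of this sketch.

```python
from typing import Dict, List, Set, Tuple


def multi_dex_name(index: int) -> str:
    """Python equivalent of the filename scheme: classes.dex, classes2.dex, ..."""
    return "classes.dex" if index == 0 else f"classes{index + 1}.dex"


def resolve(dex_classes: Dict[str, Set[str]]) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Map each class to the dex file that wins, and list the shadowed copies."""
    loaded: Dict[str, str] = {}
    shadowed: Dict[str, List[str]] = {}
    index = 0
    # Stop at the first missing filename, as the loading loop does.
    while multi_dex_name(index) in dex_classes:
        dex_name = multi_dex_name(index)
        for cls in sorted(dex_classes[dex_name]):
            if cls in loaded:
                shadowed.setdefault(cls, []).append(dex_name)
            else:
                loaded[cls] = dex_name
        index += 1
    return loaded, shadowed


# Hypothetical application content, for illustration only.
APK = {
    "classes.dex": {"Lcom/example/Main;", "Lcom/example/Util;"},
    "classes2.dex": {"Lcom/example/Util;"},
    "classes4.dex": {"Lcom/example/Hidden;"},  # never reached: classes3.dex is missing
}
loaded, shadowed = resolve(APK)
```

Here the copy of `Lcom/example/Util;` stored in `classes2.dex` is shadowed by the one in `classes.dex`, and `classes4.dex` is never read because the enumeration stops at the missing `classes3.dex`.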
|
||||
|
||||
#figure(
|
||||
```C++
std::string DexFileLoader::GetMultiDexClassesDexName(size_t index) {
  return (index == 0) ?
    "classes.dex" :
    StringPrintf("classes%zu.dex", index + 1);
}
```,
|
||||
caption: [The method generating the .dex filenames from the AOSP]
|
||||
) <lst:cl-dexname>
|
||||
|