#import "../lib.typ": todo, ie, etal, num, DEX, ART, SDK, API, APK, APIs, AOSP
#import "X_var.typ": *

== Analyzing the Class Loading Process <sec:cl-loading>

To build obfuscation techniques that confuse analysis tools by means of class loaders, we manually studied the Android code that handles class loading.
In this section, we describe the inner workings of #ART, focusing on the specificities of class loading that can cause confusion.
Because the class loading implementation has evolved across the successive releases of Android, we mainly describe the behavior of #ART in Android 14 (#SDK 34).

=== Class Loaders

When #ART needs to access a class, it queries a `ClassLoader` to retrieve its implementation.
Each class holds a reference to the `ClassLoader` that loaded it, and this class loader is then used to load the other classes referenced by that class.
For example, in @lst:cl-expl-cl-loading, when `A.f()` is called, #ART loads `B` with the class loader that was used to load `A`.

#figure(
```java
class A {
    public static void f() {
        // B is loaded with the class loader that loaded A
        B b = new B();
        b.doSomething();
    }
}
```,
caption: [Class instantiation],
) <lst:cl-expl-cl-loading>

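The rule above can be illustrated with a small Python model. This is a minimal sketch under simplifying assumptions: the `Loader` and `LoadedClass` classes are hypothetical and do not mirror the #ART data structures; each loader only records which classes it can define and with which implementation.

```python
from typing import Dict


class Loader:
    """Hypothetical class loader: maps class names to implementation labels."""

    def __init__(self, name: str, classes: Dict[str, str]) -> None:
        self.name = name
        self.classes = classes


class LoadedClass:
    """A loaded class remembers the loader that defined it."""

    def __init__(self, name: str, loader: Loader) -> None:
        self.name = name
        self.loader = loader
        self.implementation = loader.classes[name]

    def resolve(self, referenced: str) -> "LoadedClass":
        # A class referenced from this class (e.g. `new B()` inside A.f())
        # is requested from the loader that defined this class.
        if referenced not in self.loader.classes:
            raise LookupError(f"{referenced} not found by {self.loader.name}")
        return LoadedClass(referenced, self.loader)


# Two loaders define a class B with different implementations.
loader_x = Loader("loader_x", {"A": "A from x", "B": "B from x"})
loader_y = Loader("loader_y", {"B": "B from y"})

a = LoadedClass("A", loader_x)
b = a.resolve("B")
print(b.loader.name, "|", b.implementation)  # → loader_x | B from x
```

Even though `loader_y` also defines `B`, it is never consulted, because `A` was defined by `loader_x`. Delegation between class loaders is deliberately left out of this sketch.
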
This behavior is inherited from Java, and most of the core classes related to class loading have been kept in Android.
Nevertheless, the Android implementation differs slightly, and new class loaders have been added.
For example, the Java class loader `URLClassLoader` is still present in Android, but contrary to what the official documentation states, most of its methods have been removed or replaced by stubs that simply throw an exception.
Moreover, rather than relying on the Java class loaders `SecureClassLoader` or `URLClassLoader`, Android provides several new class loaders that inherit from `ClassLoader` and override the appropriate methods.

The left part of @fig:cl-class_loading_classes shows the class loaders specific to Android in white and the stubs of the original Java class loaders in gray.
The main difference between the original Java class loaders and the Android ones is that the latter do not support the Java bytecode format.
Instead, the Android-specific class loaders load their classes from several file formats specific to Android.
When used by a programmer, classes are usually loaded from memory or from a file in the #DEX format (`.dex`).
When used directly by #ART, classes are usually stored in an application file (`.apk`) or in an optimized format (`OAT`/`ODEX`).

#figure([
#image(
"figs/classloaders-crop.svg",
width: 80%,
alt: "
A box diagram. The diagram is split into two parts; the right part is labeled Runtime.
On the left, there are 9 boxes. 3 are gray, labeled ClassLoader, SecureClassLoader, and URLClassLoader, and the other 6 are white: BootClassLoader, BaseDexClassLoader, DexClassLoader, InMemoryDexClassLoader, PathClassLoader, and DelegateLastClassLoader.
Arrows go from SecureClassLoader, BaseDexClassLoader, and BootClassLoader to ClassLoader,
from URLClassLoader to SecureClassLoader,
from DexClassLoader, InMemoryDexClassLoader, and PathClassLoader to BaseDexClassLoader,
and from DelegateLastClassLoader to PathClassLoader.

On the runtime side, there are 5 boxes: bootClassLoader, appClassLoader (multi dex), systemClassLoader, Specific delegator (with two delegates), and X.
Arrows labeled delegate go from appClassLoader, systemClassLoader, and Specific delegator to bootClassLoader, and from Specific delegator to X.
bootClassLoader, appClassLoader, and systemClassLoader are grouped in a dotted box labeled Android default behavior.
Dotted lines labeled instance go across the central demarcation from appClassLoader to PathClassLoader, from systemClassLoader to PathClassLoader, and from Specific delegator to DelegateLastClassLoader.
Another dotted line labeled instance singleton goes from bootClassLoader to BootClassLoader.
"
)
gray -- Java-based, white -- Android-based
],
caption: [The class loading hierarchy of Android]
) <fig:cl-class_loading_classes>

=== Delegation <sec:cl-delegation>

The order in which classes are loaded at runtime requires special attention.
All the Android-specific class loaders (`DexClassLoader`, `InMemoryDexClassLoader`, etc.), except `DelegateLastClassLoader`, share the same behavior and only differ in the input format they handle.
Each class loader has a delegate class loader, represented in the right part of @fig:cl-class_loading_classes by solid black arrows for an instance of `PathClassLoader` and an instance of `DelegateLastClassLoader` (the other class loaders also have such a delegate).
This delegate is a concept specific to class loaders and is unrelated to class inheritance.
By default, class loaders delegate to the singleton `BootClassLoader`, unless another class loader is provided when the new class loader is instantiated.
When a class loader other than `DelegateLastClassLoader` needs to load a class, it first asks its delegate (by default, `BootClassLoader`), and only if the delegate does not find the class does it try to load the class on its own.
This behavior establishes a priority and prevents a core class of the system from being redefined by mistake, for example a `java.lang.String` class that would otherwise be loaded by a child class loader instead of its delegates.
`DelegateLastClassLoader` behaves slightly differently: it first delegates to `BootClassLoader`, then checks its own files, and finally delegates to its actual delegate (given when instantiating the `DelegateLastClassLoader`).
This behavior is useful for overriding specific classes of a class loader while keeping its other classes, whereas a normal class loader prioritizes the classes of its delegate over its own.

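The two lookup orders can be contrasted in a short Python model. It is a sketch, not the Android implementation: `Loader` and `DelegateLastLoader` are hypothetical names, and each loader simply holds a dictionary from class names to implementation labels. `Loader` follows the parent-first order described above, while `DelegateLastLoader` asks the boot loader first, then its own files, then its delegate.

```python
from typing import Dict, Optional


class ClassNotFound(Exception):
    """Raised when no loader in the chain defines the requested class."""


class Loader:
    """Hypothetical parent-first class loader (not the Android API)."""

    def __init__(self, name: str, classes: Dict[str, str],
                 delegate: Optional["Loader"] = None) -> None:
        self.name = name
        self.classes = classes    # class name -> implementation label
        self.delegate = delegate  # None only for the boot loader

    def find_own(self, class_name: str) -> Optional[str]:
        return self.classes.get(class_name)

    def load_class(self, class_name: str) -> str:
        # Ask the delegate chain first, then fall back to our own classes.
        if self.delegate is not None:
            try:
                return self.delegate.load_class(class_name)
            except ClassNotFound:
                pass
        implementation = self.find_own(class_name)
        if implementation is None:
            raise ClassNotFound(class_name)
        return implementation


class DelegateLastLoader(Loader):
    """Boot loader first, then own classes, then the actual delegate."""

    def __init__(self, name: str, classes: Dict[str, str],
                 boot: Loader, delegate: Loader) -> None:
        super().__init__(name, classes, delegate)
        self.boot = boot

    def load_class(self, class_name: str) -> str:
        try:
            return self.boot.load_class(class_name)
        except ClassNotFound:
            pass
        implementation = self.find_own(class_name)
        if implementation is not None:
            return implementation
        return self.delegate.load_class(class_name)


boot = Loader("boot", {"java.lang.String": "platform String"})
base = Loader("base", {"java.lang.String": "fake String",
                       "com.example.Util": "original Util"}, delegate=boot)
normal = Loader("normal", {"com.example.Util": "patched Util"}, delegate=base)
last = DelegateLastLoader("last", {"com.example.Util": "patched Util"},
                          boot=boot, delegate=base)

print(base.load_class("java.lang.String"))    # → platform String
print(normal.load_class("com.example.Util"))  # → original Util
print(last.load_class("com.example.Util"))    # → patched Util
```

The first call shows why a redefined `java.lang.String` shipped by a child loader is ignored; the last two show how a delegate-last loader can override a single class of its delegate, while a parent-first loader cannot.
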
#figure(
```python
def get_multi_dex_classes_dex_name(index: int) -> str:
    # Same naming scheme as DexFileLoader::GetMultiDexClassesDexName
    if index == 0:
        return "classes.dex"
    return f"classes{index + 1}.dex"


def load_class(class_name: str):
    # is_platform_class, file_exists_in_apk, class_found_in_dex_file,
    # load_from_* and ClassNotFoundError are primitives of the model
    # Delegation: platform classes are always served by BootClassLoader
    if is_platform_class(class_name):
        return load_from_boot_class_loader(class_name)
    # Multi-dex: scan classes.dex, classes2.dex, ... and stop at the
    # first missing file; the first .dex defining the class wins
    index = 0
    dex_file = get_multi_dex_classes_dex_name(index)
    while file_exists_in_apk(dex_file):
        if class_found_in_dex_file(class_name, dex_file):
            return load_from_file(dex_file, class_name)
        index += 1
        dex_file = get_multi_dex_classes_dex_name(index)
    raise ClassNotFoundError(class_name)
```,
caption: [Default Class Loading Algorithm for Android Applications],
) <lst:cl-loading-alg>

At runtime, Android instantiates, for each application, three instances of the class loaders described previously: `bootClassLoader`, the unique instance of `BootClassLoader`, and two instances of `PathClassLoader`, `systemClassLoader` and `appClassLoader`.
`bootClassLoader` is responsible for loading the Android *#platc*.
It is the direct delegate of the two other class loaders instantiated by Android.
`appClassLoader` points to the application `.apk` file and is used to load the classes inside the application.
`systemClassLoader` is a `PathClassLoader` pointing to `'.'`, the working directory of the application, which is `'/'` by default.
The documentation of `ClassLoader.getSystemClassLoader` states that this class loader is the default context class loader for the main application thread.
In reality, the #platc are loaded by `bootClassLoader` and the classes of the application are loaded by `appClassLoader`.
`systemClassLoader` is never used.

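The default setup can be sketched as follows, assuming the parent-first behavior described above. The `Loader` class and the class sets are hypothetical and only meant to show which of the three runtime class loaders ends up defining a given class.

```python
from typing import Optional, Set


class Loader:
    """Hypothetical, simplified parent-first class loader."""

    def __init__(self, name: str, classes: Set[str],
                 delegate: Optional["Loader"] = None) -> None:
        self.name = name
        self.classes = classes
        self.delegate = delegate

    def defining_loader(self, class_name: str) -> Optional[str]:
        # The delegate chain is searched before this loader's own classes.
        if self.delegate is not None:
            found = self.delegate.defining_loader(class_name)
            if found is not None:
                return found
        return self.name if class_name in self.classes else None


# Default runtime setup; the class sets are illustrative only.
boot = Loader("bootClassLoader", {"java.lang.String", "android.app.Activity"})
app = Loader("appClassLoader", {"com.example.MainActivity"}, delegate=boot)
# systemClassLoader points to '/', which holds no application class in this model.
system = Loader("systemClassLoader", set(), delegate=boot)

print(app.defining_loader("android.app.Activity"))         # → bootClassLoader
print(app.defining_loader("com.example.MainActivity"))     # → appClassLoader
print(system.defining_loader("com.example.MainActivity"))  # → None
```
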
In addition to the class loaders instantiated by #ART when starting an application, the developer of an application can use class loaders explicitly, either by calling those of the #Asdk or by writing custom class loaders that inherit from the `ClassLoader` class.
At this point, accurately modeling the complete class loading algorithm becomes impossible: the developer can program any algorithm of their choice.
For this reason, we exclude this case from this chapter and focus on the default behavior, where the context class loader is the one pointing to the `.apk` file and its delegate is `BootClassLoader`.
Under this hypothesis, the delegation process can be modeled by the pseudo-code of the method `load_class` given in @lst:cl-loading-alg.

In addition, it is important to distinguish the two types of #platc handled by `BootClassLoader`, both of which have priority over the classes of the application at runtime:

- the ones available in the *#Asdk* (normally visible in the documentation);
- the internal ones that should not be used by the developer. We call them *#hidec*~@he_systematic_2023 @li_accessing_2016 (not documented).

As a preliminary conclusion, we observe that a priority exists in the class loading mechanism and that an attacker could use it to make one implementation take precedence over another one.
This could mislead a reverser who analyzes the implementation with the lowest priority.
To determine whether a class is impacted by the priority given to `BootClassLoader`, we need to obtain the list of classes that are part of Android, #ie the #platc.
We discuss in the next section how to obtain these classes from the emulator.

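Once the list of #platc is known, the check itself reduces to a set intersection. The sketch below assumes the default delegation described above and uses hypothetical class lists; `shadowed_app_classes` is an illustrative helper, not part of any existing tool. The platform list must contain both the #Asdk and the #hidec, otherwise application classes shadowed by hidden classes would be missed.

```python
from typing import Iterable, Set


def shadowed_app_classes(app_classes: Iterable[str],
                         platform_classes: Set[str]) -> Set[str]:
    """Return the application classes whose name collides with a platform class.

    Under the default parent-first delegation, such classes are never
    loaded from the APK: BootClassLoader answers first.
    """
    return {name for name in app_classes if name in platform_classes}


# Illustrative class lists (hypothetical, not extracted from a real APK).
platform = {"java.lang.String", "com.android.okhttp.OkHttpClient"}
apk = ["com.example.Main", "com.android.okhttp.OkHttpClient"]
print(sorted(shadowed_app_classes(apk, platform)))
# → ['com.android.okhttp.OkHttpClient']
```
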
=== Determining Platform Classes

#figure(
image(
"figs/architecture_SDK-crop.svg",
width: 80%,
alt: "
On the top right, a diagram of a web browser open at https://developer.android.com, with the webpage reading: API documentation, SDK classes, and method descriptions.
The web browser is labeled Documentation.
On the bottom right, a box with the Android Studio logo (a blue pair of compasses in front of a green robot) is labeled 'Development Environment'.
It contains two boxes, Developer classes and android.jar, and the text Dev SDK classes in bold.
An arrow labeled API access goes from Developer classes to android.jar.
On the left, a diagram of a smartphone with the Android logo (a green robot) contains two boxes: Platform classes and APK file.
The Platform classes box contains the text 'boot.art: framework.jar + 24 .jar = Android SDK classes + Hidden classes'.
The APK file box is split in two: the top part reads Developer classes + some extra classes, and the bottom part reads Multi DEX.
An arrow labeled API access goes from APK file to Platform classes.
Another arrow goes from Development Environment to APK file.
"
),
caption: [Location of #SDK classes during development and at runtime]
) <fig:cl-archisdk>

@fig:cl-archisdk shows how the classes of Android are used in the development environment and at runtime.
In the development environment, Android Studio uses `android.jar` and the specific classes written by the developer.
After compilation, only the classes of the developer, and sometimes extra classes generated by Android Studio, are zipped into the #APK file, using the multi-dex format.
At runtime, the application uses `BootClassLoader` to load the #platc from Android.
Before our work, previous studies~@he_systematic_2023 @li_accessing_2016 considered both the #Asdk and the #hidec to be located in the file `/system/framework/framework.jar` on the phone itself, but we found that not all the classes loaded by `bootClassLoader` are present in `framework.jar`.
For example, He #etal~@he_systematic_2023 counted 495 thousand #APIs (fields and methods) in Android 12, based on the Google documentation on restrictions for non-#SDK interfaces#footnote[https://developer.android.com/guide/app-compatibility/restrictions-non-sdk-interfaces].
However, when looking at the content of `framework.jar`, we only found #num(333) thousand #APIs.
Indeed, classes such as `com.android.okhttp.OkHttpClient` are loaded by `bootClassLoader` and listed by Google, but are absent from `framework.jar`.

For optimization purposes, the #platc are now loaded from `boot.art`.
This file is used to speed up the start-up time of applications: it stores a dump of the C++ objects representing the *#platc* (#Asdk and #hidec) so that they do not need to be generated each time an application starts.
Unfortunately, this format is undocumented and not backward compatible across Android versions, and is thus difficult to parse.
An easier solution to investigate the #platc is to look at the `BOOTCLASSPATH` environment variable in an emulator.
This variable is used to load the classes without the `boot.art` optimization.
We found 25 `.jar` files, including `framework.jar`, in the `BOOTCLASSPATH` of the standard emulator for Android 12 (#SDK 32), 31 for Android 13 (#SDK 33), and 35 for Android 14 (#SDK 34), containing a total of #num(499837), #num(539236), and #num(605098) #API methods and fields, respectively.
@tab:cl-platform_apis summarizes the discrepancies we found between Google's list and the #platc we found in Android emulators.
Note that, conversely, some methods are found _only_ in the documentation.
Our manual investigations suggest that the documentation is not well synchronized with the evolution of the #platc and that Google has almost solved this issue in #SDK 34.

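As a starting point, the `.jar` files can be listed by splitting the variable on colons. This is a minimal Python sketch: the helper name and the sample value are hypothetical, and extracting the classes contained in each `.jar` file is left out. On an emulator, the value can be printed with `adb shell 'echo $BOOTCLASSPATH'`; the single quotes prevent the host shell from expanding the variable before it reaches the device.

```python
from typing import List


def boot_class_path_jars(boot_class_path: str) -> List[str]:
    """Split a BOOTCLASSPATH value into its .jar entries, keeping their order."""
    entries = boot_class_path.strip().split(":")
    return [entry.strip() for entry in entries if entry.strip().endswith(".jar")]


# Hypothetical, shortened value (a real one lists many more .jar files),
# with the trailing line ending that adb output typically carries.
value = ("/apex/com.android.art/javalib/core-oj.jar:"
         "/system/framework/framework.jar:"
         "/apex/com.android.conscrypt/javalib/conscrypt.jar\r\n")
jars = boot_class_path_jars(value)
print(len(jars))                                   # → 3
print("/system/framework/framework.jar" in jars)  # → True
```

Counting the entries returned for each emulator image gives the number of `.jar` files per Android version.
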
#figure({
show table: set text(size: 0.80em)
table(
columns: 5,
//inset: (x: 0% + 5pt, y: 0% + 2pt),
stroke: none,
align: center+horizon,
table.hline(),
table.header(
table.cell(colspan: 5, inset: 3pt)[],
table.cell(rowspan: 2)[*SDK version*],
table.vline(end: 3),
table.vline(start: 4),
table.cell(colspan: 4)[*Number of API methods*],
[Documented], [In emulator], [Only documented], [Only in emulator],
),
table.cell(colspan: 5, inset: 3pt)[],
table.hline(),
table.cell(colspan: 5, inset: 3pt)[],

[32], num(495713), num(499837), num(1060), num(5184),
[33], num(537427), num(539236), num(1258), num(3067),
[34], num(605106), num(605098), num(26), num(18),

table.cell(colspan: 5, inset: 3pt)[],
table.hline(),
)},

caption: [Comparison of #API methods between the documentation and emulators],
)<tab:cl-platform_apis>

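The four columns of @tab:cl-platform_apis are related by set operations: the "only" columns are set differences, and both sides share the same intersection. For instance, for #SDK 32, #num(495713) − #num(1060) = #num(494653) = #num(499837) − #num(5184). The Python sketch below computes the columns from two sets of signatures; `compare_api_lists` and the toy signatures are hypothetical, and how signatures are extracted and normalized is left out.

```python
from typing import Dict, Set


def compare_api_lists(documented: Set[str], emulator: Set[str]) -> Dict[str, int]:
    """Compute the four columns of the comparison table from two API sets.

    Each element is an API signature (method or field) written as a string.
    """
    return {
        "documented": len(documented),
        "in_emulator": len(emulator),
        "only_documented": len(documented - emulator),
        "only_in_emulator": len(emulator - documented),
    }


# Toy input with made-up signatures, only to show the set operations.
doc = {"android.app.Activity.onCreate", "java.lang.String.length"}
emu = {"java.lang.String.length", "com.android.okhttp.OkHttpClient.<init>"}
print(compare_api_lists(doc, emu))
# → {'documented': 2, 'in_emulator': 2, 'only_documented': 1, 'only_in_emulator': 1}
```
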
We conclude that it can be dangerous to trust the documentation and that the emulator or the phone itself is the only reliable source of information.
Gathering the precise list of classes and the associated bytecode is not a trivial task.

=== Multiple #DEX Files <sec:cl-collision>

For application classes, Android uses its own format, called #DEX: originally, all the classes of an application were loaded from the file `classes.dex`.
With the increasing complexity of Android applications, the need arose to load more methods than the #DEX format can support in a single #dexfile.
To solve this problem, Android started storing classes in multiple files named `classesX.dex`, as illustrated by @lst:cl-dexname, which shows the method that generates the filenames read by class loaders.
Android starts by loading the file `GetMultiDexClassesDexName(0)` (`classes.dex`), then `GetMultiDexClassesDexName(1)` (`classes2.dex`), and continues until it reaches a value `n` for which the file `GetMultiDexClassesDexName(n)` does not exist.
Although Android emits a warning message when it finds more than 100 #dexfiles, it still loads any number of #dexfiles this way.
This change had the unintended consequence of allowing two classes with the same name but different implementations to be stored in the same `.apk` file using two #dexfiles.

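The enumeration order can be reproduced in a few lines of Python. In this sketch, an APK is reduced to the set of its entry names (a hypothetical simplification), and `dex_files_read` follows the stopping rule described above: the scan ends at the first missing name, so a file placed after a gap in the numbering is never read. The warning emitted beyond 100 #dexfiles is not modeled.

```python
from typing import List, Set


def multi_dex_name(index: int) -> str:
    # Same naming scheme as DexFileLoader::GetMultiDexClassesDexName
    return "classes.dex" if index == 0 else f"classes{index + 1}.dex"


def dex_files_read(apk_entries: Set[str]) -> List[str]:
    """Return the .dex files read from an APK, in loading order."""
    files: List[str] = []
    index = 0
    while multi_dex_name(index) in apk_entries:
        files.append(multi_dex_name(index))
        index += 1
    return files


# Hypothetical APK content: classes3.dex is missing.
entries = {"AndroidManifest.xml", "classes.dex", "classes2.dex", "classes4.dex"}
print(dex_files_read(entries))  # → ['classes.dex', 'classes2.dex']
```

Note that index 1 maps to `classes2.dex`: there is no `classes1.dex` in this scheme.
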
Android explicitly performs checks that prevent several classes from using the same name inside a #dexfile.
However, this check does not apply across the multiple #dexfiles of the same `.apk` file, and a `.dex` file can contain a class whose name is already used by another class in another #dexfile of the application.
Of course, such a situation should not happen when the multiple #dexfiles have been properly generated by Android Studio.
Nevertheless, for an attacker who controls the generation of the #dexfiles, this issue raises the question of which class is selected when several classes sharing the same name are present in the same `.apk` file.

We found that Android loads the first implementation it finds when scanning the #dexfiles in the order generated by the method `GetMultiDexClassesDexName`.
We will show later in @sec:cl-evaltools that this choice is not the most intuitive and can fool analysis tools when reversing an application.
In conclusion, we model both the multi-dex and delegation behaviors in the pseudo-code of @lst:cl-loading-alg.

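A runnable variant of @lst:cl-loading-alg makes the first-match rule concrete. In this sketch, the APK is a hypothetical dictionary from `.dex` file names to the classes they define, and the #platc are given as a plain set; both stand in for parsing real files, and the function names are illustrative.

```python
from typing import Dict, Set, Tuple


class ClassNotFoundError(Exception):
    """Raised when neither BootClassLoader nor any .dex file defines the class."""


def multi_dex_name(index: int) -> str:
    # Same naming scheme as DexFileLoader::GetMultiDexClassesDexName
    return "classes.dex" if index == 0 else f"classes{index + 1}.dex"


def load_class(class_name: str, platform_classes: Set[str],
               apk: Dict[str, Dict[str, str]]) -> Tuple[str, str]:
    """Return (origin, implementation label) for class_name."""
    # Delegation: BootClassLoader has priority over the application.
    if class_name in platform_classes:
        return ("bootClassLoader", "platform implementation")
    # Multi-dex: scan classes.dex, classes2.dex, ... and stop at the
    # first missing file; the first .dex defining the class wins.
    index = 0
    while multi_dex_name(index) in apk:
        dex_name = multi_dex_name(index)
        if class_name in apk[dex_name]:
            return (dex_name, apk[dex_name][class_name])
        index += 1
    raise ClassNotFoundError(class_name)


# Hypothetical APK: two .dex files define com.example.Payload.
apk = {
    "classes.dex": {"com.example.Main": "main",
                    "com.example.Payload": "implementation A"},
    "classes2.dex": {"com.example.Payload": "implementation B"},
}
platform = {"java.lang.String"}
print(load_class("com.example.Payload", platform, apk))
# → ('classes.dex', 'implementation A')
print(load_class("java.lang.String", platform, apk))
# → ('bootClassLoader', 'platform implementation')
```

A tool that instead keeps the last definition it encounters, or merges the #dexfiles in another order, would report implementation B, a discrepancy of the kind discussed in @sec:cl-evaltools.
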
#figure(
```cpp
std::string DexFileLoader::GetMultiDexClassesDexName(size_t index) {
  return (index == 0) ?
      "classes.dex" :
      StringPrintf("classes%zu.dex", index + 1);
}
```,
caption: [The method generating the `.dex` filenames in the #AOSP]
) <lst:cl-dexname>