wip

Jean-Marie Mineau 2025-08-19 23:27:25 +02:00
parent 5a71a9d5dd
commit 81f49f87d3
Signed by: histausse
GPG key ID: B66AEEDA9B645AD2
16 changed files with 267 additions and 202 deletions


@ -1,17 +1,17 @@
#import "../lib.typ": todo, ie, etal, num, DEX, ART, SDK, API, APK, APIs, AOSP
#import "X_var.typ": *
== Analyzing the Class Loading Process <sec:cl-loading>
To build obfuscation techniques that exploit class loaders to confuse analysis tools, we manually studied the Android code that handles class loading.
In this section, we describe the inner workings of #ART, focusing on the specificities of class loading that can cause confusion.
Because the class loading implementation has changed across Android versions, we mainly describe the behavior of #ART in Android 14 (#SDK 34).
=== Class Loaders
When #ART needs to access a class, it queries a `ClassLoader` to retrieve its implementation.
Each class holds a reference to the `ClassLoader` that loaded it, and this class loader is the one used to load the other classes referenced by that class.
For example, in @lst:cl-expl-cl-loading, when `A.f()` is called, #ART loads `B` with the class loader that was used to load `A`.
#figure(
```java
@ -32,10 +32,9 @@ Moreover, rather than using the Java class loaders `SecureClassLoader` or `URLCl
The left part of @fig:cl-class_loading_classes shows the different class loaders specific to Android in white and the stubs of the original Java class loaders in grey.
The main difference between the original Java class loaders and the ones used by Android is that they do not support the Java bytecode format.
Instead, the Android-specific class loaders load their classes from (many) different file formats specific to Android.
Usually, when used by a programmer, the classes are loaded from memory or from a file using the #DEX format (`.dex`).
When used directly by #ART, the classes are usually stored in an application file (`.apk`) or in an optimized format (`OAT`/`ODEX`).
#todo[Alt text for cl-class_loading_classes]
#figure([
#image(
"figs/classloaders-crop.svg",
@ -110,7 +109,7 @@ In reality, the #platc are loaded by `bootClassLoader` and the classes from the
In addition to the class loaders instantiated by #ART when starting an application, developers can use class loaders explicitly, either by instantiating those provided by the #Asdk or by writing custom class loaders that inherit from the `ClassLoader` class.
In that case, accurately modeling the complete class loading algorithm becomes impossible, since developers can implement any algorithm of their choice.
For this reason, we exclude this case from this chapter and focus on the default behavior, where the context class loader is the one pointing to the `.apk` file and its delegate is `BootClassLoader`.
Under this assumption, the delegation process can be modeled by the pseudo-code of the method `load_class` given in @lst:cl-listing3.
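The priority of `BootClassLoader` comes from the parent-first delegation of `java.lang.ClassLoader.loadClass()`: a loader first asks its parent and only calls its own `findClass()` when the parent fails. The Java sketch below illustrates this on a desktop JVM rather than on #ART, assuming that Android's `ClassLoader` keeps the same parent-first structure as OpenJDK's; the JVM bootstrap loader stands in for `BootClassLoader`, and `DelegationDemo` and `TracingLoader` are hypothetical names written for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Runs on a desktop JVM, not on ART: it only illustrates the parent-first
// delegation implemented by java.lang.ClassLoader.loadClass().
public class DelegationDemo {

    // Hypothetical loader that records every name it is asked to find itself.
    static class TracingLoader extends ClassLoader {
        final List<String> findClassCalls = new ArrayList<>();

        TracingLoader() {
            super(null); // null parent: delegate straight to the bootstrap loader
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            findClassCalls.add(name); // reached only when the parent could not load the class
            throw new ClassNotFoundException(name);
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        TracingLoader loader = new TracingLoader();

        // The bootstrap loader defines java.lang.String, so findClass() is never called.
        Class<?> string = loader.loadClass("java.lang.String");
        System.out.println(string == String.class);   // true
        System.out.println(loader.findClassCalls);    // []

        // A class unknown to the parent falls through to findClass().
        try {
            loader.loadClass("com.example.NotThere");
        } catch (ClassNotFoundException e) {
            System.out.println(loader.findClassCalls); // [com.example.NotThere]
        }
    }
}
```

Because the parent answers for `java.lang.String`, the child loader never even sees the request: an application class with the same name as a platform class is shadowed in the same way.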
In addition, it is important to distinguish the two types of #platc handled by `BootClassLoader`, both of which take priority over the application's classes at runtime:
@ -143,16 +142,16 @@ On the top right, a diagram of a web browser open at https://developer.android.com
"
),
caption: [Location of #SDK classes during development and at runtime]
) <fig:cl-archisdk>
@fig:cl-archisdk shows how classes of Android are used in the development environment and at runtime.
In the development environment, Android Studio uses `android.jar` and the specific classes written by the developer.
After compilation, only the classes of the developer, and sometimes extra classes generated by Android Studio, are zipped into the #APK file using the multi-dex format.
At runtime, the application uses `BootClassLoader` to load the #platc from Android.
Before our work, previous studies~@he_systematic_2023 @li_accessing_2016 considered both #Asdk and #hidec to be located in the file `/system/framework/framework.jar` on the phone itself, but we found that not all the classes loaded by `BootClassLoader` are present in `framework.jar`.
For example, He #etal~@he_systematic_2023 counted 495 thousand #APIs (fields and methods) in Android 12, based on Google's documentation on restrictions for non-#SDK interfaces#footnote[https://developer.android.com/guide/app-compatibility/restrictions-non-sdk-interfaces].
However, when looking at the content of `framework.jar`, we found only #num(333) thousand #APIs.
Indeed, classes such as `com.android.okhttp.OkHttpClient` are loaded by `BootClassLoader` and listed by Google, but are absent from `framework.jar`.
For optimization purposes, classes are now loaded from `boot.art`.
@ -160,10 +159,10 @@ This file is used to speed up the start-up time of applications: it stores a dum
Unfortunately, this format is undocumented and not backward compatible between Android versions, which makes it difficult to parse.
An easier way to investigate the #platc is to look at the `BOOTCLASSPATH` environment variable in an emulator.
This variable is used to load the classes without the `boot.art` optimization.
We found 25 `.jar` files, including `framework.jar`, in the `BOOTCLASSPATH` of the standard emulator for Android 12 (#SDK 32), 31 for Android 13 (#SDK 33), and 35 for Android 14 (#SDK 34), containing respectively a total of #num(499837), #num(539236) and #num(605098) #API methods and fields.
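Such an inventory can be started from the emulator itself. The sketch below is a minimal Java helper, assuming that the value of the variable was obtained with `adb shell echo '$BOOTCLASSPATH'` and that the `.jar` files were copied locally with `adb pull`; `BootClassPath`, `entries` and `dexEntries` are hypothetical names. It only splits the variable and lists the #DEX files stored in each archive: counting the methods and fields themselves additionally requires a #DEX parser, which is not shown.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class BootClassPath {

    // Splits a BOOTCLASSPATH value into its entries (colon-separated, empty entries skipped).
    static List<String> entries(String bootClassPath) {
        List<String> result = new ArrayList<>();
        for (String entry : bootClassPath.split(":")) {
            if (!entry.isEmpty()) {
                result.add(entry);
            }
        }
        return result;
    }

    // Lists the DEX files (classes.dex, classes2.dex, ...) stored in a local copy of a .jar,
    // in the order in which they appear in the archive.
    static List<String> dexEntries(String jarPath) throws IOException {
        List<String> dexFiles = new ArrayList<>();
        try (ZipFile jar = new ZipFile(jarPath)) {
            jar.stream()
               .map(ZipEntry::getName)
               .filter(name -> name.matches("classes\\d*\\.dex"))
               .forEach(dexFiles::add);
        }
        return dexFiles;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: the BOOTCLASSPATH value; args[1..]: local copies of some of the listed jars.
        if (args.length == 0) {
            System.err.println("usage: java BootClassPath <BOOTCLASSPATH> [jar ...]");
            return;
        }
        System.out.println(entries(args[0]).size() + " entries in BOOTCLASSPATH");
        for (int i = 1; i < args.length; i++) {
            System.out.println(args[i] + ": " + dexEntries(args[i]));
        }
    }
}
```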
@tab:cl-platform_apis summarizes the discrepancies between Google's list and the #platc we found in Android emulators.
Note also that some methods can be found _only_ in the documentation.
Our manual investigations suggest that the documentation is not well synchronized with the evolution of the #platc and that Google has almost solved this issue in #API 34.
#figure({
@ -194,7 +193,7 @@ Our manual investigations suggest that the documentation is not well synchronize
table.hline(),
)},
caption: [Comparison for #API methods between documentation and emulators],
)<tab:cl-platform_apis>
We conclude that it can be dangerous to trust the documentation and that gathering information from the emulator or phone is the only reliable source.
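A comparison such as the one in @tab:cl-platform_apis boils down to set differences between two lists of signatures. The following Java sketch assumes that both lists have already been extracted as strings in #DEX descriptor notation; `ApiDiff` and the sample signatures are hypothetical and only illustrate the computation.

```java
import java.util.Set;
import java.util.TreeSet;

public class ApiDiff {

    // Result of the comparison; TreeSet keeps the signatures sorted for display.
    record Diff(Set<String> documentedOnly, Set<String> emulatorOnly, Set<String> both) {}

    static Diff compare(Set<String> documented, Set<String> onEmulator) {
        Set<String> documentedOnly = new TreeSet<>(documented);
        documentedOnly.removeAll(onEmulator);   // listed by Google, absent from the emulator
        Set<String> emulatorOnly = new TreeSet<>(onEmulator);
        emulatorOnly.removeAll(documented);     // present on the emulator, not listed
        Set<String> both = new TreeSet<>(documented);
        both.retainAll(onEmulator);
        return new Diff(documentedOnly, emulatorOnly, both);
    }

    public static void main(String[] args) {
        // Hypothetical signatures in DEX descriptor notation.
        Set<String> documented = Set.of("Lcom/example/A;->f()V", "Lcom/example/A;->g()V");
        Set<String> onEmulator = Set.of("Lcom/example/A;->f()V", "Lcom/example/B;->h()I");
        Diff diff = compare(documented, onEmulator);
        System.out.println("documentation only: " + diff.documentedOnly());
        System.out.println("emulator only:      " + diff.emulatorOnly());
        System.out.println("both:               " + diff.both());
    }
}
```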
@ -202,8 +201,8 @@ Gathering the precise list of classes and the associated bytecode is not a trivi
=== Multiple #DEX Files <sec:cl-collision>
For the application class files, Android uses its own format called #DEX: originally, all the classes of an application were loaded from the file `classes.dex`.
As Android applications grew more complex, the need arose to load more methods than a single #dexfile can reference (#num(65536), since method references are encoded as 16-bit indices).
To solve this problem, Android started storing classes in multiple files named `classesX.dex`, as illustrated by @lst:cl-dexname, which shows the function generating the filenames read by class loaders.
Android starts by loading the file `GetMultiDexClassesDexName(0)` (`classes.dex`), then `GetMultiDexClassesDexName(1)` (`classes2.dex`), and continues until it reaches a value `n` for which the file `GetMultiDexClassesDexName(n)` does not exist.
Although Android emits a warning when it finds more than 100 #dexfiles, it still loads any number of #dexfiles this way.
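The enumeration of #dexfiles described above can be sketched in Java as follows. `multiDexName` is a port of `GetMultiDexClassesDexName` from @lst:cl-dexname, while `dexFilesToLoad` is a hypothetical model in which the #APK is reduced to the set of its entry names; the warning emitted beyond 100 files is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class MultiDex {

    // Port of the AOSP function DexFileLoader::GetMultiDexClassesDexName:
    // index 0 -> "classes.dex", index 1 -> "classes2.dex", index n -> "classes(n+1).dex".
    static String multiDexName(int index) {
        return index == 0 ? "classes.dex" : "classes" + (index + 1) + ".dex";
    }

    // Hypothetical model: apkEntries are the entry names of the APK archive.
    // Files are collected in load order until the first missing name.
    static List<String> dexFilesToLoad(Set<String> apkEntries) {
        List<String> toLoad = new ArrayList<>();
        for (int i = 0; apkEntries.contains(multiDexName(i)); i++) {
            toLoad.add(multiDexName(i));
        }
        return toLoad;
    }

    public static void main(String[] args) {
        Set<String> apk = Set.of("AndroidManifest.xml", "classes.dex", "classes2.dex", "classes4.dex");
        // classes4.dex is never reached because classes3.dex is missing.
        System.out.println(dexFilesToLoad(apk)); // [classes.dex, classes2.dex]
    }
}
```

With the entries `classes.dex`, `classes2.dex` and `classes4.dex`, only the first two are returned: following the rule above, the loop stops at the missing `classes3.dex`.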
@ -219,13 +218,13 @@ We will show later in @sec:cl-evaltools that this choice is not the most intuiti
In summary, we model both the multi-dex and delegation behaviors in the pseudo-code of @lst:cl-loading-alg.
#figure(
```cpp
std::string DexFileLoader::GetMultiDexClassesDexName(size_t index) {
return (index == 0) ?
"classes.dex" :
StringPrintf("classes%zu.dex", index + 1);
}
```,
caption: [The #AOSP method generating the `.dex` filenames]
) <lst:cl-dexname>
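Putting the delegation and multi-dex rules together, the lookup order can be sketched as below. This is a simplified Java model, not #ART code: classes are reduced to their names, `BootClassLoader` to a set of names of #platc, and the #APK to an ordered map from #dexfile names to the classes they define. The priority of `BootClassLoader` follows the text; the rule that the first #dexfile in load order defining a class is the one used is an assumption of this sketch, and `LoadClassModel` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class LoadClassModel {

    // Returns where the class would be taken from, or Optional.empty() if it is found nowhere.
    static Optional<String> loadClass(String name,
                                      Set<String> bootClasses,
                                      Map<String, Set<String>> appDexFiles) {
        // 1. Delegation: BootClassLoader is asked first, so platform classes always win.
        if (bootClasses.contains(name)) {
            return Optional.of("BootClassLoader");
        }
        // 2. Multi-dex: the application's DEX files are scanned in load order;
        //    the first one defining the class is used (assumed here).
        for (Map.Entry<String, Set<String>> dex : appDexFiles.entrySet()) {
            if (dex.getValue().contains(name)) {
                return Optional.of(dex.getKey());
            }
        }
        return Optional.empty(); // a ClassNotFoundException at runtime
    }

    public static void main(String[] args) {
        Set<String> boot = Set.of("java.lang.String", "android.app.Activity");
        Map<String, Set<String>> apk = new LinkedHashMap<>(); // preserves load order
        apk.put("classes.dex", Set.of("com.example.Main", "com.example.Util"));
        apk.put("classes2.dex", Set.of("com.example.Util", "android.app.Activity"));

        System.out.println(loadClass("android.app.Activity", boot, apk)); // Optional[BootClassLoader]
        System.out.println(loadClass("com.example.Util", boot, apk));     // Optional[classes.dex]
        System.out.println(loadClass("com.example.Missing", boot, apk));  // Optional.empty
    }
}
```

In this model, the copy of `android.app.Activity` shipped in `classes2.dex` is never used, and the duplicate `com.example.Util` is resolved to `classes.dex`: these are exactly the situations in which a tool that picks another definition disagrees with the runtime.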