add classloader paper

Jean-Marie Mineau 2025-06-24 20:34:34 +02:00
parent c5e119e877
commit dd86422fd3
Signed by: histausse
GPG key ID: B66AEEDA9B645AD2
12 changed files with 1704 additions and 3 deletions


@ -0,0 +1,323 @@
#import "../lib.typ": num, todo
#import "X_var.typ": *
== Shadow attacks in the wild <sec:cl-wild>
In this section, we evaluate whether applications found in the wild, on the Play Store or other markets, use any of the shadow techniques.
Our goal is to explore the usage of shadow techniques in real applications.
Because we want to include malicious applications (in case such techniques would be used to hide malicious code), we selected #num(50000) applications randomly from AndroZoo@allixAndroZooCollectingMillions2016 that appeared in 2023.
Malicious applications are identified in our dataset using a threshold of 3 on the number of antivirus engines that report an application as malware.
A few applications could not be retrieved or parsed, leading to a final dataset of #nbapk applications.
We automatically disassembled the applications to obtain the list of included classes.
We then checked whether any shadow attack occurs within the APK itself or against the #platc of SDK 34.
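The detection step can be sketched as follows. This is a minimal illustration and not the tooling used for the study: the class `ShadowScan`, its method names, and the representation of an APK as one list of class names per DEX file are assumptions made for the example.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: names and data layout are assumptions for this sketch. */
final class ShadowScan {
    /** Class names defined more than once across the DEX files of one APK. */
    static Set<String> selfShadow(List<List<String>> dexFiles) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicated = new TreeSet<>();
        for (List<String> dex : dexFiles) {
            for (String name : dex) {
                // add() returns false when the name was already seen
                if (!seen.add(name)) {
                    duplicated.add(name);
                }
            }
        }
        return duplicated;
    }

    /** APK class names that collide with a given set of platform class names. */
    static Set<String> platformShadow(List<List<String>> dexFiles, Set<String> platformClasses) {
        Set<String> collisions = new TreeSet<>();
        for (List<String> dex : dexFiles) {
            for (String name : dex) {
                if (platformClasses.contains(name)) {
                    collisions.add(name);
                }
            }
        }
        return collisions;
    }
}
```

Under these assumptions, `platformShadow` would be called twice per application, once with the SDK class names and once with the hidden class names, to obtain the "SDK" and "Hidden" cases separately.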
=== Results
/*
identical code:
we take the classes from the platform classes and
compare them to SDK 32, 33, and 34: if the shadow class matches one of them, it counts as a match
*/
#todo[cl-shadow]
#figure({
show table: set text(size: 0.80em)
table(
columns: 9,
stroke: none,
align: center+horizon,
inset: (x: 0% + 5pt, y: 0% + 2pt),
table.hline(),
table.header(
table.cell(colspan: 9, inset: 3pt)[],
[],
table.vline(end: 3),
table.vline(start: 4),
table.cell(colspan: 3)[*Number of apps*],
table.vline(end: 3),
table.vline(start: 4),
table.cell(colspan: 4)[*Average*],
table.vline(end: 3),
table.vline(start: 4),
table.cell(rowspan: 2)[*Identical Code*],
[],
[], [*%*], [*% malware*],
[*Shadow classes*], [*Median*], [*Target SDK*], [*Min SDK*],
),
table.cell(colspan: 9, inset: 3pt)[],
table.hline(),
table.cell(colspan: 9)[For all applications of the dataset],
table.hline(),
table.cell(colspan: 9, inset: 3pt)[],
..scan_50k.map(e => (
[*#e.method*],
num(e.nbapp), [#e.ratioapp%], [#e.ratiomal%],
num(e.avgshadow), num(e.median), num(e.avgtargetsdk), num(e.avgminsdk),
[#e.id%]
)).flatten(),
table.cell(colspan: 9, inset: 3pt)[],
table.hline(),
table.cell(colspan: 9)[For applications with at least 1 shadow case],
table.hline(),
table.cell(colspan: 9, inset: 3pt)[],
..scan_only_shadow.map(e => (
[*#e.method*],
num(e.nbapp), [#e.ratioapp%], [#e.ratiomal%],
num(e.avgshadow), num(e.median), num(e.avgtargetsdk), num(e.avgminsdk),
[#e.id%]
)).flatten(),
table.cell(colspan: 9, inset: 3pt)[],
table.hline(),
)},
caption: [Shadow classes compared to SDK 34 for a dataset of #nbapk applications]
) <tab:cl-shadow>
//The metadata provided by AndroZoo helps to have the flags reported by antiviruses used by VirusTotal#footnote[https://www.virustotal.com].
We report in the upper part of @tab:cl-shadow the statistics about the whole dataset and the three shadow attacks: "self" when a class shadows another one in the APK, "SDK" when a class of the SDK shadows one of the APK, and "Hidden" when a hidden class of Android shadows one of the APK.
We observe that, on average, a few classes are shadowed by another class.
Note that the median value is 0, meaning that a few apps shadow many classes while the majority of apps do not shadow anything.
The number of applications shadowing a hidden API is low, which is an expected result as these classes should not be known by the developer.
We observe that a substantial share of applications, 23.52%, perform SDK shadowing.
This can be explained by the fact that classes introduced in recent SDK versions are embedded in the APK for end users running old versions of Android.
This is suggested by the average Min SDK, which is 21.7 for the whole dataset: on average, an application can run on a smartphone with API 21, which would require embedding the new classes introduced from API 22 to 34.
This hypothesis about missing classes is further investigated later in this section.
In the bottom part of @tab:cl-shadow, we give the same statistics, excluding applications that do not perform any shadowing.
For those pairs of shadow classes, we disassembled them using Apktool to perform a comparison using instructions represented in the Smali language.
For self-shadow, we compare the pair.
For the shadowing of the SDK or Hidden class, we compare the code found in the APK with implementations found in the emulator and `android.jar` of SDK 32, 33, and 34.
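A comparison of this kind can be sketched as follows. The class `SmaliCompare` is hypothetical, and the choice to ignore blank lines, comments, and the `.line` and `.source` debug directives is an assumption of this sketch, not a description of the exact normalization used for the results.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: the normalization rules are assumptions for this sketch. */
final class SmaliCompare {
    /** Keeps only the lines that carry code, stripped of indentation. */
    static List<String> normalize(String smali) {
        return smali.lines()
                .map(String::strip)
                .filter(l -> !l.isEmpty())
                .filter(l -> !l.startsWith("#"))        // Smali comments
                .filter(l -> !l.startsWith(".line ")    // debug line numbers
                        && !l.startsWith(".source "))   // source file name
                .collect(Collectors.toList());
    }

    /** Two classes are considered identical when their normalized lines match. */
    static boolean sameCode(String smaliA, String smaliB) {
        return normalize(smaliA).equals(normalize(smaliB));
    }
}
```

With such a strict line-by-line comparison, two classes that differ only by a register swap are still reported as different.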
_Self-shadowing_
We observe a low number of applications doing self-shadow attacks.
For each class that is shadowed, we compared its bytecode with the shadowed one.
We observe that 74.8% are identical, which suggests that the compilation process embeds the same class multiple times but introduces variations in headers or metadata values.
We investigate later in @sec:cl-malware the case of malicious applications.
#figure(
image(
"figs/redef_sdk_relative_min_sdk.svg",
width: 100%,
alt: ""
),
caption: [Redefined SDK classes, sorted by the first SDK they appeared in.]
)<fig:cl-classes_by_first_sdk>
_SDK shadowing_
For the shadowing of SDK classes, we observe a low ratio of identical classes.
This result could lead to the wrong conclusion that developers embed malicious versions of the SDK classes, but our manual investigation shows that the difference is slight and probably due to compiler optimization.
To go further in the investigation, in @fig:cl-classes_by_first_sdk we represent these redefined classes with the following rules:
- The class is placed on the x-axis of the figure according to the SDK it first appeared in.
- The class is counted as "green" (solid) if it first appeared in the SDK *after* the APK min SDK (retro-compatibility purpose).
- The class is counted as "red" (hatched) if it first appeared in the SDK *before* the APK min SDK (which is useless for the application as the SDK version is always available).
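The rules above can be sketched as follows. The class `RedefinedClasses` and its method names are hypothetical. The rules do not state how a class that first appeared exactly in the min SDK is counted, so this sketch assumes it is counted as always available.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper illustrating the green/red classification. */
final class RedefinedClasses {
    enum Kind { RETRO_COMPATIBILITY, ALWAYS_AVAILABLE }  // "green", "red"

    /** firstSdk: SDK where the class first appeared; minSdk: min SDK of the APK. */
    static Kind classify(int firstSdk, int minSdk) {
        // Assumption: equality is treated as "always available".
        return firstSdk > minSdk ? Kind.RETRO_COMPATIBILITY : Kind.ALWAYS_AVAILABLE;
    }

    /** Tally per first-appearance SDK: index 0 counts green, index 1 counts red. */
    static Map<Integer, int[]> histogram(int[][] pairs) {
        Map<Integer, int[]> bins = new TreeMap<>();
        for (int[] p : pairs) {  // p[0] = first SDK of the class, p[1] = min SDK of the APK
            int[] bin = bins.computeIfAbsent(p[0], k -> new int[2]);
            bin[classify(p[0], p[1]).ordinal()]++;
        }
        return bins;
    }
}
```

For example, a class introduced in SDK 26 and embedded in an application with min SDK 21 is counted as green, while a class introduced in SDK 8 and embedded in the same application is counted as red.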
We observe that the majority of classes are legitimate retro-compatibility additions of classes, especially after SDK 21 (which is the average min SDK, cf. @tab:cl-shadow).
Abnormal cases are observed for classes that appeared in API versions 7 and before, 8, and 16.
@tab:cl-topsdk reports the top ten classes that shadow the SDK for the three mentioned versions.
For SDK before 7, it mainly concerns HTTP classes: for example, the class `HttpParams` is an interface, containing limited bytecode that mostly matches the class already present on the emulator (98.03% of shadowed classes are identical).
`HttpConnectionParams` on the other hand differs from the platform class and we observe only 4.99% of identical classes.
Manual inspection of some applications revealed that the two main reasons are:
- instead of checking inline whether the method arguments are null, as Android does, applications use the method `org.apache.http.util.Args.notNull()`. According to comments in the source code of Android#footnote[https://cs.android.com/android/platform/superproject/main/+/main:frameworks/base/core/java/org/apache/http/params/HttpConnectionParams.java;drc=3bdd327f8532a79b83f575cc62e8eb09a1f93f3d?], the class was forked in 2007 from the Apache 'httpcomponents' project. Looking at the history of the project, the use of `Args.notNull()` was introduced in 2012#footnote[https://github.com/apache/httpcomponents-core/commit/9104a92ea79e338d876b1b60f5cd2b243ba7069f?]. This shows that applications embed code from a more recent version of this library without realizing that their version will not be the one used.
- very small changes that we found can be attributed to the compilation process (e.g. swapping registers: `v0` is used instead of `v1` and `v1` instead of `v0`), but even if we consider them different, they are very similar.
The remaining 4.99% of classes that are identical to the Android version are classes where the body of the methods is replaced by stubs that throw `RuntimeException("Stub!")`.
This code corresponds to what we found in `android.jar` but not to the code we found in the emulator, which is surprising.
Nevertheless, we decided to count them as identical, because `android.jar` is the official jar file for developers, and stubs are replaced in the emulator: it is intended by Google developers.
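For illustration, a stubbed method has the following shape. The class `StubbedParams` and its method are hypothetical stand-ins, not code copied from `android.jar`; only the thrown exception mirrors the stubs described above.

```java
/** Hypothetical stand-in showing the shape of a stubbed method. */
final class StubbedParams {
    static int getTimeout(Object params) {
        // The real body is absent: the stub only throws.
        throw new RuntimeException("Stub!");
    }
}
```

A class embedded in this form carries no usable logic, which is why it can only match the `android.jar` version and not the implementation running on the emulator.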
Other results of @tab:cl-topsdk can be similarly discussed: either they are identical with a high ratio, or they are different because of small variations.
When substantial differences appear, it is mainly because different versions of the same library have been used or because an SDK class is embedded for retro-compatibility.
#todo[cl-topsdk]
#figure({
show table: set text(size: 0.80em)
table(
columns: 3,
stroke: none,
align: (left, right, right),
inset: (x: 0% + 5pt, y: 0% + 2pt),
table.hline(),
table.header(
table.cell(colspan: 3, inset: 2pt)[],
[*Class*], [*Occurrences*], [*Identical ratio*],
),
table.cell(colspan: 3, inset: 2pt)[],
table.hline(),
table.cell(colspan: 3, inset: 2pt)[],
[redefined for SDK $<=$ 7], [], [],
table.hline(),
table.cell(colspan: 3, inset: 2pt)[],
..redef_sdk_7minus.map(e => (
raw(e.class), num(e.occ), [#e.idper%],
)).flatten(),
table.cell(colspan: 3, inset: 2pt)[],
table.hline(),
table.cell(colspan: 3, inset: 2pt)[],
[redefined for SDK $=$ 8], [], [],
table.cell(colspan: 3, inset: 2pt)[],
table.hline(),
table.cell(colspan: 3, inset: 2pt)[],
..redef_sdk_8.map(e => (
raw(e.class), num(e.occ), [#e.idper%],
)).flatten(),
table.cell(colspan: 3, inset: 2pt)[],
table.hline(),
table.cell(colspan: 3, inset: 2pt)[],
[redefined for SDK $=$ 16], [], [],
table.cell(colspan: 3, inset: 2pt)[],
table.hline(),
table.cell(colspan: 3, inset: 2pt)[],
..redef_sdk_16.map(e => (
raw(e.class), num(e.occ), [#e.idper%],
)).flatten(),
table.cell(colspan: 3, inset: 2pt)[],
table.hline(),
)},
caption: [Top 10 shadow classes of the SDK]
) <tab:cl-topsdk>
/*
\catcode`\$=12% deactivate $ sign // WORKAROUND
\begin{table}[tb]
\caption{Top 10 of shadow classes of the SDK}
\label{tab:topsdk}
\footnotesize
\begin{tabular}{lrr}
\toprule
\bf Class & \bf occurrences & \bf Identical ratio \\
\midrule
\footnotesize redefined for SDK <= 7 & & \\
\midrule
\csvreader[
late after line = \\,
%separator=semicolon,
head to column names,
]{redef_sdk_7minus.csv}{}{%
\class & \mynums{\occ} & \idper \%
}%
\midrule
\footnotesize redefined for SDK = 8 & & \\
\midrule
\csvreader[
late after line = \\,
%separator=semicolon,
head to column names,
]{redef_sdk_8.csv}{}{%
\class & \mynums{\occ} & \idper \%
}%
\midrule
\footnotesize redefined for SDK = 16 & & \\
\midrule
\csvreader[
late after line = \\,
%separator=semicolon,
head to column names,
respect dollar = false, % NOT WORKING :-(((((((((( https://tex.stackexchange.com/questions/486250/csvsimple-respect-dollar-not-working
]{redef_sdk_16.csv}{}{%
\class & \mynums{\occ} & \idper \%
}%
\bottomrule
\end{tabular}
\end{table}
\catcode`\$=3% reactivate $ sign // WORKAROUND
*/
_Hidden shadowing_
For applications redefining hidden classes, on average, 16.1 classes are redefined (cf. bottom part of @tab:cl-shadow).
The top 3 packages whose code actually differs from the ones found in Android are `java.util.stream`, `org.ccil.cowan.tagsoup` and `org.json`:
- stream: when looking in more detail, we found that `java.util.stream` was only redefined by 6 applications, but the large number of classes redefined artificially puts the package at the top of the list. // It is explained by the fact that developers have included this library containing a lot of classes colliding with Android.
- tagsoup: `TagSoup` is a library for parsing HTML. // Developers do not know that it is part of Android as hidden classes.
- json: there is only one hidden class in `org.json`, redefined by #num(821) applications: `JSONObject$1`.
`org.json` is a package in Android SDK, not a hidden one.
However, `JSONObject$1` is an anonymous class not provided by `android.jar` because its class `JSONObject` is an empty stub, and thus, does not use `JSONObject$1`.
Thus, this class falls in the category of hidden #platc.
All these hidden shadow classes are libraries included by the developers who probably did not know that they were already embedded in Android.
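The name `JSONObject$1` follows the way the Java compiler names anonymous classes: the binary name of the enclosing class, a dollar sign, and a number. The sketch below reproduces this with a hypothetical class `JsonLike`; it is not the Android source of `JSONObject`.

```java
/** Hypothetical stand-in for a class that declares an anonymous class. */
final class JsonLike {
    /** Sentinel implemented as an anonymous class, compiled by javac to JsonLike$1. */
    static final Object NULL = new Object() {
        @Override
        public String toString() {
            return "null";
        }
    };
}
```

Since an empty stub of the outer class does not contain the anonymous class body, the generated `$1` class has no counterpart in the stub jar, which matches the situation described for `JSONObject$1`.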
=== Shadowing in malware applications <sec:cl-malware>
#figure(
```java
public class Reflection {
private static final int ERROR_SET_APPLICATION_FAILED = -20;
private static final String TAG = "Reflection";
// ...
static {
try {
Method declaredMethod = Class.class.getDeclaredMethod("forName", String.class);
Method declaredMethod2 = Class.class.getDeclaredMethod("getDeclaredMethod", String.class, Class[].class);
Class cls = (Class) declaredMethod.invoke(null, "dalvik.system.VMRuntime");
Method method = (Method) declaredMethod2.invoke(cls, "getRuntime", null);
setHiddenApiExemptions = (Method) declaredMethod2.invoke(cls, "setHiddenApiExemptions", new Class[]{String[].class});
sVmRuntime = method.invoke(null, new Object[0]);
} catch (Throwable th) { Log.e(TAG, "reflect bootstrap failed:", th); }
System.loadLibrary("free-reflection");
// ...
}
// ...
}
```,
caption: [Implementation of Reflection found in `classes11.dex` (shadows @lst:cl-refl1)],
) <lst:cl-refl2>
#figure(
```java
public class Reflection {
private static final String DEX = "ZGV4CjAzNQCl4EprGS2pXI/v3OwlBrlfRnX5rmkKVdN0CwAAcA ... AoAAA==";
private static final String TAG = "Reflection";
private static native int unsealNative(int i);
public static int unseal(Context context) {
return (Build.VERSION.SDK_INT < 28 || BootstrapClass.exemptAll() || unsealByDexFile(context)) ? 0 : -1;
}
private static boolean unsealByDexFile(Context context) {
// Decode DEX from base64 and load it as bytecode.
// ...
}
// ...
}
```,
caption: [Implementation of Reflection executed by ART (shadowed by @lst:cl-refl2)],
) <lst:cl-refl1>
The "% malware" column of @tab:cl-shadow shows the proportion of applications considered as malware, using our arbitrarily fixed threshold of 3 positive detections in VirusTotal reports.
For the whole dataset, we have 0.53% of applications considered as malware.
We can see that an application that uses self-shadowing is 10 times more likely to be malware, whereas the proportion of malware among applications shadowing #platc is the same as in the rest of the dataset.
Thus, we manually reverse-engineered self-shadowing malware, and found that the self-shadowing does not appear to be intentional.
The colliding classes are often the same implementation, occasionally with minor differences, like different versions of a library.
Additionally, we repeatedly noticed internal classes from `com.google.android.gms.ads` colliding with each other, but we believe that it is due to bad processing during the compilation of the application.
// App name: ShareCRM, but it still seems to exist on the store, so we will avoid a lawsuit and not name it
// https://play.google.com/store/apps/details?id=com.facishare.fsplay&hl=en
The most notable case we found was an application that still exists on the Google Play Store with the same package name#footnote[SHA256: `C46A65EA1A797119CCC03C579B61C94FE8161308A3B6A8F55718D6ADAD112546`]. This application contains a self-shadow class `me.weishu.reflection.Reflection` that can be found on GitHub, in the repository `tiann/FreeReflection`#footnote[https://github.com/tiann/FreeReflection]. This class is used to disable Android restrictions on hidden APIs.
At first glance, we believed the shadowing to be done intentionally for obfuscation purposes.
The shadow class that would be seen by a reverser is given in @lst:cl-refl2: it contains some Java bytecode performing reflection and loading a native library named "free-reflection" (the associated `.so` is missing).
The shadowed class that is really executed is summarized in @lst:cl-refl1.
It contains more obfuscated code: a `DEX` field storing base64 encoded DEX bytecode that is later used to load some new code.
When looking at this new code stored in the field, we found that it does almost the same thing as the code in the shadow class.
Thus, we believe that the developer has upgraded their obfuscation techniques, replacing a native library with inline base64 encoded bytecode.
The shadow attack could be unintentional, but it strengthens the masking of the new implementation.
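The nature of the `DEX` field in @lst:cl-refl1 can be checked from its first characters alone. The sketch below is an analyst-side illustration, not code from the application: the class `EmbeddedDex` and its method name are hypothetical, and it only recognizes version 035 of the DEX format.

```java
import java.util.Arrays;
import java.util.Base64;

/** Hypothetical analyst-side helper: detects base64 strings that decode to a DEX file. */
final class EmbeddedDex {
    // "dex\n035\0": magic bytes of DEX format version 035.
    private static final byte[] DEX_035_MAGIC = {0x64, 0x65, 0x78, 0x0A, 0x30, 0x33, 0x35, 0x00};

    static boolean looksLikeDex(String base64) {
        byte[] decoded;
        try {
            decoded = Base64.getDecoder().decode(base64);
        } catch (IllegalArgumentException e) {
            return false;  // not valid base64
        }
        return decoded.length >= DEX_035_MAGIC.length
                && Arrays.equals(Arrays.copyOf(decoded, DEX_035_MAGIC.length), DEX_035_MAGIC);
    }
}
```

The first 12 characters of the field, `ZGV4CjAzNQCl`, decode to these magic bytes followed by one more byte, so `looksLikeDex` accepts them; the rest of the string is elided in the listing and is not needed for this check.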
In conclusion, we observed that:
- SDK shadowing is performed by #shadowsdk of applications but is unintentional: these classes are embedded for retro-compatibility purposes or because the developer added a library already present in Android;
- Hidden shadowing rarely occurs and is mainly due to the usage of libraries that Android already contains;
- Malware perform more self-shadowing than goodware applications, and we found a sample where self-shadowing would clearly mislead the reverser.