wip
All checks were successful
/ test_checkout (push) Successful in 1m21s

Jean-Marie Mineau 2025-08-19 23:27:25 +02:00
parent 5a71a9d5dd
commit 81f49f87d3
Signed by: histausse
GPG key ID: B66AEEDA9B645AD2
16 changed files with 267 additions and 202 deletions


@ -1,15 +1,20 @@
#import "../lib.typ": num, todo, paragraph, SDK, APK, API, ART, DEX
#import "X_var.typ": *
== Shadow Attacks in the Wild <sec:cl-wild>
In this section, we evaluate whether applications found in the wild, in the Play Store or other markets, use one of the shadow techniques.
Our goal is to explore the usage of shadow techniques in real applications.
Because we modeled the behavior of a recent version of Android (#SDK 34), we decided not to use our dataset from @sec:rasta.
The applications in the RASTA dataset span more than 10 years, and we cannot guarantee that shadow attacks behaved the same way over that period.
At the very least, self-shadowing would not be possible before the introduction of multi-dex in 2014, and about a fourth of the applications in the RASTA dataset predate it.
Instead, we sampled a new dataset of recent applications.
Because we want to include malicious applications (in case such techniques are used to hide malicious code), we randomly selected #num(50000) applications from AndroZoo~@allixAndroZooCollectingMillions2016 that appeared in 2023.
Malicious applications are identified in our dataset using a threshold of 3 on the number of VirusTotal engines reporting an application as malware.
This number is provided by AndroZoo, for scans performed between January 2023 and January 2024 depending on the application.
A few applications could not be retrieved or parsed, leading to a final dataset of #nbapk applications.
We automatically disassembled the applications to obtain the list of included classes.
Then, we checked whether any shadow attack occurs within the #APK itself or with the #platc of #SDK 34.
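The detection step above can be illustrated with the following minimal Java sketch (Java 11 or later). It only shows how name collisions could be classified and is not the tooling used for this study. The names `ShadowDetector` and `detect` are hypothetical. Class names are assumed to be DEX type descriptors, listed per DEX file in the order `classes.dex`, `classes2.dex`, and so on. Hidden classes are assumed to be the platform classes present on the device but absent from `android.jar`.

```java
import java.util.EnumMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: classifies the class-name collisions of one APK against the
// platform classes of one Android version. Names are DEX type descriptors.
final class ShadowDetector {
    enum Kind { SELF, SDK, HIDDEN }

    /**
     * @param dexFiles   classes defined by each DEX file of the APK, in the order
     *                   classes.dex, classes2.dex, classes3.dex, ...
     * @param sdkClasses platform classes exposed by android.jar
     * @param hidden     platform classes on the device that are absent from android.jar
     */
    static Map<Kind, Set<String>> detect(List<Set<String>> dexFiles,
                                         Set<String> sdkClasses,
                                         Set<String> hidden) {
        Map<Kind, Set<String>> found = new EnumMap<>(Kind.class);
        for (Kind kind : Kind.values()) {
            found.put(kind, new TreeSet<>());
        }
        Set<String> alreadyDefined = new HashSet<>();
        for (Set<String> dex : dexFiles) {
            for (String cls : dex) {
                // Same name already defined by an earlier DEX file of the APK: self-shadowing.
                if (!alreadyDefined.add(cls)) {
                    found.get(Kind.SELF).add(cls);
                }
                // Same name as a platform class: SDK or hidden shadowing.
                if (sdkClasses.contains(cls)) {
                    found.get(Kind.SDK).add(cls);
                } else if (hidden.contains(cls)) {
                    found.get(Kind.HIDDEN).add(cls);
                }
            }
        }
        return found;
    }
}
```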
=== Results
@ -76,24 +81,24 @@ compared to SDK 32 33 34: if the shadow class matches, then it is a match
table.cell(colspan: 9, inset: 3pt)[],
table.hline(),
)},
caption: [Shadow classes compared to #SDK 34 for a dataset of #nbapk applications]
) <tab:cl-shadow>
//The metadata provided by AndroZoo helps to have the flags reported by antiviruses used by VirusTotal#footnote[https://www.virustotal.com].
We report in the upper part of @tab:cl-shadow the statistics about the whole dataset and the three shadow attacks: "self" when a class shadows another one in the #APK, "#SDK" when a class of the #SDK shadows one of the #APK, and "Hidden" when a hidden class of Android shadows one of the #APK.
We observe that, on average, a few classes are shadowed by another class.
Note that the median value is 0: at least half of the applications do not shadow any class, while a few applications shadow many classes.
The number of applications shadowing a hidden #API is low, which is expected since developers should not know about these classes.
We observe that a substantial number of applications, 23.52%, perform #SDK shadowing.
This can be explained by the fact that classes introduced in recent Android versions are embedded in the #APK for end users who run older versions of Android.
The average Min #SDK of 21.7 over the whole dataset supports this: on average, an application can run on a smartphone with #API 21, which requires embedding the classes it uses that were introduced between #API 22 and 34.
This hypothesis about missing classes is further investigated later in this section.
In the bottom part of @tab:cl-shadow, we give the same statistics, excluding applications that do not perform any shadowing.
We disassembled each pair of shadow classes with Apktool and compared their instructions in the Smali representation.
For self-shadowing, we compare the two classes of the pair.
For #SDK and Hidden shadowing, we compare the code found in the #APK with the implementations found in the emulator and in the `android.jar` of #SDK 32, 33, and 34.
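A minimal Java sketch of this comparison follows (Java 11 or later). It assumes that a shadowing class counts as identical when its Smali code matches at least one of the reference implementations. The class `SmaliCompare` is hypothetical. Ignoring blank lines, comments and the debug directives `.line` and `.source` is an illustrative assumption, not a description of the exact normalization used.

```java
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: a class found in the APK counts as identical if its Smali code
// matches at least one reference implementation (e.g. emulator, android.jar of SDK 32-34).
final class SmaliCompare {
    // Assumption: blank lines, comments and debug-only directives are not significant.
    static List<String> normalize(String smali) {
        return smali.lines()
                .map(String::strip)
                .filter(line -> !line.isEmpty())
                .filter(line -> !line.startsWith("#"))
                .filter(line -> !line.startsWith(".line ") && !line.startsWith(".source "))
                .collect(Collectors.toList());
    }

    // True if the APK version matches any of the reference versions.
    static boolean matchesAny(String apkSmali, Collection<String> referenceSmali) {
        List<String> target = normalize(apkSmali);
        return referenceSmali.stream()
                .map(SmaliCompare::normalize)
                .anyMatch(target::equals);
    }
}
```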
#paragraph([Self-shadowing])[
We observe a low number of applications doing self-shadow attacks.
@ -117,22 +122,22 @@ We investigate later in @sec:cl-malware the case of malicious applications.
The remaining bars are between 0 and 5,000.
"
),
caption: [Redefined #SDK classes, sorted by the first #SDK they appeared in.]
)<fig:cl-classes_by_first_sdk>
#paragraph([#SDK shadowing])[
For the shadowing of #SDK classes, we observe a low ratio of identical classes.
This result could wrongly suggest that developers embed malicious versions of #SDK classes, but our manual investigation shows that the differences are slight and probably due to compiler optimizations.
To investigate further, @fig:cl-classes_by_first_sdk represents these redefined classes according to the following rules:
- The class is placed on the x-axis according to the #SDK in which it first appeared.
- The class is counted as "green" (solid) if it first appeared in an #SDK *after* the #APK min #SDK (retro-compatibility purpose).
- The class is counted as "red" (hatched) if it first appeared in an #SDK *before* the #APK min #SDK (which is useless for the application, as the #SDK version of the class is always available).
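These rules can be expressed as a small Java sketch that computes one bar per first-appearance level. The names `RetroCompat`, `classify` and `bars` are hypothetical. The rules above do not cover a class that appeared exactly at the min #SDK, so counting it as red is our assumption.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the colouring rule used for the figure of redefined SDK classes.
final class RetroCompat {
    enum Colour { GREEN, RED }

    // firstSdk: API level in which the platform class first appeared.
    // minSdk: minSdkVersion declared by the APK.
    static Colour classify(int firstSdk, int minSdk) {
        // Assumption: a class that appeared exactly at minSdk is always available
        // on supported devices, so embedding it is counted as RED.
        return firstSdk > minSdk ? Colour.GREEN : Colour.RED;
    }

    // One bar per first-appearance API level: {green count, red count}.
    // Each element of redefined is a pair {firstSdk, minSdk}.
    static Map<Integer, int[]> bars(List<int[]> redefined) {
        Map<Integer, int[]> result = new TreeMap<>();
        for (int[] c : redefined) {
            int[] bar = result.computeIfAbsent(c[0], level -> new int[2]);
            bar[classify(c[0], c[1]).ordinal()]++; // GREEN -> index 0, RED -> index 1
        }
        return result;
    }
}
```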
We observe that the majority of classes are legitimate retro-compatibility additions, especially after #SDK 21 (the average min #SDK, cf. @tab:cl-shadow).
Abnormal cases are observed for classes that appeared in #API versions 7 and earlier, 8, and 16.
@tab:cl-topsdk reports the top ten classes that shadow the #SDK for these three cases.
For #SDK 7 and earlier, these are mainly HTTP classes: for example, the class `HttpParams` is an interface containing little bytecode, which mostly matches the class already present on the emulator (98.03% of these shadowing classes are identical).
`HttpConnectionParams`, on the other hand, differs from the platform class: only 4.99% of these classes are identical.
Manual inspection of some applications revealed that the two main reasons are:
@ -141,11 +146,11 @@ Manual inspection of some applications revealed that the two main reasons are:
- very small changes that can be attributed to the compilation process (e.g. swapped registers: `v0` is used instead of `v1` and `v1` instead of `v0`); even though we count these classes as different, they are very similar.
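Differences limited to swapped registers can be recognized by renaming the registers in order of first use before comparing, as in the hypothetical Java sketch below (Java 9 or later). Our results still count such classes as different. The sketch only illustrates why they are very similar. The regular expression is a simplifying assumption and could also match identifiers such as a package segment named `v1`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: two Smali method bodies are equal "up to registers" if they
// become identical once registers are renamed in order of first use.
final class RegisterRenaming {
    // Naive register pattern (v0, v1, ..., p0, p1, ...); may also match
    // unrelated identifiers such as a package segment named "v1".
    private static final Pattern REGISTER = Pattern.compile("\\b[vp]\\d+\\b");

    // Rewrites each register to r0, r1, ... in order of first appearance.
    static String canonical(List<String> body) {
        Map<String, String> renaming = new HashMap<>();
        StringBuilder out = new StringBuilder();
        for (String line : body) {
            Matcher m = REGISTER.matcher(line);
            StringBuilder renamed = new StringBuilder();
            while (m.find()) {
                String fresh = renaming.get(m.group());
                if (fresh == null) {
                    fresh = "r" + renaming.size();
                    renaming.put(m.group(), fresh);
                }
                m.appendReplacement(renamed, fresh);
            }
            m.appendTail(renamed);
            out.append(renamed).append('\n');
        }
        return out.toString();
    }

    static boolean sameUpToRegisters(List<String> a, List<String> b) {
        return canonical(a).equals(canonical(b));
    }
}
```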
The remaining 4.99% of classes, which are identical to the Android version, are classes whose method bodies are replaced by stubs that throw `RuntimeException("Stub!")`.
This code corresponds to what we found in `android.jar` but not the code we found in the emulator, which is surprising.
Nevertheless, we decided to count them as identical, because `android.jar` is the official jar file for developers, and the replacement of the stubs by actual code in the emulator is intended by Google developers.
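The stub pattern can be detected on the Smali code of a method, as in the hypothetical Java sketch below (Java 11 or later). It assumes that, once compiled to #DEX, the stub consists of exactly four instructions: `new-instance`, `const-string`, `invoke-direct` and `throw`. Constructors, which first call their superclass constructor, are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: recognises the android.jar stub body
// `throw new RuntimeException("Stub!")` in the Smali code of one method.
// The expected instruction order is an assumption about the DEX compilation.
final class StubDetector {
    static boolean isStub(List<String> methodLines) {
        List<String> insns = new ArrayList<>();
        for (String raw : methodLines) {
            String line = raw.strip();
            // Keep instructions only: skip blank lines, comments and directives
            // such as .method, .locals, .line or .end method.
            if (line.isEmpty() || line.startsWith("#") || line.startsWith(".")) {
                continue;
            }
            insns.add(line);
        }
        // Register names are not checked, only opcodes and operands that matter.
        return insns.size() == 4
                && insns.get(0).startsWith("new-instance ")
                && insns.get(0).endsWith(", Ljava/lang/RuntimeException;")
                && insns.get(1).startsWith("const-string ")
                && insns.get(1).endsWith(", \"Stub!\"")
                && insns.get(2).startsWith("invoke-direct ")
                && insns.get(2).endsWith("Ljava/lang/RuntimeException;-><init>(Ljava/lang/String;)V")
                && insns.get(3).startsWith("throw ");
    }
}
```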
Other results of @tab:cl-topsdk can be discussed similarly: either a high ratio of the classes are identical, or they differ because of small variations.
When substantial differences appear, it is mainly because different versions of the same library were used, or because an #SDK class is embedded for retro-compatibility.
]
#figure({
@ -196,17 +201,19 @@ When substantial differences appear it is mainly because different versions of t
table.cell(colspan: 3, inset: 2pt)[],
table.hline(),
)},
caption: [Top ten classes shadowing the #SDK for #API versions 7 and earlier, 8, and 16, for a dataset of #nbapk applications]
) <tab:cl-topsdk>
#paragraph([Hidden shadowing])[
Applications redefining hidden classes redefine 16.1 classes on average (cf. bottom part of @tab:cl-shadow).
The top 3 packages whose code actually differs from the ones found in Android are `java.util.stream`, `org.ccil.cowan.tagsoup` and `org.json`:
- stream: looking in more detail, we found that `java.util.stream` is redefined by only 6 applications, but the large number of redefined classes artificially puts the package at the top of the list.
This is because these developers included a library containing many classes that collide with Android.
- tagsoup: `TagSoup` is a library for parsing HTML.
Developers probably do not know that it is also part of Android as hidden classes.
- json: there is only one hidden class in `org.json`, redefined by #num(821) applications: `JSONObject$1`.
`org.json` is a package of the Android #SDK, not a hidden one.
However, `JSONObject$1` is an anonymous class that is not provided by `android.jar`: there, its enclosing class `JSONObject` is only a stub and thus does not use `JSONObject$1`.
Thus, this class falls in the category of hidden #platc.
All these hidden shadow classes are libraries included by the developers who probably did not know that they were already embedded in Android.
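The hypothetical Java sketch below (Java 11 or later) illustrates two points. The first is the definition of hidden classes relied on above. The second is the effect noted for `java.util.stream`: counting redefined classes per package favors packages with many classes, whereas counting distinct applications does not. We assume here that a hidden class is a class present on the device but absent from `android.jar`, as for `JSONObject$1`. The class and method names are illustrative.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of two steps of the hidden-shadowing analysis.
final class HiddenShadowStats {
    // Assumption: hidden = present on the device but not exposed by android.jar.
    static Set<String> hidden(Set<String> deviceClasses, Set<String> androidJarClasses) {
        Set<String> result = new TreeSet<>(deviceClasses);
        result.removeAll(androidJarClasses);
        return result;
    }

    // Package of a type descriptor: "Lorg/json/JSONObject$1;" -> "org.json".
    static String packageOf(String descriptor) {
        String name = descriptor.substring(1, descriptor.length() - 1);
        int slash = name.lastIndexOf('/');
        return slash < 0 ? "" : name.substring(0, slash).replace('/', '.');
    }

    // For each package: {number of redefined classes, number of distinct applications}.
    // redefinedByApp maps an application identifier to the hidden classes it redefines.
    static Map<String, int[]> perPackage(Map<String, Set<String>> redefinedByApp) {
        Map<String, Integer> classCount = new TreeMap<>();
        Map<String, Set<String>> apps = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : redefinedByApp.entrySet()) {
            for (String cls : entry.getValue()) {
                String pkg = packageOf(cls);
                classCount.merge(pkg, 1, Integer::sum);
                apps.computeIfAbsent(pkg, p -> new HashSet<>()).add(entry.getKey());
            }
        }
        Map<String, int[]> result = new TreeMap<>();
        for (Map.Entry<String, Integer> e : classCount.entrySet()) {
            result.put(e.getKey(), new int[] {e.getValue(), apps.get(e.getKey()).size()});
        }
        return result;
    }
}
```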
@ -236,7 +243,7 @@ All these hidden shadow classes are libraries included by the developers who pro
// ...
}
```,
caption: [Implementation of Reflection found in `classes11.dex` (shadows @lst:cl-refl1)],
) <lst:cl-refl2>
#figure(
@ -258,7 +265,7 @@ All these hidden shadow classes are libraries included by the developers who pro
// ...
}
```,
caption: [Implementation of Reflection executed by #ART (shadowed by @lst:cl-refl2)],
) <lst:cl-refl1>
The last column of @tab:cl-shadow shows the proportion of applications considered as malware, based on our arbitrary threshold of 3 positive detections in VirusTotal reports.
@ -271,18 +278,19 @@ Additionally, we noticed multiple times internal classes from `com.google.androi
// App name: ShareCRM, but it seems to still exist on the store, so we avoid a lawsuit and do not name it
// https://play.google.com/store/apps/details?id=com.facishare.fsplay&hl=en
The most notable case we found was an application that still exists on the Google Play Store with the same package name#footnote[SHA256: `C46A65EA1A797119CCC03C579B61C94FE8161308A3B6A8F55718D6ADAD112546`]. This application contains a self-shadow class `me.weishu.reflection.Reflection` that can be found on GitHub, in the repository `tiann/FreeReflection`#footnote[https://github.com/tiann/FreeReflection]. This class is used to disable Android's restrictions on hidden #API.
At first glance, we believed the shadowing to be intentional, for obfuscation purposes.
The shadow class that would be seen by a reverser is given in @lst:cl-refl2: it contains some Java bytecode performing reflection and loading a native library named "free-reflection" (the associated `.so` is missing).
The shadowed class that is really executed is summarized in @lst:cl-refl1.
It contains more obfuscated code: a `DEX` field storing base64-encoded #DEX bytecode that is later used to load new code.
When looking at this new code stored in the field, we found that it does almost the same thing as the code in the shadow class.
Thus, we believe that the developer upgraded their obfuscation technique, replacing a native library with inline base64-encoded bytecode.
The shadow attack could be unintentional, but it strengthens the masking of the new implementation.
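A reverser can quickly check whether a string field, such as the `DEX` field described above, contains #DEX bytecode. This is done by decoding it and testing the #DEX magic number: `dex\n`, followed by a three-digit version and a NUL byte. The Java sketch below is hypothetical, and the class `EmbeddedDex` is illustrative. It only inspects the header and does not load any code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch: does a base64 string decode to data starting with a DEX header?
final class EmbeddedDex {
    private static final byte[] MAGIC = "dex\n".getBytes(StandardCharsets.US_ASCII);

    static boolean looksLikeDex(String base64) {
        byte[] bytes;
        try {
            bytes = Base64.getDecoder().decode(base64);
        } catch (IllegalArgumentException notBase64) {
            return false; // not valid base64 at all
        }
        // Header magic: "dex\n", a 3-digit version (e.g. "035"), then a NUL byte: 8 bytes.
        if (bytes.length < 8) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (bytes[i] != MAGIC[i]) {
                return false;
            }
        }
        for (int i = 4; i < 7; i++) {
            if (bytes[i] < '0' || bytes[i] > '9') {
                return false;
            }
        }
        return bytes[7] == 0;
    }
}
```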
#v(2em)
As a conclusion, we observed that:
- #SDK shadowing is performed by #shadowsdk of applications but is unintentional: these classes are embedded for retro-compatibility or because the developer added a library already present in Android;
- Hidden shadowing rarely occurs and is mainly due to the usage of libraries that Android already contains;
- Malware performs more self-shadowing than goodware, and we found a sample where self-shadowing would clearly mislead a reverser.