== Shadow Attacks in the Wild <sec:cl-wild>
In this section, we evaluate whether applications found in the wild, on the Play Store or other markets, use one of the shadow techniques.
Our goal is to explore the usage of shadow techniques in real applications.
Because we modelled the behaviour of a recent version of Android (#SDK 34), we decided not to use our dataset from @sec:rasta.
The applications in the RASTA dataset span more than 10 years, and we cannot guarantee that shadow attacks behaved the same during that period.
At the very least, self-shadowing was not possible before the introduction of multi-dex in 2014, a period that covers about a quarter of the applications in the RASTA dataset.
Instead, we sampled another dataset of recent applications.
This way, we could also include malicious applications (in case such techniques are used to hide malicious code): we randomly selected #num(50000) applications from AndroZoo~@allixAndroZooCollectingMillions2016 that appeared in 2023.
Malicious applications are identified in our dataset using a threshold of 3 on the number of VirusTotal engines reporting an application as malware.
This number is provided by AndroZoo for scans performed between January 2023 and January 2024, depending on the application.
A few applications could not be retrieved or parsed, leading to a final dataset of #nbapk applications.
We automatically disassembled the applications to obtain the list of included classes.
Then, we checked whether any shadow attack occurs within the #APK itself or between the #APK and the #platc of #SDK 34.
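
The check described above can be illustrated with a minimal Java sketch. It is a hypothetical simplification, not our actual tooling: it assumes that the class names of each dex file of the APK have already been extracted in load order (`classes.dex`, `classes2.dex`, and so on), and that the class names of the public SDK and of the hidden platform classes are given as sets. `ShadowDetector` and its parameters are illustrative names.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: classify class-name collisions into the three shadow kinds. */
final class ShadowDetector {
    enum Kind { SELF, SDK, HIDDEN }

    /**
     * @param dexFiles      class names defined by each dex file of the APK, in load order
     * @param sdkClasses    class names of the public SDK (assumed input)
     * @param hiddenClasses class names of hidden platform classes (assumed input)
     * @return the colliding class names, grouped by kind of shadowing
     */
    static Map<Kind, List<String>> detect(List<List<String>> dexFiles,
                                          Set<String> sdkClasses,
                                          Set<String> hiddenClasses) {
        Map<Kind, List<String>> result = new EnumMap<>(Kind.class);
        for (Kind kind : Kind.values()) {
            result.put(kind, new ArrayList<>());
        }
        Set<String> seen = new HashSet<>();
        for (List<String> dex : dexFiles) {
            for (String name : dex) {
                if (!seen.add(name)) {
                    // Same name already defined by an earlier dex file of the APK.
                    result.get(Kind.SELF).add(name);
                } else if (sdkClasses.contains(name)) {
                    // APK class colliding with a public SDK class.
                    result.get(Kind.SDK).add(name);
                } else if (hiddenClasses.contains(name)) {
                    // APK class colliding with a hidden platform class.
                    result.get(Kind.HIDDEN).add(name);
                }
            }
        }
        return result;
    }
}
```

Which of two colliding classes is actually executed depends on the class loader; the sketch only reports the collisions and does not model that resolution.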
compared with SDK 32, 33, and 34: if the shadow class matches, then it counts as a match
*/

#todo[cl-shadow]
#figure({
show table: set text(size: 0.80em)
[],
[], [*%*], [*% malware*],
[*Shadow classes*], [*Median*], [*Target #SDK*], [*Min #SDK*],
),
table.cell(colspan: 9, inset: 3pt)[],
table.hline(),
//The metadata provided by AndroZoo helps to have the flags reported by antiviruses used by VirusTotal#footnote[https://www.virustotal.com].

We report in the upper part of @tab:cl-shadow the statistics about the whole dataset and the three shadow attacks: "self" when a class shadows another one in the #APK, "#SDK" when a class of the #SDK shadows one of the #APK, and "Hidden" when a hidden class of Android shadows one of the #APK.
We observe that, on average, only a few classes per application are shadowed by another class.
Note that the median is 0: a few applications shadow many classes, while the majority of applications do not shadow anything.
The number of applications shadowing a hidden #API is low, which is expected, as developers should not know these classes.
We observe a substantial number of applications, 23.52%, that perform #SDK shadowing.
This can be explained by the fact that classes introduced in recent #SDK versions are embedded in the #APK for end users who have older versions of Android.
The average Min #SDK of 21.7 for the whole dataset supports this explanation: on average, an application can run on a smartphone with #API 21, so it would need to embed every class it uses that was introduced in #API 22 to 34.
This hypothesis about missing classes is further investigated later in this section.
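
The retro-compatibility explanation boils down to a simple rule: a class first introduced in API level L is absent on devices running an older level, so an application whose Min SDK is lower than L has to bundle its own copy. The minimal Java sketch below encodes that rule; `RetroCompat` is a hypothetical name, and the table giving the first API level of each SDK class is an assumed input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the retro-compatibility rule. */
final class RetroCompat {
    /** A class introduced after the app's minSdk is missing on its oldest supported devices. */
    static boolean needsEmbedding(int firstApiLevel, int minSdk) {
        return firstApiLevel > minSdk;
    }

    /**
     * @param firstApiLevel API level in which each SDK class first appeared (assumed input)
     * @param usedClasses   SDK classes referenced by the application
     * @param minSdk        the application's minSdkVersion
     * @return the used classes the application would have to embed
     */
    static List<String> classesToEmbed(Map<String, Integer> firstApiLevel,
                                       List<String> usedClasses, int minSdk) {
        List<String> toEmbed = new ArrayList<>();
        for (String name : usedClasses) {
            Integer level = firstApiLevel.get(name);
            // Classes with an unknown first level are ignored in this sketch.
            if (level != null && needsEmbedding(level, minSdk)) {
                toEmbed.add(name);
            }
        }
        return toEmbed;
    }
}
```

For instance, with a Min SDK of 21, `android.util.ArraySet` (added in API level 23) would be reported, whereas `android.app.Activity` (available since API level 1) would not.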

In the bottom part of @tab:cl-shadow, we give the same statistics, but only for applications that perform at least one kind of shadowing.
For these pairs of colliding classes, we disassembled both classes with Apktool and compared their instructions in the Smali language.
For self-shadowing, we compare the two classes of the pair.
For the shadowing of #SDK or hidden classes, we compare the code found in the #APK with the implementations found in the emulator and in `android.jar` of #SDK 32, 33, and 34.
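
A per-method Smali comparison of this kind could look like the following hypothetical Java sketch, which is not our exact implementation. It assumes the Smali text of each class has already been produced by Apktool, groups the instructions by method signature, and ignores comments and debug directives such as `.line`, `.source`, and local variable or parameter annotations, so two classes count as identical when every method has the same instruction sequence. `SmaliCompare` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: compare two Smali classes method by method. */
final class SmaliCompare {
    // Comments and debug-only directives that should not affect the comparison.
    // Note: ".local " keeps its trailing space so that ".locals N" is not ignored.
    private static final List<String> IGNORED_PREFIXES = List.of(
            "#", ".line", ".source", ".local ", ".end local", ".restart local",
            ".prologue", ".param", ".end param");

    /** Maps each ".method" header line to the lines of its body. */
    static Map<String, List<String>> methodBodies(String smali) {
        Map<String, List<String>> methods = new LinkedHashMap<>();
        String header = null;
        List<String> body = null;
        for (String raw : smali.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || IGNORED_PREFIXES.stream().anyMatch(line::startsWith)) {
                continue;
            }
            if (line.startsWith(".method ")) {
                header = line;
                body = new ArrayList<>();
            } else if (line.equals(".end method")) {
                if (header != null) {
                    methods.put(header, body);
                }
                header = null;
                body = null;
            } else if (body != null) {
                // Only lines inside a method are kept; class headers are skipped.
                body.add(line);
            }
        }
        return methods;
    }

    /** True when both classes define the same methods with the same instructions. */
    static boolean sameCode(String smaliA, String smaliB) {
        return methodBodies(smaliA).equals(methodBodies(smaliB));
    }
}
```

Which directives to ignore is a design choice: ignoring more of them makes the comparison more tolerant of compiler differences, at the risk of hiding real changes.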
#paragraph([Self-shadowing])[
Few applications perform self-shadowing.
For each shadowed class, we compared its bytecode with that of the class shadowing it (we compared the Smali instructions generated by Apktool for each method).
We observe that 74.8% of these classes are identical, which suggests that the compilation process embeds the same class multiple times, with variations only in headers or metadata values.
We investigate the case of malicious applications later in @sec:cl-malware.
]

The remaining bars are between 0 and 5,000.
"
),
caption: [Redefined #SDK classes, sorted by the first #SDK they appeared in]
)<fig:cl-classes_by_first_sdk>

#paragraph([#SDK shadowing])[
For the shadowing of #SDK classes, we observe a low ratio of identical classes.
This result could lead to the wrong conclusion that developers embed malicious versions of the #SDK classes, but our manual investigation shows that the differences are slight and probably due to compiler optimisation.
To go further in the investigation, in @fig:cl-classes_by_first_sdk, we represent these redefined classes with the following rules:

- The class is placed on the x-axis of the figure according to the #SDK in which it first appeared.
- The class is counted as "green" (solid) if it first appeared in an #SDK *after* the #APK min #SDK (retro-compatibility purpose).
Abnormal cases are observed for classes that appeared in #API versions 7 and before, 8, and 16.
@tab:cl-topsdk reports the top ten classes that shadow the #SDK for these three versions.
For #SDK 7 and before, these are mainly HTTP classes: for example, `HttpParams` is an interface containing little bytecode that mostly matches the class already present on the emulator (98.03% of the shadowed classes are identical).
`HttpConnectionParams`, on the other hand, differs from the platform class: only 4.99% of the shadowed classes are identical.
Manual inspection of some applications revealed two main reasons for these differences:

- Instead of checking inline whether the method's arguments are null, as Android does, applications use the method `org.apache.http.util.Args.notNull()`. According to comments in the source code of Android#footnote[https://cs.android.com/android/platform/superproject/main/+/main:frameworks/base/core/java/org/apache/http/params/HttpConnectionParams.java;drc=3bdd327f8532a79b83f575cc62e8eb09a1f93f3d?], the class was forked in 2007 from the Apache 'httpcomponents' project. Looking at the history of the project, the use of `Args.notNull()` was introduced in 2012#footnote[https://github.com/apache/httpcomponents-core/commit/9104a92ea79e338d876b1b60f5cd2b243ba7069f?]. This shows that applications embed code from more recent versions of this library without realising that their version is not the one that will be used.
- Very small changes that can be attributed to the compilation process (e.g. swapping registers: `v0` is used instead of `v1` and `v1` instead of `v0`); even if we count these classes as different, they are very similar.
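
The first kind of difference can be illustrated with a simplified Java sketch. Both methods below behave the same way, but the second delegates the null check to a helper, so its compiled code typically contains an extra static call and no longer matches the Android version instruction by instruction. The nested `Args` class is a local stand-in written for illustration, not the real `org.apache.http.util.Args`, a `Map` replaces `HttpParams`, and the parameter key is also illustrative.

```java
import java.util.Map;

/** Simplified illustration of the two null-check styles (hypothetical names). */
final class NullCheckStyles {
    // Local stand-in for a helper like org.apache.http.util.Args.notNull(); not the real class.
    static final class Args {
        static <T> T notNull(T argument, String name) {
            if (argument == null) {
                throw new IllegalArgumentException(name + " may not be null");
            }
            return argument;
        }
    }

    // Illustrative parameter key.
    static final String SO_TIMEOUT = "http.socket.timeout";

    // In the style of the Android copy: the null check is written inline.
    static void setSoTimeoutInline(Map<String, Integer> params, int timeout) {
        if (params == null) {
            throw new IllegalArgumentException("HTTP parameters may not be null");
        }
        params.put(SO_TIMEOUT, timeout);
    }

    // In the style of newer library versions: the same check is delegated to a helper.
    static void setSoTimeoutWithHelper(Map<String, Integer> params, int timeout) {
        Args.notNull(params, "HTTP parameters");
        params.put(SO_TIMEOUT, timeout);
    }
}
```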

The 4.99% of shadowed `HttpConnectionParams` classes that are identical to the Android version are classes where the body of each method is replaced by a stub that throws `RuntimeException("Stub!")`.
This code corresponds to what we found in `android.jar`, but not to the code we found in the emulator, which is surprising.
Nevertheless, we decided to count them as identical, because `android.jar` is the official jar file for developers, and Google intends these stubs to be replaced by the real implementations on the device.
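
Such stubs are easy to spot in disassembled code. The hypothetical Java sketch below implements a simple heuristic over the Smali lines of one method body, assuming that `throw new RuntimeException("Stub!")` typically disassembles to a `new-instance` of `RuntimeException`, a `const-string` holding `"Stub!"`, a constructor call, and a final `throw`; the register numbers may vary. `StubCheck` is an illustrative name.

```java
import java.util.List;

/** Hypothetical heuristic: is this Smali method body an android.jar-style stub? */
final class StubCheck {
    static boolean isStubBody(List<String> body) {
        boolean createsRuntimeException = false;
        boolean hasStubMessage = false;
        String lastInstruction = null;
        for (String raw : body) {
            String line = raw.trim();
            // Skip blank lines, directives (.locals, .line, ...) and comments.
            if (line.isEmpty() || line.startsWith(".") || line.startsWith("#")) {
                continue;
            }
            if (line.startsWith("new-instance") && line.endsWith("Ljava/lang/RuntimeException;")) {
                createsRuntimeException = true;
            }
            if (line.startsWith("const-string") && line.endsWith("\"Stub!\"")) {
                hasStubMessage = true;
            }
            lastInstruction = line;
        }
        // A stub ends by throwing the exception it built.
        return createsRuntimeException && hasStubMessage
                && lastInstruction != null && lastInstruction.startsWith("throw ");
    }
}
```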

Other results of @tab:cl-topsdk can be discussed similarly: either the classes are identical with a high ratio, or they differ because of small variations.
When substantial differences appear, it is mainly because different versions of the same library have been used or because an #SDK class is embedded for retro-compatibility.
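
For register swaps such as the one noted above, one could check whether two method bodies differ only by register allocation by renaming local registers in order of first use before comparing them. The following hypothetical Java sketch illustrates that idea; it is not part of the comparison described in this section, and `RegisterNormalizer` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: canonical renaming of local registers before comparison. */
final class RegisterNormalizer {
    // Local registers only (v0, v1, ...); parameter registers (p0, ...) are left as-is.
    // Limitation: tokens such as "v1" inside string literals or type descriptors, and
    // register ranges like {v0 .. v3}, are not handled specially.
    private static final Pattern LOCAL_REGISTER = Pattern.compile("\\bv\\d+\\b");

    static List<String> normalize(List<String> instructions) {
        Map<String, String> renaming = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String insn : instructions) {
            Matcher m = LOCAL_REGISTER.matcher(insn);
            StringBuilder sb = new StringBuilder();
            while (m.find()) {
                // The first register seen becomes r0, the second r1, and so on.
                String alias = renaming.computeIfAbsent(m.group(), reg -> "r" + renaming.size());
                m.appendReplacement(sb, alias);
            }
            m.appendTail(sb);
            out.add(sb.toString());
        }
        return out;
    }

    static boolean sameUpToRegisters(List<String> a, List<String> b) {
        return normalize(a).equals(normalize(b));
    }
}
```

Renaming by order of first use is a deliberately simple choice: it makes a consistent swap of `v0` and `v1` disappear, but it does not recognise reordered instructions or otherwise equivalent code, which would require a semantic comparison.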
]

#figure({
table.cell(colspan: 3, inset: 2pt)[],
table.hline(),
table.cell(colspan: 3, inset: 2pt)[],
[redefined for #SDK $<=$ 7], [], [],
table.hline(),
table.cell(colspan: 3, inset: 2pt)[],
..redef_sdk_7minus.map(e => (
table.cell(colspan: 3, inset: 2pt)[],
table.hline(),
table.cell(colspan: 3, inset: 2pt)[],
[redefined for #SDK $=$ 8], [], [],
table.cell(colspan: 3, inset: 2pt)[],
table.hline(),
table.cell(colspan: 3, inset: 2pt)[],
table.cell(colspan: 3, inset: 2pt)[],
table.hline(),
table.cell(colspan: 3, inset: 2pt)[],
[redefined for #SDK $=$ 16], [], [],
table.cell(colspan: 3, inset: 2pt)[],
table.hline(),
table.cell(colspan: 3, inset: 2pt)[],

The top 3 packages whose code actually differs from the ones found in Android are `java.util.stream`, `org.ccil.cowan.tagsoup` and `org.json`:

- stream: when looking in more detail, we found that `java.util.stream` was redefined by only 6 applications, but the large number of classes they redefine artificially puts the package at the top of the list.
This is explained by the fact that these developers included a library containing many classes that collide with Android.
- tagsoup: `TagSoup` is a library for parsing HTML.
Developers seem unaware that Android already contains it as hidden classes.
- json: there is only one hidden class in `org.json`, redefined by #num(821) applications: `JSONObject$1`.

// ...
}
```,
caption: [Implementation of Reflection executed by #ART (shadowed by @lst:cl-refl2)],
) <lst:cl-refl1>
The last column of @tab:cl-shadow shows the proportion of applications considered as malware, using our arbitrary threshold of 3 positive detections in VirusTotal reports.
For the whole dataset, 0.53% of applications are considered as malware.
We can see that an application that uses self-shadowing is 10 times more likely to be malware, whereas the proportion of malware among applications shadowing #platc is the same as in the rest of the dataset.
We therefore manually reverse-engineered the self-shadowing malware and found that the self-shadowing does not appear to be intentional.
The colliding classes are often the same implementation, occasionally with minor differences, like different versions of a library.
Additionally, we noticed multiple times internal classes from `com.google.android.gms.ads` colliding with each other, which we believe is due to faulty processing during the compilation of the application.
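
The proportions discussed above follow from the labelling rule stated at the beginning of this section. The minimal Java sketch below makes it explicit, assuming that the threshold means at least 3 engines and that each application comes with its VirusTotal detection count; the relative likelihood is the ratio between the malware share of a group and that of a reference population. `MalwareStats` and its methods are hypothetical names.

```java
import java.util.List;

/** Hypothetical sketch of the malware labelling and of the proportions derived from it. */
final class MalwareStats {
    // Assumption: an application is labelled as malware when at least 3 engines flag it.
    static final int THRESHOLD = 3;

    static boolean isMalware(int vtDetections) {
        return vtDetections >= THRESHOLD;
    }

    /** Share of applications labelled as malware, given their VirusTotal detection counts. */
    static double malwareShare(List<Integer> detections) {
        if (detections.isEmpty()) {
            return 0.0;
        }
        long malware = detections.stream().filter(MalwareStats::isMalware).count();
        return (double) malware / detections.size();
    }

    /** How many times more frequent malware is in a group than in the reference population. */
    static double relativeLikelihood(List<Integer> group, List<Integer> reference) {
        return malwareShare(group) / malwareShare(reference);
    }
}
```

The ratio is undefined when the reference population contains no malware; the sketch then returns an infinite or NaN value rather than handling that case.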

// App name: ShareCRM; it seems to still exist on the store, so we avoid a lawsuit and do not name it
// https://play.google.com/store/apps/details?id=com.facishare.fsplay&hl=en

The most notable case we found was an application that still exists on the Google Play Store with the same package name#footnote[SHA256: `C46A65EA1A797119CCC03C579B61C94FE8161308A3B6A8F55718D6ADAD112546`]. This application contains a self-shadow class `me.weishu.reflection.Reflection` that can be found on GitHub, in the repository `tiann/FreeReflection`#footnote[https://github.com/tiann/FreeReflection]. This class is used to disable Android restrictions on hidden #API.
At first glance, we believed the shadowing to be done voluntarily for obfuscation purposes.
The shadow class that would be seen by a reverser is given in @lst:cl-refl2: it contains some Java bytecode performing reflection and loading a native library named "free-reflection" (the associated `.so` is missing).
The shadowed class that is really executed is summarised in @lst:cl-refl1.
It contains more obfuscated code: a `DEX` field storing base64-encoded #DEX bytecode that is later used to load some new code.
When looking at this new code stored in the field, we found that it does almost the same thing as the code in the shadow class.
Thus, we believe that the developer has upgraded their obfuscation techniques, replacing a native library with inline base64-encoded bytecode.
The shadow attack could be unintentional, but it strengthens the masking of the new implementation.
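
When reversing such samples, a quick way to confirm that a string field like `DEX` holds bytecode is to decode it and look for the DEX file magic: the bytes `dex`, a newline, a three-digit format version, and a NUL byte. The hypothetical Java sketch below performs that check; `EmbeddedDexCheck` is an illustrative name.

```java
import java.util.Base64;

/** Hypothetical helper: does a string look like base64-encoded DEX bytecode? */
final class EmbeddedDexCheck {
    static boolean looksLikeBase64Dex(String field) {
        byte[] bytes;
        try {
            // Strip whitespace so that line-wrapped base64 is accepted too.
            bytes = Base64.getDecoder().decode(field.replaceAll("\\s", ""));
        } catch (IllegalArgumentException e) {
            return false; // not valid base64
        }
        // DEX magic: "dex\n", a three-digit version such as "035", then a NUL byte.
        return bytes.length >= 8
                && bytes[0] == 'd' && bytes[1] == 'e' && bytes[2] == 'x' && bytes[3] == '\n'
                && Character.isDigit(bytes[4]) && Character.isDigit(bytes[5])
                && Character.isDigit(bytes[6]) && bytes[7] == 0;
    }
}
```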

#v(2em)

In conclusion, we observed that:
- #SDK shadowing is performed by #shadowsdk of applications, but is unintentional: these classes are embedded for retro-compatibility purposes or because the developer added a library already present in Android.
- Hidden shadowing rarely occurs and is mainly due to the usage of libraries that Android already contains.
- Malware performs more self-shadowing than goodware, and we found a sample where self-shadowing would clearly mislead the reverser.