pass chap 4

2025-09-29 03:10:59 +02:00
commit f23390279c
parent c9752714db
7 changed files with 177 additions and 182 deletions
--- a/4_class_loader/4_in_the_wild.typ
+++ b/4_class_loader/4_in_the_wild.typ
@@ -3,16 +3,16 @@

 == Shadow Attacks in the Wild <sec:cl-wild>

-In this section, we evaluate in the wild if applications that can be found in the Play store or other markets use one of the shadow techniques. 
+In this section, we evaluate whether applications found in the wild, on the Play Store or other markets, use one of the shadow techniques. 
 Our goal is to explore the usage of shadow techniques in real applications. 
-Because we modeled the behavior of a rescent version of Android (#SDK 34), we decided to not used our dataset from @sec:rasta.
-The applications in the RASTA dataset span over more than 10 years and we cannot garanties that sandow attacks behaved the same during those 10 years.
-At the verry least, self-shadowing would not be possible before the introduction of multi-dex in 2014 -- about a fourth of the applications in the RASTA dataset.
-Instead, sampled another dataset of recent applications. 
-We want to include malicious applications (in case such techniques would be used to hide malicious code) so we selected #num(50000) applications randomly from AndroZoo~@allixAndroZooCollectingMillions2016 that appeared in 2023. 
-Malicious applications are spot in our dataset by using a threshold of 3 over the number of VirusTotal engines reporting an application as a malware.
-This number is provided by Androzoo, for scans performed between january 2023 and january 2024 depending on the application.
-A few applications over the total could not be retrieved or parsed leading to a final dataset of #nbapk applications. 
+Because we modelled the behaviour of a recent version of Android (#SDK 34), we decided not to use our dataset from @sec:rasta.
+The applications in the RASTA dataset span more than 10 years, and we cannot guarantee that shadow attacks behaved the same during those 10 years.
+At the very least, self-shadowing would not be possible before the introduction of multi-dex in 2014 -- about a fourth of the applications in the RASTA dataset.
+Instead, we sampled another dataset of recent applications. 
+We also want to include malicious applications (in case such techniques are used to hide malicious code), so we randomly selected #num(50000) applications that appeared in 2023 from AndroZoo~@allixAndroZooCollectingMillions2016. 
+Malicious applications are identified in our dataset by applying a threshold of 3 to the number of VirusTotal engines reporting an application as malware.
+This number is provided by AndroZoo for scans performed between January 2023 and January 2024, depending on the application.
+A few applications could not be retrieved or parsed, leading to a final dataset of #nbapk applications. 
 We automatically disassembled the applications to obtain the list of included classes. 
 Then, we checked whether any shadow attack occurs in the #APK itself or with #platc of #SDK 34. 

@@ -24,7 +24,6 @@ we take the classes from the platform classes and
 compared to SDK 32 33 34: if the shadow class matches, then it is a match
 */

-
 #todo[cl-shadow]
 #figure({
  show table: set text(size: 0.80em)
@@ -49,8 +48,7 @@ compared to SDK 32 33 34: if the shadow class matches, then it is a match

      [],
      [], [*%*], [*% malware*],
-      [*Shadow classes*], [*Median*], [*Target SDK*], [*Min SDK*],
-
+      [*Shadow classes*], [*Median*], [*Target #SDK*], [*Min #SDK*],
    ),
    table.cell(colspan: 9, inset: 3pt)[],
    table.hline(),
@@ -87,23 +85,23 @@ compared to SDK 32 33 34: if the shadow class matches, then it is a match
 //The metadata provided by AndroZoo helps to have the flags reported by antiviruses used by VirusTotal#footnote[https://www.virustotal.com]. 


-We report in the upper part of @tab:cl-shadow the statistics about the whole dataset and the three shadow attacks: "self" when a class shadows another one in the #APK, "#SDK" when a class of the #SDK shadows  one of the #APK, and "Hidden" when a hidden class of Android shadows one of the #APK. 
+We report in the upper part of @tab:cl-shadow the statistics about the whole dataset and the three shadow attacks: "self" when a class shadows another one in the #APK, "#SDK" when a class of the #SDK shadows one of the #APK, and "Hidden" when a hidden class of Android shadows one of the #APK. 
 We observe that, on average, a few classes are shadowed by another class. 
-Note that the median value is 0 meaning that few apps shadow a lot of classes, but the majority of apps do not shadow anything. 
+Note that the median value is 0, meaning that a few apps shadow many classes, but the majority of apps do not shadow anything. 
 The number of applications shadowing a hidden #API is low, which is an expected result as these classes should not be known by the developer. 
 We observe a significant proportion of applications, 23.52%, that perform #SDK shadowing.
 This can be explained by the fact that newly introduced classes are embedded in the #APK for end users running old versions of Android. This is suggested by the average Min #SDK, which is 21.7 for the whole dataset: on average, an application can run on a smartphone with #API 21, which would require embedding all new classes from #API 22 to 34. 
 This hypothesis about missing classes is further investigated later in this section.  

-In the bottom part of @tab:cl-shadow, we give the same statistics but we excluded applications that do not perform any shadowing. 
+In the bottom part of @tab:cl-shadow, we give the same statistics, but exclude applications that do not perform any shadowing. 
 We disassembled each pair of shadow classes using Apktool and compared them using their instructions represented in the Smali language.
 For self-shadow, we compare the pair.
 For the shadowing of the #SDK or Hidden class, we compare the code found in the #APK with implementations found in the emulator and `android.jar` of #SDK 32, 33, and 34.

 #paragraph([Self-shadowing])[
 We observe a low number of applications doing self-shadow attacks. 
-For each class that is shadowed, we compared its bytecode with the shadowed one.
-We observe that 74.8% are identical which suggests that the compilation process embeds the same class multiple times but makes variations in headers or metadata values. 
+For each shadowed class, we compared its bytecode with that of the class shadowing it (we compared the Smali instructions generated by Apktool for each method).
+We observe that 74.8% are identical, which suggests that the compilation process embeds the same class multiple times, but with variations in headers or metadata values. 
 We investigate later in @sec:cl-malware the case of malicious applications.
 ]

@@ -122,13 +120,13 @@ We investigate later in @sec:cl-malware the case of malicious applications.
    The remaining bars are between 0 and 5,000.
    "
  ),
-  caption: [Redefined #SDK classes, sorted by the first #SDK they appeared in.]
+  caption: [Redefined #SDK classes, sorted by the first #SDK they appeared in]
 )<fig:cl-classes_by_first_sdk>

 #paragraph([#SDK shadowing])[
 For the shadowing of #SDK classes, we observe a low ratio of identical classes. 
-This result could lead to the wrong conclusion that developers embed malicious versions of the #SDK classes, but our manual investigation shows that the difference is slight and probably due to compiler optimization. 
-To go further in the investigation, in @fig:cl-classes_by_first_sdk we represent these redefined classes with the following rules:
+This result could lead to the wrong conclusion that developers embed malicious versions of the #SDK classes, but our manual investigation shows that the difference is slight and probably due to compiler optimisation. 
+To go further in the investigation, in @fig:cl-classes_by_first_sdk, we represent these redefined classes with the following rules:

 - The class is placed on the x-axis of the figure according to the #SDK it first appeared in.
 - The class is counted as "green" (solid) if it first appeared in the #SDK *after* the #APK min #SDK (for retro-compatibility purposes).
@@ -138,19 +136,19 @@ We observe that the majority of classes are legitimate retro-compatibility addit
 Abnormal cases are observed for classes that appeared in #API versions 7 and before, 8, and 16. 
@tab:cl-topsdk reports the top ten classes that shadow the #SDK for the three mentioned versions. 
 For #SDK before 7, it mainly concerns HTTP classes: for example, the class `HttpParams` is an interface, containing limited bytecode that mostly matches the class already present on the emulator (98.03% of shadowed classes are identical). 
-`HttpConnectionParams` on the other hand differs from the platform class and we observe only 4.99% of identical classes. 
+`HttpConnectionParams`, on the other hand, differs from the platform class, and we observe only 4.99% of identical classes. 
 Manual inspection of some applications revealed that the two main reasons are:


- instead of checking if the methods attributes are null inline like Android does, applications use the method `org.apache.http.util.Args.notNull()`. According to comments in the source code of Android#footnote[https://cs.android.com/android/platform/superproject/main/+/main:frameworks/base/core/java/org/apache/http/params/HttpConnectionParams.java;drc=3bdd327f8532a79b83f575cc62e8eb09a1f93f3d?], the class was forked in 2007 from Apache 'httpcomponents' project. Looking at the history of the project, the use of `Args.notNull()` was introduced in 2012#footnote[https://github.com/apache/httpcomponents-core/commit/9104a92ea79e338d876b1b60f5cd2b243ba7069f?]. This shows that applications are embedding code from more recent version of this library without realizing their version will not be the used one.
- very small changes that we found can be attributed to the compilation process (e.g. swapping registers: `v0` is used instead of `v1` and `v1` instead of `v0`), but even if we consider them different, they are very similar.
+- Instead of checking inline whether the method arguments are null, like Android does, applications use the method `org.apache.http.util.Args.notNull()`. According to comments in the source code of Android#footnote[https://cs.android.com/android/platform/superproject/main/+/main:frameworks/base/core/java/org/apache/http/params/HttpConnectionParams.java;drc=3bdd327f8532a79b83f575cc62e8eb09a1f93f3d?], the class was forked in 2007 from the Apache 'httpcomponents' project. Looking at the history of the project, the use of `Args.notNull()` was introduced in 2012#footnote[https://github.com/apache/httpcomponents-core/commit/9104a92ea79e338d876b1b60f5cd2b243ba7069f?]. This shows that applications are embedding code from more recent versions of this library without realising their version will not be the one used.
+- We found very small changes that can be attributed to the compilation process (e.g. swapping registers: `v0` is used instead of `v1` and `v1` instead of `v0`); although we count such classes as different, they are very similar.

 The 4.99% of classes that are identical to the Android version are classes where the bodies of the methods are replaced by stubs that throw `RuntimeException("Stub!")`.
-This code corresponds to what we found in `android.jar` but not the code we found in the emulator, which is surprising. 
-Nevertheless, we decided to count them as identical, because `android.jar` is the official jar file for developer, and stubs are replaced in the emulator: it is intended by Google developers.
+This code corresponds to what we found in `android.jar`, but not the code we found in the emulator, which is surprising. 
+Nevertheless, we decided to count them as identical, because `android.jar` is the official jar file for developers, and stubs are replaced in the emulator: it is intended by Google developers.

 Other results of @tab:cl-topsdk can be similarly discussed: either they are identical with a high ratio, or they are different because of small variations. 
-When substantial differences appear it is mainly because different versions of the same library have been used or an #SDK class is embedded for retro-compatibility.
+When substantial differences appear, it is mainly because different versions of the same library have been used or an #SDK class is embedded for retro-compatibility.
 ]

 #figure({
@@ -168,7 +166,7 @@ When substantial differences appear it is mainly because different versions of t
    table.cell(colspan: 3, inset: 2pt)[],
    table.hline(),
    table.cell(colspan: 3, inset: 2pt)[],
-    [redefined  for SDK $<=$ 7], [], [],
+    [redefined  for #SDK $<=$ 7], [], [],
    table.hline(),
    table.cell(colspan: 3, inset: 2pt)[],
    ..redef_sdk_7minus.map(e => (
@@ -178,7 +176,7 @@ When substantial differences appear it is mainly because different versions of t
    table.cell(colspan: 3, inset: 2pt)[],
    table.hline(),
    table.cell(colspan: 3, inset: 2pt)[],
-    [redefined  for SDK $=$ 8], [], [],
+    [redefined  for #SDK $=$ 8], [], [],
    table.cell(colspan: 3, inset: 2pt)[],
    table.hline(),
    table.cell(colspan: 3, inset: 2pt)[],
@@ -189,7 +187,7 @@ When substantial differences appear it is mainly because different versions of t
    table.cell(colspan: 3, inset: 2pt)[],
    table.hline(),
    table.cell(colspan: 3, inset: 2pt)[],
-    [redefined  for SDK $=$ 16], [], [],
+    [redefined  for #SDK $=$ 16], [], [],
    table.cell(colspan: 3, inset: 2pt)[],
    table.hline(),
    table.cell(colspan: 3, inset: 2pt)[],
@@ -209,7 +207,7 @@ For applications redefining hidden classes, on average, 16.1 classes are redefin
 The top 3 packages whose code actually differs from the ones found in Android are `java.util.stream`, `org.ccil.cowan.tagsoup` and `org.json`:

 - stream: when looking in more detail, we found that `java.util.stream` was only redefined by 6 applications, but the large number of classes redefined artificially puts the package at the top of the list.
-  It is explained by the fact that developers have included this library containing a lot of classes colliding with Android.
+  This is explained by the fact that developers have included this library, which contains many classes that collide with those of Android.
 - tagsoup: `TagSoup` is a library for parsing HTML.
  Developers do not know that it is part of Android as hidden classes.
 - json: there is only one hidden class in `org.json`, redefined by #num(821) applications: `JSONObject$1`.
@@ -265,32 +263,32 @@ All these hidden shadow classes are libraries included by the developers who pro
    // ...
  }
  ```,
-  caption: [Implementation of Reflection executed by #ART (shadowed by @lst:cl-refl2],
+  caption: [Implementation of Reflection executed by #ART (shadowed by @lst:cl-refl2)],
 ) <lst:cl-refl1>

 The last column of @tab:cl-shadow shows the proportion of applications considered as malware, using our arbitrarily fixed threshold of 3 positive detections from VirusTotal reports. 
 For the whole dataset, we have 0.53% of applications considered as malware.  
-We can see that an application that uses self-shadowing is 10 times more likely to be a malware, when the proportion of malware among application shadowing #platc is the same as in the rest of the dataset. 
-Thus, we manually reversed self-shadowing malware, and found that the self-shadowing does not look to be voluntary.
+We can see that an application that uses self-shadowing is 10 times more likely to be malware, whereas the proportion of malware among applications shadowing #platc is the same as in the rest of the dataset. 
+Thus, we manually reverse-engineered self-shadowing malware and found that the self-shadowing does not appear to be intentional.
 The colliding classes are often the same implementation, occasionally with minor differences, like different versions of a library. 
 Additionally, we noticed multiple times internal classes from `com.google.android.gms.ads` colliding with each other, but we believe that it is due to bad processing during the compilation of the application.

 // App name: ShareCRM, but it still seems to exist on the store, so we avoid a lawsuit and do not name it
 // https://play.google.com/store/apps/details?id=com.facishare.fsplay&hl=en

-The most notable case we found was an application that still exists on the Google Play Store with the same package name#footnote[SHA256: `C46A65EA1A797119CCC03C579B61C94FE8161308A3B6A8F55718D6ADAD112546`]. This application contains a self-shadow class `me.weishu.reflection.Reflection` that can be found in github, in the repository `tiann/FreeReflection`#footnote[https://github.com/tiann/FreeReflection]. This class is used to disable Android restrictions on hidden #API.
+The most notable case we found was an application that still exists on the Google Play Store with the same package name#footnote[SHA256: `C46A65EA1A797119CCC03C579B61C94FE8161308A3B6A8F55718D6ADAD112546`]. This application contains a self-shadow class `me.weishu.reflection.Reflection` that can be found on GitHub, in the repository `tiann/FreeReflection`#footnote[https://github.com/tiann/FreeReflection]. This class is used to disable Android restrictions on hidden #API.
 At first glance, we believed the shadowing to be done voluntarily for obfuscation purposes.
- The shadow class that would be seen by a reverser is given in @lst:cl-refl2: it contains some Java bytecode performing reflection and loading a native library named "free-reflection" (the associated `.so` is missing). 
- The shadowed class that is really executed is summarized in @lst:cl-refl1. 
- It contains a more obfuscated code: a `DEX` field storing base64 encoded #DEX bytecode that is later used to load some new code. 
- When looking at this new code stored in the field, we found that it does almost the same thing as the code in the shadow class. 
- Thus, we believe that the developer has upgraded their obfuscation techniques, replacing a native library by inline base64 encoded bytecode. 
- The shadow attack could be unintentional, but it strengthens the masking of the new implementation.
+The shadow class that would be seen by a reverser is given in @lst:cl-refl2: it contains some Java bytecode performing reflection and loading a native library named "free-reflection" (the associated `.so` is missing). 
+The shadowed class that is really executed is summarised in @lst:cl-refl1. 
+It contains more obfuscated code: a `DEX` field storing base64-encoded #DEX bytecode that is later used to load some new code. 
+When looking at this new code stored in the field, we found that it does almost the same thing as the code in the shadow class. 
+Thus, we believe that the developer has upgraded their obfuscation techniques, replacing a native library with inline base64-encoded bytecode. 
+The shadow attack could be unintentional, but it strengthens the masking of the new implementation.

 #v(2em)

 In conclusion, we observed that:
- #SDK shadowing is performed by #shadowsdk of applications but are unintentional: these classes are embedded for retro-compatibility purpose or because the developer added a library already present in Android;
- Hidden shadowing rarely occurs and is mainly due to the usage of libraries that Android already contains;
- Malware perform more self-shadowing than goodware applications, and we found a sample where self-shadowing would clearly mislead the reverser.
+- #SDK shadowing is performed by #shadowsdk of applications, but is unintentional: these classes are embedded for retro-compatibility purposes or because the developer added a library already present in Android.
+- Hidden shadowing rarely occurs and is mainly due to the usage of libraries that Android already contains.
+- Malware performs more self-shadowing than goodware applications, and we found a sample where self-shadowing would clearly mislead the reverser.