wip

2025-10-01 15:51:12 +02:00 · 2025-10-01 15:51:12 +02:00 · b5583dbae9
commit b5583dbae9
parent 346151125e
8 changed files with 110 additions and 41 deletions
--- a/1_introduction/main.typ
+++ b/1_introduction/main.typ
@@ -121,7 +121,7 @@ The contributions of this thesis are the following:
  Based on this model, we define a class of obfuscation techniques that we call _shadow attacks_ where a class definition in an #APK shadows the actual class definition.
  We show that common state-of-the-art tools like Jadx or Flowdroid do not implement this model correctly and can thus be fooled by such shadow attacks.
  We analysed a large number of recent Android applications and found that applications with class shadowing do exist, though they are the result of quirks in the #APK compilation process and not deliberate obfuscation attempts.
-  This work was published in the Digital Threats journal~@classloaderinthemiddle. #todo[update ref when not 'just published' anymore]
+  This work was published in the Digital Threats journal~@classloaderinthemiddle.
 + We propose an approach to allow static analysis tools to analyse applications that perform dynamic code loading:
  At runtime, we collect the dynamically loaded bytecode and information about reflection calls, and patch the #APK file to perform those operations statically.
  Finally, we evaluate the impact this transformation has on the tools we containerised previously.#jfl-note[Say a few words about the patching method, which was reimplemented to be robust? \ jm: I did not have time to compare it with soot/droidRA; without experiments it feels too much like "trust me bro" #emoji.cat.face.cry]
--- a/2_background/2_3_static_analysis.typ
+++ b/2_background/2_3_static_analysis.typ
@@ -114,6 +114,7 @@ This time, instead of methods, the nodes represent instructions, and the edges i
    caption: [c) Corresponding Control-Flow Graph]
  ) <fig:bg-fizzbuzz-cfg>]))
  h(1em)},
+  kind: image,
  supplement: [Figure],
  caption: [Source code for a simple Java method and its Call and Control Flow Graphs],
 )<fig:bg-fizzbuzz-cg-cfg>
--- a/3_rasta/3_experiments.typ
+++ b/3_rasta/3_experiments.typ
@@ -121,8 +121,9 @@ For the tools that we could run, #resultratio of analyses are finishing successf
      width: 50%,
      alt: ""
    ),
-    caption: [Java-based tools],
-    supplement: [Subfigure],
+    caption: [a) Java-based tools],
+    supplement: none,
+    kind: "sub-rasta-exit-evolution"
  ) <fig:rasta-exit-evolution-java>],
  [#figure(
    image(
@@ -130,17 +131,18 @@ For the tools that we could run, #resultratio of analyses are finishing successf
      width: 50%,
      alt: "",
    ),
-    caption: [Non-Java-based tools],
-    supplement: [Subfigure],
+    caption: [b) Non-Java-based tools],
+    supplement: none,
+    kind: "sub-rasta-exit-evolution"
  ) <fig:rasta-exit-evolution-not-java>]
  ), caption: [Exit status evolution for the Rasta dataset]
-)
+) <fig:rasta-exit-evolution>

 To investigate the effect of application dates on the tools, we computed the date of each #APK as the earlier of its first upload to AndroZoo and its first analysis in VirusTotal. 
 Such a computation is more reliable than using the #DEX date, which is often obfuscated when packaging the application. 
 For clarity, we separated the tools written mainly in Java from those written in other languages. 
 Most of the Java-based tools use the Soot framework, which may introduce correlations between their results. 
-@fig:rasta-exit-evolution-java (resp. @fig:rasta-exit-evolution-not-java) compares the success rate of the tools between 2010 and 2023 for Java-based tools (resp. non Java-based tools).
+@fig:rasta-exit-evolution a) (resp. @fig:rasta-exit-evolution b)) compares the success rate of the tools between 2010 and 2023 for Java-based tools (resp. non Java-based tools).
 For Java-based tools, a clear decrease in finishing rate can be observed globally for all tools. 
 For non-Java-based tools, 2 of them keep a high success rate (Androguard, Mallodroid). 
 The result is expected for Androguard, because its analysis is relatively simple and the tool is widely adopted, as previously mentioned. 
@@ -186,8 +188,9 @@ To compare the influence of the date, #SDK version and size of applications, we
      width: 50%,
      alt: ""
    ),
-    caption: [Java-based tools],
-    supplement: [Subfigure],
+    caption: [a) Java-based tools],
+    kind: "sub-rasta-decorelation-size-2022",
+    supplement: none,
  ) <fig:rasta-rate-evolution-java-2022>],
  [#figure(
    image(
@@ -195,15 +198,16 @@ To compare the influence of the date, #SDK version and size of applications, we
      width: 50%,
      alt: "",
    ),
-    caption: [Non-Java-based tools],
-    supplement: [Subfigure],
+    caption: [b) Non-Java-based tools],
+    kind: "sub-rasta-decorelation-size-2022",
+    supplement: none,
  ) <fig:rasta-rate-evolution-non-java-2022>]
  ), caption: [Finishing rate by bytecode size for #APK detected in 2022]
-) <fig:rasta-decorelation-size>
+) <fig:rasta-decorelation-size-2022>

 #paragraph[Fixed application year. (#num(5000) #APKs)][
 We selected the year 2022, which has enough representatives in every size decile of our application dataset.
-@fig:rasta-rate-evolution-java-2022 (resp. @fig:rasta-rate-evolution-non-java-2022) shows the finishing rate of the tools in function of the size of the bytecode for Java-based tools (resp. non-Java-based tools) analysing applications of 2022. 
+@fig:rasta-decorelation-size-2022 a) (resp. @fig:rasta-decorelation-size-2022 b)) shows the finishing rate of the tools as a function of bytecode size for Java-based tools (resp. non-Java-based tools) analysing applications from 2022. 
 We can observe that all Java-based tools have a finishing rate that decreases as the bytecode size increases. 
 50% of non-Java-based tools have the same behaviour.
 ]
@@ -216,8 +220,9 @@ We can observe that all Java-based tools have a finishing rate that decreases ov
      width: 50%,
      alt: ""
    ),
-    caption: [Java-based tools],
-    supplement: [Subfigure],
+    caption: [a) Java-based tools],
+    supplement: none,
+    kind: "sub-rasta-decorelation-size",
  ) <fig:rasta-rate-evolution-java-decile-year>],
  [#figure(
    image(
@@ -225,15 +230,16 @@ We can observe that all Java-based tools have a finishing rate that decreases ov
      width: 50%,
      alt: "",
    ),
-    caption: [Non-Java-based tools],
-    supplement: [Subfigure],
+    caption: [b) Non-Java-based tools],
+    supplement: none,
+    kind: "sub-rasta-decorelation-size",
  ) <fig:rasta-rate-evolution-non-java-decile-year>]
  ), caption: [Finishing rate by discovery year with a bytecode size $in$  [4.08, 5.2] MB]
-) <fig:rasta-decorelation-size>
+) <fig:rasta-decorelation-size-decide-year>

 #paragraph[Fixed application bytecode size. (#num(6252) APKs)][
 We selected the sixth decile (between 4.08 and 5.20 MB), which is well represented across a wide range of years.
-@fig:rasta-rate-evolution-java-decile-year (resp. @fig:rasta-rate-evolution-non-java-decile-year) represents the finishing rate depending on the year at a fixed bytecode size. 
+@fig:rasta-decorelation-size-decide-year a) (resp. @fig:rasta-decorelation-size-decide-year b)) represents the finishing rate depending on the year at a fixed bytecode size. 
 We observe that 9 tools out of 12 have a finishing rate dropping below 20% for Java-based tools, which is not the case for non-Java-based tools.
 ]

@@ -245,8 +251,9 @@ We observe that 9 tools out of 12 have a finishing rate dropping below 20% for J
      width: 50%,
      alt: ""
    ),
-    caption: [Java-based tools],
-    supplement: [Subfigure],
+    caption: [a) Java-based tools],
+    kind: "sub-rasta-decorelation-size-decile-min-sdk",
+    supplement: none,
  ) <fig:rasta-rate-evolution-java-decile-min-sdk>],
  [#figure(
    image(
@@ -254,13 +261,14 @@ We observe that 9 tools out of 12 have a finishing rate dropping below 20% for J
      width: 50%,
      alt: "",
    ),
-    caption: [Non-Java-based tools],
-    supplement: [Subfigure],
+    caption: [b) Non-Java-based tools],
+    kind: "sub-rasta-decorelation-size-decile-min-sdk",
+    supplement: none,
  ) <fig:rasta-rate-evolution-non-java-decile-min-sdk>]
  ), caption: [Finishing rate by min #SDK with a bytecode size $in$ [4.08, 5.2] MB]
-) <fig:rasta-decorelation-size>
+) <fig:rasta-decorelation-size-decile-min-sdk>

-We performed similar experiments by varying the min #SDK and target #SDK versions, still with a fixed bytecode size between 4.08 and 5.2 MB, as shown in @fig:rasta-rate-evolution-java-decile-min-sdk and @fig:rasta-rate-evolution-non-java-decile-min-sdk.
+We performed similar experiments by varying the min #SDK and target #SDK versions, still with a fixed bytecode size between 4.08 and 5.2 MB, as shown in @fig:rasta-decorelation-size-decile-min-sdk a) and @fig:rasta-decorelation-size-decile-min-sdk b).
 We found that, contrary to the target #SDK, the min #SDK version has an impact on the finishing rate of Java-based tools: 8 tools out of 12 are below 50% after #SDK 16. 
 This is not surprising, as the min #SDK is highly correlated with the year.

--- a/4_class_loader/3_obfuscation.typ
+++ b/4_class_loader/3_obfuscation.typ
@@ -179,7 +179,7 @@ The documentation highlights the analysis commands that compute three types of o
 The #APK and the list of #dexfiles are a one-to-one representation of the content of an application, and have the same issues that we discussed with Apktool: they provide the different versions of a shadow class contained in multiple #dexfiles.

 The Analysis object is used to compute a method call graph, and we found that this algorithm may choose the wrong version of a shadowed class when using the cross-references that are computed. 
-This leads to an invalid call graph, as shown in @fig:cl-andro_obf_cg: the two methods `doSomething()` are represented in the graph, but the one linked to `main()` on the graph is the one calling the method `good()` when in fact the method `bad()` is called when running the application.
+This leads to an invalid call graph, as shown in @fig:cl-androguard_call_graph b): both `doSomething()` methods are represented in the graph, but the one linked to `main()` calls the method `good()`, whereas it is `bad()` that is actually called when the application runs.

 Androguard has a method `.is_external()` to detect if the implementation of a class is not provided inside the application and a method `.is_android_api()` to detect if the class is part of the Android #API. 
 Regrettably, the documentation of `.is_android_api()` explains that the method is still experimental and just checks a few package names. 
@@ -203,8 +203,9 @@ Because of that, like for Apktool and Jadx, Androguard has no way to warn the re

      "
    ),
-    supplement: [Subfigure],
-    caption: [Expected Call Graph]
+    kind: "sub-cl-androguard_call_graph",
+    supplement: none,
+    caption: [a) Expected Call Graph]
  ) <fig:cl-andro_non_obf_cg>],[
  #figure(
    image(
@@ -219,8 +220,9 @@ Because of that, like for Apktool and Jadx, Androguard has no way to warn the re
      There are two Obfuscation.doSomething() boxes: the one pointed to by Main.main(), which points to Main.good(), is gray; the one with no incoming arrow, which points to bad(), is white like the other boxes.
      "
    ),
-    supplement: [Subfigure],
-    caption: [Call Graph Computed by Androguard]
+    kind: "sub-cl-androguard_call_graph",
+    supplement: none,
+    caption: [b) Call Graph Computed by Androguard]
  ) <fig:cl-andro_obf_cg>
  ]) 
  h(1em)},
--- a/X_appendices/french_summary.typ
+++ b/X_appendices/french_summary.typ
--- a/X_appendices/released_software.typ
+++ b/X_appendices/released_software.typ
@@ -0,0 +1,40 @@
+#import "../lib.typ": etal
+
+= Released Software
+
+In @sec:rasta, we mentioned that we had difficulty finding some of the software listed by Li #etal because the websites originally hosting it had disappeared.
+To reduce the risk of the same issue occurring with our own work, we hosted the software we released for this thesis in several locations.
+This appendix lists the software we released and the locations where it can be found.
+
+== RASTA
+
+The code used in @sec:rasta is available at the following locations:
+
+- The author's personal Git server: https://git.mineau.eu/these-android-re/rasta
+- The research team's GitLab: https://gitlab.inria.fr/pirat/android/rasta
+- GitHub: https://github.com/histausse/rasta
+- Zenodo: https://doi.org/10.5281/zenodo.10137904
+
+The exact version of the code used in @sec:rasta is tagged `icsr2024` in the Git repositories and corresponds to the version archived on Zenodo.
+
+The container images used to run the different tools are available on Zenodo at https://doi.org/10.5281/zenodo.10980349 as Singularity images, and on Docker Hub under the following names:
+
+- #link("https://hub.docker.com/r/histausse/rasta-adagio")[`histausse/rasta-adagio:icsr2024`]
+- #link("https://hub.docker.com/r/histausse/rasta-amandroid")[`histausse/rasta-amandroid:icsr2024`]
+- #link("https://hub.docker.com/r/histausse/rasta-anadroid")[`histausse/rasta-anadroid:icsr2024`]
+- #link("https://hub.docker.com/r/histausse/rasta-androguard-dad")[`histausse/rasta-androguard-dad:icsr2024`]
+- #link("https://hub.docker.com/r/histausse/rasta-androguard")[`histausse/rasta-androguard:icsr2024`]
+- #link("https://hub.docker.com/r/histausse/rasta-apparecium")[`histausse/rasta-apparecium:icsr2024`]
+- #link("https://hub.docker.com/r/histausse/rasta-blueseal")[`histausse/rasta-blueseal:icsr2024`]
+- #link("https://hub.docker.com/r/histausse/rasta-dialdroid")[`histausse/rasta-dialdroid:icsr2024`]
+- #link("https://hub.docker.com/r/histausse/rasta-didfail")[`histausse/rasta-didfail:icsr2024`]
+- #link("https://hub.docker.com/r/histausse/rasta-droidsafe")[`histausse/rasta-droidsafe:icsr2024`]
+- #link("https://hub.docker.com/r/histausse/rasta-flowdroid")[`histausse/rasta-flowdroid:icsr2024`]
+- #link("https://hub.docker.com/r/histausse/rasta-gator")[`histausse/rasta-gator:icsr2024`]
+- #link("https://hub.docker.com/r/histausse/rasta-ic3-fork")[`histausse/rasta-ic3-fork:icsr2024`]
+- #link("https://hub.docker.com/r/histausse/rasta-ic3")[`histausse/rasta-ic3:icsr2024`]
+- #link("https://hub.docker.com/r/histausse/rasta-iccta")[`histausse/rasta-iccta:icsr2024`]
+- #link("https://hub.docker.com/r/histausse/rasta-mallodroid")[`histausse/rasta-mallodroid:icsr2024`]
+- #link("https://hub.docker.com/r/histausse/rasta-redexer")[`histausse/rasta-redexer:icsr2024`]
+- #link("https://hub.docker.com/r/histausse/rasta-saaf")[`histausse/rasta-saaf:icsr2024`]
+- #link("https://hub.docker.com/r/histausse/rasta-wognsen")[`histausse/rasta-wognsen:icsr2024`]
--- a/bibliography.bib
+++ b/bibliography.bib
@@ -16,16 +16,20 @@

@article{classloaderinthemiddle,
 	author = {Mineau, Jean-Marie and Lalande, Jean-Fran\c{c}ois},
-	title = {Class loaders in the middle: confusing Android static analyzers},
+	title = {Class Loaders in the Middle: Confusing Android Static Analyzers},
 	year = {2025},
+	issue_date = {September 2025},
 	publisher = {Association for Computing Machinery},
 	address = {New York, NY, USA},
+	volume = {6},
+	number = {3},
 	url = {https://doi.org/10.1145/3754457},
 	doi = {10.1145/3754457},
-	abstract = {When executing a mobile application, Android executes either the classes provided by the developer or the ones provided by the operating system. The dynamic linking and loading of the different classes is a complex task that may be exploited by an attacker. In particular, if the developer adds a class whose name collides with another class of Android, they may confuse a reverse engineer. In this paper, we explore the possible collisions that can occur between classes defined multiple times at different locations, i.e., multiple times in the APK file or, at the same time, in the APK and the operating system. We highlight three attacks that we call shadow attacks. In particular, we show that static analysis tools used by a reverse engineer choose the shadow implementation for most of the evaluated tools, and output a wrong result. In particular, the flow analysis of Androguard or Flowdroid can be fooled by an attacker. In a dataset of 49 975 applications, we also explored if shadow attacks are used in the wild and found that most of the time, there is no malicious behavior behind them. The main results are that 23.52 \% of applications shadow a class of the SDK and 3.11 \% a hidden class of the system.},
-	note = {Just Accepted},
+	abstract = {When executing a mobile application, Android executes either the classes provided by the developer or the ones provided by the operating system. The dynamic linking and loading of the different classes is a complex task that may be exploited by an attacker. In particular, if the developer adds a class whose name collides with another class of Android, they may confuse a reverse engineer. In this article, we explore the possible collisions that can occur between classes defined multiple times at different locations, i.e., multiple times in the APK file or, at the same time, in the APK and the operating system. We highlight three attacks that we call shadow attacks. In particular, we show that static analysis tools used by a reverse engineer choose the shadow implementation for most of the evaluated tools, and output a wrong result. In particular, the flow analysis of Androguard or Flowdroid can be fooled by an attacker. In a dataset of 49,975 applications, we also explored if shadow attacks are used in the wild and found that most of the time, there is no malicious behavior behind them. The main results are that 23.52\% of applications shadow a class of the SDK and 3.11\% a hidden class of the system.},
 	journal = {Digital Threats},
-	month = jul,
+	month = sep,
+	articleno = {19},
+	numpages = {19},
 	keywords = {Android, static analysis, class loading, code obfuscation}
 }

--- a/main.typ
+++ b/main.typ
@@ -88,14 +88,14 @@

 // Preamble
 #{
-  set heading(numbering: none, outlined: false)
+  set heading(numbering: none, outlined: false, bookmarked: true)
  set figure(outlined: false)
  set page(numbering: "i")
  counter(page).update(0)

  include("0_preamble/acknowledgements.typ")

-  outline(title: "Table of Contents", indent: auto)
+  outline(title: "Table of Contents", indent: auto, depth: 2)
  show outline.entry: it => {
    v(5mm, weak: true)
    it
@@ -131,13 +131,26 @@
 #include("5_theseus/main.typ")
 #include("6_conclusion/main.typ")

-#bibliography("bibliography.bib")

 #{
-  set heading(numbering: none, outlined: false)
+  set heading(numbering: none, outlined: true, bookmarked: true)
  set figure(outlined: false)
-  set page(numbering: "i")
-  counter(page).update(0)
+  //set page(numbering: "i")
+  //counter(page).update(0)
+
+  pagebreak(to: "odd")
+  {
+    set page("a4",
+      margin: (outside: 20mm, inside: 30mm, top: 50mm, bottom: 50mm),
+      numbering: none,
+      header: none,
+    )
+    align(center+horizon, smallcaps(text(size: 20pt)[Appendices]))
+    pagebreak()
+  }
+
+  include("X_appendices/released_software.typ")
+

  // https://ed-matisse.doctorat-bretagne.fr/fr/soutenance-de-these#p-151
  // > The manuscript is normally written in French (Law on the use of the French language, 1994). 
@@ -150,5 +163,6 @@
  // > the administration of the institution of enrollment (for example by titling it "résumé en français" and 
  // > by not assigning it any chapter number).
  //
-  include("0_preamble/french_summary.typ")
+  include("X_appendices/french_summary.typ")
+  bibliography("bibliography.bib")
 }