I declare this manuscript finished
All checks were successful
/ test_checkout (push) Successful in 1m48s
This commit is contained in: parent 9f39ded209, commit 5c3a6955bd
14 changed files with 162 additions and 131 deletions
@ -82,25 +82,6 @@ If we were to expect other possible methods to be called in addition to `myMetho
] #todo[Ref to list of common tools?] reformatted for readability.
*/
#figure(
```java
class T {

@ -137,6 +118,25 @@ In those cases, the parameters could be used directly without the detour inside
caption: [@lst:-th-expl-cl-call after the de-reflection transformation]
) <lst:-th-expl-cl-call-trans>

The check of the `Method` value is done in a separate method injected into the application, to avoid cluttering the application too much.
Because Java (and thus Android) allows method overloading, we cannot just check the method name and its class: we must check the whole method signature.
We chose to limit the transformation to the specific instruction that calls `Method.invoke(..)`.
This drastically reduces the risk of breaking the application, but leads to a lot of type casting.
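As a sketch, such an injected check could look like the following Java source (the names `Main.myMethod` and `T.check_is_Main_myMethod` are placeholders mirroring the glue-method naming; the real transformation injects bytecode, not source):

```java
import java.lang.reflect.Method;

// Hypothetical sketch of an injected check method; the class and
// method names are placeholders, not the exact generated code.
class Main {
    static String myMethod(String s) { return "called with " + s; }
}

class T {
    // Check the whole signature: Java methods can be overloaded, so
    // the declaring class, name, and parameter types must all match.
    static boolean check_is_Main_myMethod(Method m) {
        return m.getDeclaringClass().getName().equals("Main")
            && m.getName().equals("myMethod")
            && m.getParameterTypes().length == 1
            && m.getParameterTypes()[0] == String.class;
    }

    public static void main(String[] args) throws Exception {
        Method good = Main.class.getDeclaredMethod("myMethod", String.class);
        Method bad = Object.class.getMethod("toString");
        System.out.println(check_is_Main_myMethod(good)); // true
        System.out.println(check_is_Main_myMethod(bad));  // false
    }
}
```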
Indeed, the reflection call uses the generic `Object` class, but actual methods usually use specific classes (#eg `String`, `Context`, `Reflectee`) or scalar types (#eg `int`, `long`, `boolean`).
This means that the method parameters, and the object on which the method is called, must be downcast to their actual types before calling the method; the returned value must then be cast back to an `Object`.
Scalar types require special attention.
Java (and Android) distinguishes between scalar types and classes, and they cannot be mixed: a scalar cannot be cast into an `Object`.
However, each scalar type has an associated class that can be used when doing reflection.
For example, the scalar type `int` is associated with the class `Integer`: the method `Integer.valueOf()` converts an `int` scalar to an `Integer` object, and the method `Integer.intValue()` converts an `Integer` object back to an `int` scalar.
Each time the method called by reflection uses scalars, this scalar-object conversion must be made before calling it.
Finally, because the instruction following the reflection call expects an `Object`, the return value of the method must be cast into an `Object`.
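As an illustration, a sketch of the casts and conversions the transformation must emit when replacing `Method.invoke(..)` for a hypothetical method `int Reflectee.add(int, int)` (the names are ours; the real transformation works at the bytecode level, not in source):

```java
// Minimal sketch (not the generated bytecode) of the downcasts,
// unboxing, and boxing a direct replacement of `Method.invoke(..)`
// must perform; `Reflectee` is a placeholder class name.
class Reflectee {
    int add(int a, int b) { return a + b; }
}

class Demo {
    // What a call site `method.invoke(obj, args)` is replaced with:
    static Object directCall(Object obj, Object[] args) {
        // Downcast the receiver and unbox the scalar arguments...
        int result = ((Reflectee) obj).add(
            ((Integer) args[0]).intValue(),
            ((Integer) args[1]).intValue());
        // ...then box the scalar result back into an Object,
        // because the following instruction still expects an Object.
        return Integer.valueOf(result);
    }

    public static void main(String[] args) {
        Object r = directCall(new Reflectee(), new Object[] {1, 2});
        System.out.println(r); // 3
    }
}
```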

This back and forth between types might confuse some analysis tools.
This could be improved in future work by analysing the code around the reflection call.
For example, if the result of the reflection call is immediately cast into the expected type (#eg in @lst:-th-expl-cl-call, the result is cast to a `String`), there should be no need to cast it to `Object` in between.
Similarly, the method parameter arrays are often generated just before the reflection call and never used again (this is because `Method.invoke(..)` is a varargs method: the compiler can generate the array at the call site).
In those cases, the parameters could be used directly, without the detour through an array.

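The varargs behaviour can be seen in plain Java: both call forms below are equivalent, with the argument array built just for the call (the class `Box` is our illustrative example, not from the analysed applications):

```java
import java.lang.reflect.Method;

class Box {
    static String concat(String a, String b) { return a + b; }

    public static void main(String[] args) throws Exception {
        Method m = Box.class.getDeclaredMethod("concat", String.class, String.class);
        // Varargs form: the compiler wraps the arguments in a fresh array...
        Object r1 = m.invoke(null, "foo", "bar");
        // ...equivalent to the explicit array the bytecode actually builds:
        Object r2 = m.invoke(null, new Object[] {"foo", "bar"});
        System.out.println(r1.equals(r2)); // true
    }
}
```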
=== Transforming Code Loading (or Not) <sec:th-trans-cl>

#jfl-note[Here I expected to read how we transform the code that loads code, but instead this talks about multidex]

@ -270,7 +270,7 @@ We took special care to process the least possible files in the #APKs, and only
Unfortunately, we did not have time to compare the robustness of our solution to existing tools like Apktool and Soot, but we did a quick performance comparison, summarised in @sec:th-lib-perf.
In hindsight, we probably should have taken the time to find a way to use smali/baksmali (the backend of Apktool) as a library, or to use SootUp for the instrumentation, but neither option documents how to instrument applications in this way.
At the time of writing, the feature is still being developed, but in the future, Androguard might also become an option for modifying #DEX files.
Nevertheless, we published our instrumentation library, Androscalpel, for anyone who wants to use it (see @sec:soft). #todo[Update if CS says no]

#midskip

@ -111,7 +111,8 @@ The remaining #num(nb_bytecode_collected - nb_google - nb_appsflyer - nb_faceboo
table.cell(colspan: 4)[...],
table.hline(),
),
caption: [Most common dynamically loaded files],
placement: top,
) <tab:th-bytecode-hashes>

=== Impact on Analysis Tools

@ -167,16 +168,6 @@ This is a reasonable failure rate, but we should keep in mind that it adds up to
To check the impact of our instrumentation on the finishing rate, we then ran the same experiment as in @sec:rasta.
We ran the tools on the #APKs before and after instrumentation, and compared the finishing rates in @fig:th-status-npatched-vs-patched (without taking into account the #APKs we failed to patch#footnote[Due to a handling error during the experiment, the figure shows the results for #nb_patched_rasta #APKs instead of #nb_patched. \ We also ignored the tool from Wognsen #etal due to its high number of timeouts]).

#todo[alt text @fig:th-status-npatched-vs-patched]
#figure({
  image(

@ -189,6 +180,16 @@ On the other hand, Saaf do not detect the issue with Apktool and pursues the ana
  caption: [Exit status of static analysis tools on original #APKs (left) and patched #APKs (right)]
) <fig:th-status-npatched-vs-patched>

The finishing rate comparison is shown in @fig:th-status-npatched-vs-patched.
We can see that in most cases, the finishing rate is either the same or slightly lower for the instrumented application.
This is consistent with the fact that we add more bytecode to the application, hence adding more opportunities for failure during analysis.
There are two notable exceptions: Saaf and IC3.
The finishing rate of IC3, which was previously reasonable, dropped to 0 after our instrumentation, while the finishing rate of Saaf jumped to 100%, which is extremely suspicious.
Analysing the logs showed that both cases have the same origin: the bytecode generated by our instrumentation has a version number of 37 (the version introduced by Android 7.0).
Unfortunately, neither the version of Apktool used by Saaf nor Dare (the tool used by IC3 to convert Dalvik bytecode to Java bytecode) recognises this bytecode version, and both thus failed to parse the #APK.
In the case of Dare and IC3, our experiment correctly identifies this as a crash.
Saaf, on the other hand, does not detect the Apktool issue: it pursues the analysis with no bytecode to analyse and returns a valid result file, but for an empty application.
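The version number sits in the first bytes of the #DEX file: the 8-byte magic is the ASCII string `dex\n`, followed by three version digits and a NUL byte (`dex\n037\0` for version 37). A small sketch of reading it (our own helper, not part of any of the tools discussed):

```java
// Sketch: extract the DEX version from the 8-byte header magic,
// which is "dex\n" + three ASCII digits + '\0'.
class DexVersion {
    static int versionOf(byte[] header) {
        String digits = new String(header, 4, 3,
            java.nio.charset.StandardCharsets.US_ASCII);
        return Integer.parseInt(digits);
    }

    public static void main(String[] args) {
        byte[] magic = {'d', 'e', 'x', '\n', '0', '3', '7', 0};
        System.out.println(versionOf(magic)); // 37
    }
}
```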

#todo[Flowdroid results are inconclusive: some apks have more leaks after, and as many apks have fewer? Also, running Flowdroid on the same apk can return a different number of leaks???]

=== Example

@ -266,6 +267,35 @@ Although self-explanatory, verifying the code of those methods indeed confirms t
caption: [Code of `Main.main()`, as shown by Jadx, after patching],
)<lst:th-demo-after>

#todo[alt text for @fig:th-cg-before and @fig:th-cg-after]
#figure([
  #figure(
    render(
      read("figs/demo_main_main.dot"),
      width: 100%,
      alt: (
        "",
      ).join(),
    ),
    caption: [Call Graph of `Main.main()` generated by Androguard before patching],
  ) <fig:th-cg-before>

  #figure(
    render(
      read("figs/patched_main_main.dot"),
      width: 100%,
      alt: (
        "",
      ).join(),
    ),
    caption: [Call Graph of `Main.main()` generated by Androguard after patching],
  ) <fig:th-cg-after>
],
caption: none,
kind: "th-cg-cmp-andro",
supplement: none,
)

For a higher-level view of the method, we can also look at its call graph.
We used Androguard to generate the call graphs in @fig:th-cg-before and @fig:th-cg-after#footnote[We manually edited the generated .dot files for readability.].
@fig:th-cg-before shows the original call graph and gives a good idea of the obfuscation methods used: we can see calls to `Main.decrypt(String)`, which itself calls cryptographic #APIs, as well as calls to `ClassLoader.loadClass(String)`, `Class.getMethod(String, Class[])` and `Method.invoke(Object, Object[])`.
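This obfuscation pattern can be reconstructed in plain Java; in the sketch below, the class `Payload`, the string "encryption" (a simple reversal) and the payload itself are ours, not the analysed sample's:

```java
import java.lang.reflect.Method;

// Illustrative reconstruction of the pattern: the target class and
// method exist only as encrypted strings, so a static call graph
// sees only loadClass/getMethod/invoke, not the real target.
class Payload {
    public String greet(String s) { return "got " + s; }
}

class ObfMain {
    static String decrypt(String s) {
        // stand-in for the real cryptographic decryption
        return new StringBuilder(s).reverse().toString();
    }

    public static void main(String[] args) throws Exception {
        Class<?> c = ObfMain.class.getClassLoader().loadClass(decrypt("daolyaP"));
        Method m = c.getMethod(decrypt("teerg"), String.class);
        Object o = c.getDeclaredConstructor().newInstance();
        System.out.println(m.invoke(o, "hi")); // got hi
    }
}
```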
@ -275,34 +305,11 @@ In grey on the figure, we can see the glue methods (`T.check_is_Xxx_xxx(Method)`
Those methods are part of the instrumentation process presented in @sec:th-trans, but do not add much to the analysis of the call graph.
In red on the figure, however, we have the calls that were hidden by reflection in the first call graph; and, thanks to the bytecode of the called methods being injected into the application, we can also see that they call `Utils.source(String)` and `Utils.sink(String)`, the methods we defined for this application as the source of confidential data and the exfiltration method.

=== Androscalpel Performance <sec:th-lib-perf>

Because we implemented our own instrumentation library, we wanted to compare it to other existing options.
Unfortunately, we did not have time to compare the robustness and correctness of the generated applications.
However, we did compare the performance of our library, Androscalpel, to Apktool and Soot, over the first 100 applications of RASTA (in alphabetical order of their SHA256).

Due to time constraints, we could not test a complex transformation, as adding registers requires complex operations for both Androscalpel and Apktool (see @sec:th-implem for more details).
We decided to test two operations: traversing the instructions of an application (a read-only operation), and regenerating an application without modification (a read/write operation).

@ -316,19 +323,46 @@ It should be noted that all three of the tested tools have multiprocessing suppo
table.header(
  table.cell(colspan: 2)[Tool], [Soot], [Apktool], [Androscalpel],
),
table.cell(colspan: nb_col, inset: 1pt, stroke: none)[],
table.cell(rowspan: 3)[Read],
[Time (s)], ..for tool in ("soot", "apktool", "androscalpel") {
  let res = performance_results.at(tool).read
  (num(calc.round(res.cumulative_time / res.nb_results, digits: 2)),)
},
[Mem (GB)], ..for tool in ("soot", "apktool", "androscalpel") {
  let res = performance_results.at(tool).read
  (num(calc.round(res.cumulative_mem / res.nb_results / 1000000, digits: 2)),)
},
[Detected Crashes], ..for tool in ("soot", "apktool", "androscalpel") {
  let res = performance_results.at(tool).read
  (num(100 - res.nb_results),)
},
table.cell(colspan: nb_col, inset: 1pt, stroke: none)[],
table.cell(rowspan: 3)[Read/Write],
[Time (s)], ..for tool in ("soot", "apktool", "androscalpel") {
  let res = performance_results.at(tool).write
  (num(calc.round(res.cumulative_time / res.nb_results, digits: 2)),)
},
[Mem (GB)], ..for tool in ("soot", "apktool", "androscalpel") {
  let res = performance_results.at(tool).write
  (num(calc.round(res.cumulative_mem / res.nb_results / 1000000, digits: 2)),)
},
[Detected Crashes], ..for tool in ("soot", "apktool", "androscalpel") {
  let res = performance_results.at(tool).write
  (num(100 - res.nb_results),)
},
)},
caption: [Average time and memory consumption of Soot, Apktool and Androscalpel],
) <tab:th-compare-perf>

@tab:th-compare-perf compares the resources consumed by each tool for each operation.
We can see that for the read-only operation, we are 16 times faster than Soot and 8 times faster than Apktool, while keeping a smaller memory footprint.
When regenerating an application, the gap lessens, but we are still almost 8 times faster than Soot.
Some of this difference probably comes from implementation choices: Soot and Apktool are implemented in Java, which has a noticeable overhead compared to Rust.
However, a noticeable part can also be explained by the specialised nature of our library: we did not implement all the features Soot has, and we do not parse Android resources like Apktool does.
Having better performance does not mean that our solution can replace the others in all cases.

Nevertheless, it should be noted that over the 100 applications tested, Soot failed to regenerate 10 of them, Apktool 4, and Androscalpel only 1, showing that our efforts to limit crashes were successful.

#midskip

@ -2,12 +2,12 @@

== Conclusion <sec:th-conclusion>

In this chapter, we presented a set of transformations to encode reflection calls and code loaded dynamically inside the application.
We also presented a dynamic analysis approach to collect the information needed to perform those transformations.

We then applied this method to a recent subset of applications of our dataset from @sec:rasta.
When comparing the success rate of the tools of @sec:rasta on the applications before and after the transformation, we found that the success rate of most tools slightly decreases.
We also showed that our transformation allows static analysis tools to access and process that runtime information in their analysis.
However, a more in-depth look at the results of our dynamic analysis showed that our code coverage is lacking, and that the great majority of the dynamically loaded code we intercepted comes from generic advertisement and telemetry libraries.

#v(2em)
|
||||
|
|
|
@ -30,8 +30,8 @@
"nb_results": 100
},
"write": {
"cumulative_time": 3994.8299999999995,
|
||||
"cumulative_mem": 179002824,
|
||||
"cumulative_time": 7189.420000000001,
|
||||
"cumulative_mem": 184730724,
|
||||
"nb_results": 96
|
||||
}
|
||||
}
|
||||
|
|
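As a sanity check, the per-APK averages reported in @tab:th-compare-perf follow directly from cumulative values like those in the fragment above; a sketch of the arithmetic, assuming (on our part) that `cumulative_mem` is in kilobytes:

```java
// Sketch of how per-APK averages derive from the cumulative values;
// the memory unit (KB) is our assumption, as is which tool this
// particular JSON fragment belongs to.
class Avg {
    public static void main(String[] args) {
        double cumulativeTime = 7189.420000000001; // seconds
        long cumulativeMemKb = 184730724;          // assumed KB
        int nbResults = 96;                        // successful runs out of 100

        double avgTimeS = Math.round(cumulativeTime / nbResults * 100) / 100.0;
        double avgMemGb = Math.round(cumulativeMemKb / (double) nbResults / 1_000_000 * 100) / 100.0;
        System.out.println(avgTimeS + " s");  // 74.89 s
        System.out.println(avgMemGb + " GB"); // 1.92 GB
        System.out.println((100 - nbResults) + " detected crashes"); // 4 detected crashes
    }
}
```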