wip

2025-09-09 17:05:19 +02:00 · 2025-09-09 17:05:19 +02:00 · ed8bbd12e5
commit ed8bbd12e5
parent e9bc1572e9
8 changed files with 85 additions and 24 deletions
--- a/5_theseus/4_results.typ
+++ b/5_theseus/4_results.typ
@ -1,4 +1,5 @@
-#import "../lib.typ": todo, SDK, num, mypercent, ART, ie, APKs, jfl-note
+#import "../lib.typ": SDK, num, mypercent, ART, ie, APKs, API,
+#import "../lib.typ": todo, jfl-note
 #import "X_var.typ": *
 #import "../3_rasta/X_var.typ": NBTOTALSTRING

@ -107,6 +108,15 @@ The remaining #num(nb_bytecode_collected - nb_google - nb_appsflyer - nb_faceboo

 === Impact on Analysis Tools Finishing Rate

+Unfortunately, our implementation of the transformation is imperfect and sometimes fails.
+Of the #num(dyn_res.all.nb - dyn_res.all.nb_failed) #APKs for which the dynamic analysis succeeded, #num(nb_patched) were patched.
+The remaining #mypercent(dyn_res.all.nb - dyn_res.all.nb_failed - nb_patched, dyn_res.all.nb - dyn_res.all.nb_failed) failed either because of quirks in the zip format of the #APK file or because of a bug in our implementation that is triggered when the method reference limit of a single #DEX file is exceeded.
+Taking into account the failures of both the dynamic analysis and the patching, the overall failure rate is #mypercent(dyn_res.all.nb - nb_patched, dyn_res.all.nb).
+This is a reasonable failure rate, but we should keep in mind that it adds to the failure rate of the other tools we want to use on the patched application.
+
+We then ran the same experiment as in @sec:rasta.
+We ran the tools on the #APKs before and after patching, and compare the finishing rates in @fig:th-status-npatched-vs-patched, excluding the #APKs we failed to patch#footnote[Due to a handling error during the experiment, the figure shows the results for #num(nb_patched_rasta) #APKs instead of #num(nb_patched).].
+
 #todo[alt text @fig:th-status-npatched-vs-patched]
 #todo[Check SAAF and IC3 results on patched]
 #figure({
@ -120,7 +130,7 @@ The remaining #num(nb_bytecode_collected - nb_google - nb_appsflyer - nb_faceboo
  caption: [Exit status of static analysis tools on original #APKs (left) and patched #APKs (right)]
 ) <fig:th-status-npatched-vs-patched>

-#todo[Check if flowdroid improve, compare sucess rate of RASTA, show result for demo app]
+#todo[Flowdroid results are inconclusive: some APKs have more leaks after patching, and as many have fewer. Also, running Flowdroid on the same APK can return a different number of leaks?]

 #jfl-note[How many apps are transformed? Are we talking about the 888? Do we apply both transformations to each APK? Does it always succeed?]

@ -158,7 +168,7 @@ public class Main {

    ...
 }
-  ```
+  ```,
  caption: [Code of the main class of the application shown by Jadx, before patching],
 )<fig:th-demo-before>

@ -175,7 +185,7 @@ This is not particularly surprising considering the obfusctation methods used.

 Then we run the dynamic analysis we described in @sec:th-dyn on the application and apply the transformation described in @sec:th-trans to add the dynamic information to it.
 This time, Flowdroid computes a larger call graph of 76 edges, and does find a data leak.
-Indeed, when looking at the new application with Jadx, we notice a new class `Malicious`, and the code of `Main.main()` is now as shown in @figth-demo-after:
+Indeed, when looking at the new application with Jadx, we notice a new class `Malicious`, and the code of `Main.main()` is now as shown in @fig:th-demo-after:
 the method called in the loop is either `Malicious.get_data()`, `Malicious.send_data()` or `Method.invoke()`.
 Although the names are self-explanatory, checking the code of those methods confirms that `get_data()` calls `Utils.source()` and `send_data()` calls `Utils.sink()`.

@ -195,7 +205,7 @@ Although self explanatory, verifying the code of those methods indeed confirm th
              method.invoke(null, objArr);
        }
    }
-    ```
+    ```,
    caption: [Code of `Main.main()` shown by Jadx, after patching],
 )<fig:th-demo-after>

--- a/5_theseus/5_limits.typ
+++ b/5_theseus/5_limits.typ
@ -1,20 +1,27 @@
-#import "../lib.typ": paragraph, ART, DEX, APK
+#import "../lib.typ": paragraph, ART, DEX, APK, eg
 #import "../lib.typ": todo, jfl-note, jm-note

-== Limitations <sec:th-limits>
+== Limitations and Future Work <sec:th-limits>

-#todo[Structure the section]
+The method we presented in this chapter has a number of underdeveloped aspects.
+In this section, we present those issues and potential avenues for improvement.

-#paragraph()[Custom Classloaders][
-#jfl-note(side: right)[The first obvious limitation is that we do not know what custom classloadrs do, so we cannot accuratly reproduce statically their behavior.][est ce que c'est une limite des 2 transformations proposées? j'ai l'impression que tu veux faire une 3ieme transformation]
+=== Bytecode Transformation
+
+#paragraph[Custom Classloaders][
+The first obvious limitation of our bytecode transformation is that we do not know what custom classloaders do, so we cannot accurately reproduce their behavior statically.
 We elected to fall back to the behavior of the `BaseDexClassLoader`, which is the highest Android-specific classloader in the inheritance hierarchy, and whose behavior is shared by all classloaders except `DelegateLastClassLoader`.
 The current implementation of the #ART enforces some restrictions on classloader behavior in order to optimize runtime performance by caching classes.
 This gives us some guarantees that custom classloaders remain reasonably consistent with the standard classloaders.
 For instance, a class loaded dynamically must have the same name as the name used in `ClassLoader.loadClass()`.
 This makes `BaseDexClassLoader` a good approximation for legitimate classloaders. However, an obfuscated application could use the techniques discussed in @sec:cl-cross-obf, in which case our model would be entirely wrong.
+
+It would be interesting to explore whether some form of static analysis, such as symbolic execution, could be used to extract the behavior of an ad hoc class loader and to model the classes it loads appropriately.
+A more reasonable approach, however, would be to improve the dynamic analysis to intercept every call to `loadClass()` on every class loader, including the implicit calls performed by the #ART.
+This would allow us to collect a mapping $("class loader", "class name") -> "class"$ that can then be used when renaming colliding classes.
 ]

-#paragraph()[Multiple Classloaders for one `Method.invoke()`][
+#paragraph[Multiple Classloaders for one `Method.invoke()`][
 Although we managed to handle calls to different methods from one `Method.invoke()` site, we do not handle calling methods from different classloaders with colliding class definitions.
 The first reason is that it is quite challenging to compare classloaders statically.
 At runtime, each object has a unique identifier that can be used to compare them over the course of the same execution, but this identifier is reset each time the application starts.
@ -32,18 +39,38 @@ Instead, we elected to ignore the classloaders when selecting the method to invo
 This can lead to invalid runtime behavior, as the first method matching the class name will be called. The alternative methods from other classloaders still appear in the new application, albeit in a block that might be flagged as dead code by a sufficiently advanced static analyser.
 ]

-#paragraph()[`ClassNotFoundException` may not be raised][
-In the very specific situation where the original application tries to access a class from dynamically loaded bytecode without actually accessing this bytecode, the patched application behavior will differ.
+#paragraph[`ClassNotFoundException` may not be raised][
+In the very specific situation where the original application tries to access a class from dynamically loaded bytecode without actually accessing this bytecode (#eg by using the wrong class loader), the behavior of the patched application will differ.
 The original application should raise a `ClassNotFoundException`, but in the patched application, the class will be accessible and the exception will not be raised.
 In practice, there are few reasons to do such a thing.
 One could be to check if the #APK has been tampered with, but there are easier ways to do this, like checking the application signature.
-#jm-note[Exception oriented programming worth mentioning? ]
 Another would be to check if the class is already available and, if not, load it dynamically, in which case the difference does not matter, as the dynamically loaded code is already present.
 In any case, because we remove neither the calls to the functions that load the classes (like `ClassLoader.loadClass(..)`) nor the `try` / `catch` blocks, static analysis tools that can handle the original behavior should still be able to access it.
 ]


-#todo[
- Use multidex: min SDK >= 21 (android 5.0, published in 2014, should be ok)
- No support for OAT (platform dependent)
+=== Dynamic Analysis
+
+#paragraph[Anti Evasion][
+  Our dynamic analysis does not implement any kind of anti-evasion technique.
+  Any application implementing even basic evasion will detect our environment and will probably not load malicious bytecode.
+  Running the dynamic analysis in an appropriate sandbox such as DroidDungeon should improve the results significantly.
 ]
+
+#paragraph[Code Coverage][
+  In @sec:th-dyn-failure, we saw that our dynamic analysis performed poorly.
+  This may be due to our experimental setup, and it is possible that a better sandbox would fix the issue.
+  However, there is a larger code coverage issue.
+  We tried to manually analyse a few applications marked as malware on MalwareBazaar to test our method.
+  Although we did confirm statically that the applications were using reflection and dynamic code loading, we did not manage to trigger this behavior at runtime, and other obfuscation techniques make it very difficult to determine statically the conditions required to trigger it.
+  Thus, we believe that techniques to improve code coverage are needed when analysing applications.
+  This could mean better exploration techniques such as the ones implemented by Stoat and GroddDroid, or more intrusive approaches such as forced execution.
+]
+
+=== Comparison with DroidRA
+
+It would be very interesting to compare our tool to DroidRA.
+DroidRA is a tool that computes reflection information using static analysis and patches the application to add the corresponding calls to it.
+Beyond the classic static versus dynamic comparison, DroidRA has a goal and a strategy similar to ours.
+Two notable comparison criteria would be the failure rate and the number of edges added to an application's call graph.
+The first criterion indicates how much the results can be used by other tools, while the second indicates how effective the approaches are.
--- a/5_theseus/6_conclusion.typ
+++ b/5_theseus/6_conclusion.typ
@ -0,0 +1,20 @@
+#import "../lib.typ": pb3, pb3-text, highlight-block, todo
+
+== Conclusion <sec:th-conclusion>
+
+In this chapter, we presented a set of transformations to apply to an application to encode reflection calls and code loaded dynamically inside the application.
+We also presented a dynamic analysis approach to collect the information needed to perform those transformations.
+
+We then applied this method to a recent subset of the applications of our dataset from @sec:rasta.
+When comparing the success rate of the tools of @sec:rasta on the applications before and after the transformation, we found that, in general, the success rate of those tools slightly decreases, with a few exceptions.
+We also showed that our transformation indeed allows static analysis tools to access and process this runtime information in their analysis.
+However, a more in-depth look at the results of our dynamic analysis showed that our code coverage is lacking, and that the great majority of dynamically loaded code we intercepted is from generic advertisement and telemetry libraries.
+
+#v(1.5em)
+
+#align(center, highlight-block(inset: 15pt, width: 75%, breakable: false, block(align(left)[
+  #pb3: #pb3-text
+  #v(0.75em)
+
+  #todo[Revisit the research question]
+])))
--- a/5_theseus/X_var.typ
+++ b/5_theseus/X_var.typ
@ -23,6 +23,9 @@
  )
 )

+#let nb_patched = 4681
+#let nb_patched_rasta = 4274 // differs from nb_patched because of an error in the experiment script
+

 #let bytecode_hashes = (
  (273, "bee390afa2267bc48829ee7a0f4286895bf32ba2443ff447451f515818f7203b", "Lcom/facebook/ads/*", DEX),
--- a/5_theseus/main.typ
+++ b/5_theseus/main.typ
@ -16,3 +16,4 @@
 #include("3_dynamic_data_collection.typ")
 #include("4_results.typ")
 #include("5_limits.typ")
+#include("6_conclusion.typ")