wip

2025-09-09 17:05:19 +02:00 · 2025-09-09 17:05:19 +02:00 · ed8bbd12e5
commit ed8bbd12e5
parent e9bc1572e9
8 changed files with 85 additions and 24 deletions
--- a/1_introduction/main.typ
+++ b/1_introduction/main.typ
@ -83,8 +83,7 @@ Reflection is another common obfuscation technique against static analysis.
 Instead of directly invoking methods, the generic `Method.invoke()` #API is used, and the method is retrieved from its name in the form of a character string.
 The value of this string can be quite difficult to determine statically, so it is once again an issue better suited to dynamic analysis.
 When encountering a complex case of reflection (#ie using ciphered strings) or code loading, a reverse engineer will switch to dynamic analysis to collect the relevant data (the name of the methods called or the code that was loaded), then switch back to static analysis.
-This is doable for a manual analysis; unfortunately, the more automated tools that would require that runtime information to perform an accurate analysis may not have a way to access this new data.
-This led us to our last problem statement:
+This is doable for a manual analysis; unfortunately, the more complex tools that would require that runtime information to perform an accurate analysis may not have a way to access this new data.
 ][

  Underdeveloped.
@ -95,6 +94,8 @@ This led us to our last problem statement:

  TODO: find a simple example to state
 ]
+Some contributions made the results they computed available to other tools by modifying (instrumenting) the application in a way that reflects those results.
+This led us to our last problem statement:
 #highlight-block(breakable: false)[
  *Pb3*: #pb3-text 

--- a/2_background/8_instrumentation.typ
+++ b/2_background/8_instrumentation.typ
@ -8,7 +8,7 @@ The term can also be used more generally to describe operation that modify the a
 In this section, we will focus on the use of instrumentation that makes an application easier to analyse by other tools, instead of just collecting additional information at runtime.

 In the previous section, we gave the example of AppSpear~@yang_appspear_2015, which reconstructs #DEX files intercepted at runtime and repackages the #APK with the new code in it.
-DexLeog~@dexlego has a similar but a lot more aggressive technique. 
+DexLego~@dexlego uses a similar but much more aggressive technique.
 It targets heavily obfuscated packers that decrypt and then re-encrypt the instructions of methods just in time.
 To get the bytecode, DexLego logs each instruction executed by the #ART, and reconstructs the methods, then the #DEX files, from this stream of instructions.
 The main limitation of this technique is that it carries over the limitations of dynamic analysis to static analysis: the bytecode injected in the application is limited to the instructions executed during the dynamic analysis.
@ -35,8 +35,7 @@ It has been used by tools like AppSpear and DexLego to expose heavily obfuscated
 Similarly, DroidRA computes reflection information statically and injects the actual method calls into the application it returns.
 However, AppSpear and DexLego focus primarily on specific obfuscation techniques, making their implementations difficult to port to more recent versions of Android, and DroidRA suffers from the limitations of static analysis.
 We believe that instrumentation is a promising approach to encode this information.
-#jm-note(side: right)[Especially, we think that using it to provide information collected by even a simple dynamic analysis could be significantly beneficial for many tools.][Urf, this is over promising considering the work done in @sec:th]
-
-#jm-note(side: left)[#pb3: #pb3-text][Yeah no, this need a revision]
+In particular, we think that it could be used to provide dynamic information that is not available to static analysis tools like DroidRA.
+To explore this possibility, we will try to answer our third problem statement #pb3: #pb3-text


--- a/5_theseus/4_results.typ
+++ b/5_theseus/4_results.typ
@ -1,4 +1,5 @@
-#import "../lib.typ": todo, SDK, num, mypercent, ART, ie, APKs, jfl-note
+#import "../lib.typ": SDK, num, mypercent, ART, ie, APKs, API
+#import "../lib.typ": todo, jfl-note
 #import "X_var.typ": *
 #import "../3_rasta/X_var.typ": NBTOTALSTRING

@ -107,6 +108,15 @@ The remaining #num(nb_bytecode_collected - nb_google - nb_appsflyer - nb_faceboo

 === Impact on Analysis Tools Finishing Rate

+Unfortunately, our implementation of the transformation is imperfect and sometimes fails.
+Of the #num(dyn_res.all.nb - dyn_res.all.nb_failed) applications for which the dynamic analysis succeeded, #num(nb_patched) were patched.
+The remaining #mypercent(dyn_res.all.nb - dyn_res.all.nb_failed - nb_patched, dyn_res.all.nb - dyn_res.all.nb_failed) failed either due to some quirk in the zip format of the #APK file or because of a bug in our implementation when exceeding the method reference limit in a single #DEX file.
+Taking into account the failures from both the dynamic analysis and the patching, we have a #mypercent(dyn_res.all.nb - nb_patched, dyn_res.all.nb) failure rate.
+This is a reasonable failure rate, but we should keep in mind that it adds to the failure rate of the other tools we want to use on the patched application.
+
+We then ran the same experiment as in @sec:rasta.
+We ran the tools on the #APKs before and after patching, and compare the finishing rates in @fig:th-status-npatched-vs-patched, without taking into account the #APKs we failed to patch#footnote[Due to a handling error during the experiment, the figure shows the results for #nb_patched_rasta #APKs instead of #nb_patched.].
+
 #todo[alt text @fig:th-status-npatched-vs-patched]
 #todo[Check SAAF and IC3 results on patched]
 #figure({
@ -120,7 +130,7 @@ The remaining #num(nb_bytecode_collected - nb_google - nb_appsflyer - nb_faceboo
  caption: [Exit status of static analysis tools on original #APKs (left) and patched #APKs (right)]
 ) <fig:th-status-npatched-vs-patched>

-#todo[Check if flowdroid improve, compare sucess rate of RASTA, show result for demo app]
+#todo[Flowdroid results are inconclusive: some APKs have more leaks after patching, and as many APKs have fewer? Also, running Flowdroid on the same APK can return a different number of leaks???]

 #jfl-note[How many apps are transformed? Are we talking about the 888? Do we apply both transformations to each APK? Does it always succeed?]

@ -158,7 +168,7 @@ public class Main {

    ...
 }
-  ```
+  ```,
  caption: [Code of the main class of the application shown by Jadx, before patching],
 )<fig:th-demo-before>

@ -175,7 +185,7 @@ This is not particularly surprising considering the obfusctation methods used.

 Then we run the dynamic analysis we described in @sec:th-dyn on the application and apply the transformation described in @sec:th-trans to add the dynamic information to it.
 This time, Flowdroid computes a larger call graph of 76 edges, and does find a data leak.
-Indeed, when looking at the new application with Jadx, we notice a new class `Malicious`, and the code of `Main.main()` is now as shown in @figth-demo-after:
+Indeed, when looking at the new application with Jadx, we notice a new class `Malicious`, and the code of `Main.main()` is now as shown in @fig:th-demo-after:
 the method called in the loop is either `Malicious.get_data()`, `Malicious.send_data()` or `Method.invoke()`.
 Although the method names are self-explanatory, inspecting the code of those methods confirms that `get_data()` calls `Utils.source()` and `send_data()` calls `Utils.sink()`.

@ -195,7 +205,7 @@ Although self explanatory, verifying the code of those methods indeed confirm th
              method.invoke(null, objArr);
        }
    }
-    ```
+    ```,
    caption: [Code of `Main.main()` shown by Jadx, after patching],
 )<fig:th-demo-after>

--- a/5_theseus/5_limits.typ
+++ b/5_theseus/5_limits.typ
@ -1,20 +1,27 @@
-#import "../lib.typ": paragraph, ART, DEX, APK
+#import "../lib.typ": paragraph, ART, DEX, APK, eg
 #import "../lib.typ": todo, jfl-note, jm-note

-== Limitations <sec:th-limits>
+== Limitations and Future Work <sec:th-limits>

-#todo[Structure the section]
+The method we presented in this chapter has a number of underdeveloped aspects.
+In this section, we present those issues and potential avenues for improvement.

-#paragraph()[Custom Classloaders][
-#jfl-note(side: right)[The first obvious limitation is that we do not know what custom classloadrs do, so we cannot accuratly reproduce statically their behavior.][est ce que c'est une limite des 2 transformations proposées? j'ai l'impression que tu veux faire une 3ieme transformation]
+=== Bytecode Transformation
+
+#paragraph[Custom Classloaders][
+The first obvious limitation of our bytecode transformation is that we do not know what custom classloaders do, so we cannot accurately reproduce their behavior statically.
 We elected to fall back to the behavior of the `BaseDexClassLoader`, which is the highest Android-specific classloader in the inheritance hierarchy, and whose behavior is shared by all classloaders except `DelegateLastClassLoader`.
 The current implementation of the #ART enforces some restrictions on classloader behavior to optimize runtime performance by caching classes.
 This gives us some guarantees that custom classloaders will remain somewhat consistent with the classic classloaders.
 For instance, a class loaded dynamically must have the same name as the name used in `ClassLoader.loadClass()`.
 This makes `BaseDexClassLoader` a good approximation for legitimate classloaders; however, an obfuscated application could use the techniques discussed in @sec:cl-cross-obf, in which case our model would be entirely wrong.
+
+It would be interesting to explore whether some form of static analysis, like symbolic execution, could be used to extract the behavior of an ad hoc class loader and to model the classes used appropriately.
+A more reasonable approach, however, would be to improve the dynamic analysis to intercept each call to `loadClass()` of each class loader, including implicit calls performed by the #ART.
+This would allow us to collect a mapping $("class loader", "class name") -> "class"$ that can then be used when renaming colliding classes.
 ]

-#paragraph()[Multiple Classloaders for one `Method.invoke()`][
+#paragraph[Multiple Classloaders for one `Method.invoke()`][
 Although we managed to handle calls to different methods from one `Method.invoke()` site, we do not handle calling methods from different classloaders with colliding class definitions.
 The first reason is that it is quite challenging to compare classloaders statically.
 At runtime, each object has a unique identifier that can be used to compare them over the course of the same execution, but this identifier is reset each time the application starts.
@ -32,18 +39,38 @@ Instead, we elected to ignore the classloaders when selecting the method to invo
 This can lead to invalid runtime behavior, as the first method matching the class name will be called, but the alternative methods from other classloaders still appear in the new application, albeit in a block that might be flagged as dead code by a sufficiently advanced static analyser.
 ]

-#paragraph()[`ClassNotFoundException` may not be raised][
-In the very specific situation where the original application tries to access a class from dynamically loaded bytecode without actually accessing this bytecode, the patched application behavior will differ.
+#paragraph[`ClassNotFoundException` may not be raised][
+In the very specific situation where the original application tries to access a class from dynamically loaded bytecode without actually accessing this bytecode (#eg by using the wrong class loader), the behavior of the patched application will differ.
 The original application should raise a `ClassNotFoundException`, but in the patched application, the class will be accessible and the exception will not be raised.
 In practice, there are few reasons to do such a thing.
 One could be to check whether the #APK has been tampered with, but there are easier ways to do this, like checking the application signature.
-#jm-note[Exception oriented programming worth mentioning? ]
 Another would be to check whether the class is already available and, if not, load it dynamically, in which case the difference does not matter, as the dynamically loaded code is already present.
 In any case, statically, because we remove neither the calls to the functions that load the classes (like `ClassLoader.loadClass(..)`) nor the `try` / `catch` blocks, static analysis tools that can handle the original behavior should still be able to access it.
 ]


-#todo[
- Use multidex: min SDK >= 21 (android 5.0, published in 2014, should be ok)
- No support for OAT (platform dependent)
+=== Dynamic Analysis
+
+#paragraph[Anti Evasion][
+  Our dynamic analysis does not implement any anti-evasion technique.
+  Any application implementing even basic evasion will detect our environment and will probably not load malicious bytecode.
+  Running the dynamic analysis in an appropriate sandbox such as DroidDungeon should improve the results significantly.
 ]
+
+#paragraph[Code Coverage][
+  In @sec:th-dyn-failure, we saw that our dynamic analysis performed poorly.
+  This may be due to our experimental setup, and it is possible that a better sandbox would fix the issue.
+  However, there is a larger code coverage issue.
+  We tried to manually analyse a few applications marked as malware on MalwareBazaar to test our method.
+  Although we did confirm statically that the applications were using reflection and dynamic code loading, we did not manage to trigger this behavior at runtime, and other obfuscation techniques make it very difficult to determine statically the conditions required to trigger it.
+  Thus, we believe that techniques to improve code coverage are indeed needed when analysing applications.
+  This could mean better exploration techniques, such as the ones implemented by Stoat and GroddDroid, or more intrusive approaches, such as forced execution.
+]
+
+=== Comparison with DroidRA
+
+It would be very interesting to compare our tool to DroidRA.
+DroidRA is a tool that computes reflection information using static analysis and patches the application to add the corresponding calls.
+Beyond the classic static versus dynamic comparison, DroidRA has a similar goal and strategy to ours.
+Two notable comparison criteria would be the failure rate and the number of edges added to an application's call graph.
+The first criterion indicates how much the results can be used by other tools, while the second indicates how effective the approaches are.
--- a/5_theseus/6_conclusion.typ
+++ b/5_theseus/6_conclusion.typ
@ -0,0 +1,20 @@
+#import "../lib.typ": pb3, pb3-text, highlight-block, todo
+
+== Conclusion <sec:th-conclusion>
+
+In this chapter, we presented a set of transformations to apply to an application to encode reflection calls and code loaded dynamically inside the application.
+We also presented a dynamic analysis approach to collect the information needed to perform those transformations.
+
+We then applied this method to a recent subset of the applications of our dataset from @sec:rasta.
+When comparing the success rate of the tools of @sec:rasta on the applications before and after the transformation, we found that, in general, the success rate of those tools slightly decreases, with a few exceptions.
+We also showed that our transformation indeed allows static analysis tools to access and process this runtime information in their analysis.
+However, a more in-depth look at the results of our dynamic analysis showed that our code coverage is lacking, and that the great majority of dynamically loaded code we intercepted is from generic advertisement and telemetry libraries.
+
+#v(1.5em)
+
+#align(center, highlight-block(inset: 15pt, width: 75%, breakable: false, block(align(left)[
+  #pb3: #pb3-text
+  #v(0.75em)
+
+  #todo[Revise the problem statement]
+])))
--- a/5_theseus/X_var.typ
+++ b/5_theseus/X_var.typ
@ -23,6 +23,9 @@
  )
 )

+#let nb_patched = 4681
+#let nb_patched_rasta = 4274 // I fucked up the script...
+

 #let bytecode_hashes = (
  (273, "bee390afa2267bc48829ee7a0f4286895bf32ba2443ff447451f515818f7203b", "Lcom/facebook/ads/*", DEX),
--- a/5_theseus/main.typ
+++ b/5_theseus/main.typ
@ -16,3 +16,4 @@
 #include("3_dynamic_data_collection.typ")
 #include("4_results.typ")
 #include("5_limits.typ")
+#include("6_conclusion.typ")
--- a/lib.typ
+++ b/lib.typ
@ -44,4 +44,4 @@
 #let pb2 = link(<pb-2>)[*Pb2*]
 #let pb2-text = [_What is the default Android class loading algorithm, and does it impact static analysis?_]
 #let pb3 = link(<pb-3>)[*Pb3*]
-#let pb3-text = [_Can we provide dynamic code loading and reflection data collected dynamically to any static analysis tools to improve their results?_]
+#let pb3-text = [_Can we use instrumentation to provide dynamic code loading and reflection data collected dynamically to static analysis tools and improve their results?_]