diff --git a/3_rasta/9_conclusion.typ b/3_rasta/9_conclusion.typ index dbe8fc1..c506628 100644 --- a/3_rasta/9_conclusion.typ +++ b/3_rasta/9_conclusion.typ @@ -21,7 +21,7 @@ Finally, we showed that malware #APKs generate less fatal errors than goodware w Following Reaves #etal recommendations~@reaves_droid_2016, we publish the Docker and Singularity images we built to run our experiments alongside the Docker files. This will allow the research community to use directly the tools without the build and installation penalty. -#v(1.5em) +#v(2em) #align(center, highlight-block(inset: 15pt, width: 75%, breakable: false, block(align(left)[ #pb1: #pb1-text diff --git a/4_class_loader/6_conclusion.typ b/4_class_loader/6_conclusion.typ index 6c2bc17..aec4961 100644 --- a/4_class_loader/6_conclusion.typ +++ b/4_class_loader/6_conclusion.typ @@ -14,7 +14,7 @@ More suspiciously, #shadowhidden of applications are shadowing a hidden class, w Investigations for applications that defined classes multiple times suggest that the compilation process or the inclusion of different versions of the same library is the main explanation. Finally, when investigating malware samples, we found a specific sample containing a shadow attack that would hide a part of the critical code from a reverser studying the application. -#v(1.5em) +#v(2em) #align(center, highlight-block(inset: 15pt, width: 75%, breakable: false, block(align(left)[ #pb2: #pb2-text diff --git a/5_theseus/2_overview.typ b/5_theseus/2_overview.typ new file mode 100644 index 0000000..7c96cb4 --- /dev/null +++ b/5_theseus/2_overview.typ @@ -0,0 +1,58 @@ +#import "../lib.typ": todo, APK, DEX, JAR, OAT, SDK, eg, ART, jm-note, jfl-note +#import "@preview/diagraph:0.3.5": raw-render + +== Overview + +Our objective is to make dynamic information available to any analysis tool able to analyse an Android #APK.
+To do so, we follow the path of a few contributions presented in @sec:bg, such as DroidRA~@li_droidra_2016, and use instrumentation. +Unlike DroidRA, which uses static analysis to compute the values of strings and, from them, the methods called by reflection, we chose to use dynamic analysis. +This allows us to collect information that is simply not available statically (#eg a string sent by a remote command and control server). +The tradeoff is the lack of exhaustiveness: dynamic analysis is known to have code coverage issues. + +#figure( + raw-render( + ``` + digraph { + rankdir=LR + + splines="ortho" + + + APK [shape=parallelogram] + "Automated Runner" + "Reverse Engineer" + "Dynamic Analysis" [shape=box] + "Runtime Information" [shape=parallelogram] + Transformation [shape=box] + "APK'" [shape=parallelogram] + + APK:c -> "Dynamic Analysis" + "Automated Runner" -> "Dynamic Analysis" [style="dashed"] + "Reverse Engineer" -> "Dynamic Analysis" [style="dashed"] + "Dynamic Analysis" -> "Runtime Information" + APK -> Transformation + "Runtime Information" -> Transformation + Transformation -> "APK'" + } + ```, + width: 100%, + alt: ( + "A diagram showing the process to transform an application.", + "Dashed arrows go from \"Automated Runner\" and from \"Reverse Engineer\" to a box labeled \"Dynamic Analysis\", as well as a plain arrow from \"APK\" to \"Dynamic Analysis\".", + "An arrow goes from \"Dynamic Analysis\" to \"Runtime Information\", then from \"Runtime Information\" to a box labeled \"Transformation\".", + "Another arrow goes from \"APK\" to \"Transformation\".", + "Finally, an arrow goes from \"Transformation\" to \"APK'\"." + ).join(), + ), + caption: [Process to add runtime information to an #APK], +) + +@fig:th-process summarizes our process. +We first analyse an application dynamically. +To improve code coverage, either a reverse engineer or an automated runner interacts with the application.
+During this analysis, we use Frida to capture runtime information such as the names of the methods called using reflection and the bytecode loaded at runtime. +This analysis is described in @sec:th-dyn. + +The data collected by this analysis is then combined with the original application, producing a new application that can be analysed further. +We present the details of this transformation in @sec:th-trans. +Since the transformation determines which data we need to collect, we present it first in this chapter. diff --git a/5_theseus/2_static_transformation.typ b/5_theseus/3_static_transformation.typ similarity index 99% rename from 5_theseus/2_static_transformation.typ rename to 5_theseus/3_static_transformation.typ index c3b945a..ff0fc70 100644 --- a/5_theseus/2_static_transformation.typ +++ b/5_theseus/3_static_transformation.typ @@ -2,10 +2,6 @@ == Code Transformation -#todo[Define code loading and reflection somewhere] -#todo[This is a draft, clean this up] -#todo[Reflectif call? Reflection call?] - In this section, we will see how we can transform the application code to make dynamic codeloading and reflexive calls more analysable by static analysis tools. === Transforming Reflection @@ -227,7 +223,7 @@ The pseudo-code in @lst:renaming-algo show the three steps of this algorithm: * #todo[interupting try blocks: catch block might expect temporary registers to still stored the saved value] ? */ -#h(2em) +#v(2em) Now that we saw the transformations we want to make, we know the runtime information we need to do it. In the next section, we will propose a solution to collect those informations.
diff --git a/5_theseus/3_dynamic_data_collection.typ b/5_theseus/4_dynamic_data_collection.typ similarity index 83% rename from 5_theseus/3_dynamic_data_collection.typ rename to 5_theseus/4_dynamic_data_collection.typ index d41685d..20c9a3a 100644 --- a/5_theseus/3_dynamic_data_collection.typ +++ b/5_theseus/4_dynamic_data_collection.typ @@ -1,57 +1,17 @@ -#import "@preview/diagraph:0.3.5": raw-render -#import "../lib.typ": todo, SDK, API, ART, DEX, APK, JAR, ADB, jfl-note +#import "../lib.typ": todo, SDK, API, ART, DEX, APK, JAR, ADB, jfl-note, APKs == Collecting Runtime Information -@fig:th-process show the general idea of our process. To perform the transformations discribed in @sec:th-trans, we need information like the name and signature of the method called with reflection, or the actual bytecode loaded dynamically. We decided to collet those information through dynamic analysis. We saw in @sec:bg different contributions that collect this kind of information. -In the end, we decided to keep the analysis as simple as possible, so we avoided using a custom Android build like DexHunter, and instead use Frida(see @sec:bg-frida) to instrument the application and intercept calls of the methods that interest us. +In the end, we decided to keep the analysis as simple as possible, so we avoided using a custom Android build like DexHunter, and instead used Frida (see @sec:bg-frida) to instrument the application and intercept calls to the methods of interest. @sec:th-fr-dcl present our approach to collect dynamically loaded bytecode, and @sec:th-fr-ref present our approach to collect the reflection data. Because using dynamic analysis raise the concern of coverage, we also need some interaction with application during the analysis. Ideally, a reverse engineer would do the interaction. Because we wanted to analyse many applications in a reasonable time, we replaced this engineer by an automated runner that simulates the interactions.
We discuss this option in @sec:th-grod. -#figure( - raw-render( - ``` - digraph { - rankdir=LR - - splines="ortho" - - - APK [shape=parallelogram] - "Automated Runner" - "Reverse Engineer" - "Dynamic Analysis" [shape=box] - "Runtime Information" [shape=parallelogram] - Transformation [shape=box] - "APK'" [shape=parallelogram] - - APK:c -> "Dynamic Analysis" - "Automated Runner" -> "Dynamic Analysis" [style="dashed"] - "Reverse Engineer" -> "Dynamic Analysis" [style="dashed"] - "Dynamic Analysis" -> "Runtime Information" - APK -> Transformation - "Runtime Information" -> Transformation - Transformation -> "APK'" - } - ```, - width: 100%, - alt: ( - "A diagram showing the process to transform an application.", - "Dotted arrows go from a \"Automated Runner\" and from \"Reverse Engineer\" to a box labeled \"Dynamic Analysis\", as well as plain arrow from \"APK\" to \"Dynamic Analysis\".", - "An arrow goes from \"Dynamic Analysis\" to \"Runtime Information\", then from \"Runtime Information\" to a box labeled \"Transformation\".", - "Another arrow goes from \"APK\" to \"Transformation\".", - "Finally, an arrow goes from \"Transformation\" to \"APK'\"." - ).join(), - ), - caption: [Process to add runtime information to an #APK], -) - === Collecting Bytecode Dynamically Loaded Initially, we considered instrumenting the constructor methods of the classloaders of the Android #SDK. @@ -116,5 +76,7 @@ Nonetheless, the benefit of our implementation is that it only requires a #ADB c Of course, to analyse a specific application, a reverse engineer could use an actual smartphone and explore the application manually. It would be a lot more stable than our automated batch analysis setup. 
-#todo[Futur work: Droiddonjon like, GroddDroid (or other) improved exploration, potentiellement faire de l'execution forcé avec frida] +#v(2em) +Now that we have both saw both the dynamic analysis setup and the transformation we want to perform on the #APKs, we put our proposed approach into practice. +In the next section, we will run our dynamic analysis on #APKs and studdy the look at the data collected as well a the impact the instrumentation has on appications and different analysis tools. diff --git a/5_theseus/4_results.typ b/5_theseus/5_results.typ similarity index 90% rename from 5_theseus/4_results.typ rename to 5_theseus/5_results.typ index 3ef1d39..0b6d8e9 100644 --- a/5_theseus/4_results.typ +++ b/5_theseus/5_results.typ @@ -241,7 +241,7 @@ Then we run the dynamic analysis we described in @sec:th-dyn on the application This time, Flowdroid compute a larger callgraph of 76 edges, and does find a data leak. Indeed, when looking at the new application with Jadx, we notice a new class `Malicious`, and the code of `Main.main()` is now as shown in @lst:th-demo-after: the method called in the loop is either `Malicious.get_data`, `Malicious.send_data()` or `Method.invoke()`. -Although self explanatory, verifying the code of those methods indeed confirm that `get_data()` calls `Utils.source()` and `send_data()` calls `Utils.sink()`. +Although self explanatory, verifying the code of those methods indeed confirms that `get_data()` calls `Utils.source()` and `send_data()` calls `Utils.sink()`. #figure( ```java @@ -263,9 +263,16 @@ Although self explanatory, verifying the code of those methods indeed confirm th caption: [Code of `Main.main()` showed by Jadx, after patching], ) +For an higher level view of the method, we can also look at its the call graph. +We used Androguard to generate the call graphes in @fig:th-cg-before and @fig:th-cg-after#footnote[We manually edited the generated .dot files for readability.]. 
+@fig:th-cg-before shows the original call graph and gives a good idea of the obfuscation techniques used: we can see calls to `Main.decrypt(String)`, which itself calls cryptographic #APIs, as well as calls to `ClassLoader.loadClass(String)`, `Class.getMethod(String, Class[])` and `Method.invoke(Object, Object[])`. +This indicates reflection calls based on encrypted strings, but does not reveal what the method actually does. +In comparison, @fig:th-cg-after, the call graph after instrumentation, still shows the cryptographic and reflection calls, but also four new method calls. +In grey on the figure, we can see the glue methods (`T.check_is_Xxx_xxx(Method)`). +Those methods are part of the instrumentation process presented in @sec:th-trans, but contribute little to the analysis of the call graph. +In red on the figure, however, are the calls that were hidden by reflection in the first call graph; since the bytecode of the called methods has been injected into the application, we can also see that they call `Utils.source(String)` and `Utils.sink(String)`, the methods we defined for this application as the source of confidential data and the exfiltration method, respectively. + #todo[alt text for @fig:th-cg-before and @fig:th-cg-after] -#todo[comment @fig:th-cg-before and @fig:th-cg-after] -#todo[Conclude and transition] #figure( render( read("figs/demo_main_main.dot"), @@ -288,6 +295,8 @@ Although self explanatory, verifying the code of those methods indeed confirm th caption: [Call Graph of `Main.main()` view by Androguard after patching], ) +#v(2em) - -#todo[androgard call graph] +To conclude, we showed that our approach indeed improves the results of analysis tools without significantly impacting their finishing rate. +Unfortunately, we also noticed that our dynamic analysis is suboptimal, due either to our experimental setup or to the way we explore the applications.
+In the next section, we present in more detail the limitations of our solution, as well as future work that could improve the contributions presented in this chapter. diff --git a/5_theseus/5_limits.typ b/5_theseus/6_limits.typ similarity index 97% rename from 5_theseus/5_limits.typ rename to 5_theseus/6_limits.typ index acafcc4..307fb1f 100644 --- a/5_theseus/5_limits.typ +++ b/5_theseus/6_limits.typ @@ -3,8 +3,8 @@ == Limitations and Futur Works -The method we presented in this section has a number of underdeveloped aspects. -In this section we will present those issues and potential avenues of improvement. +The method we presented in this chapter has a number of underdeveloped aspects. +In this section, we present those issues and potential avenues of improvement related to the bytecode transformation, the dynamic analysis, and the comparison with DroidRA, a tool similar to ours. === Bytecode Transformation @@ -70,7 +70,7 @@ In any cases, statically, because we remove neither the calls to the function th === Comparision to DroidRA It would be very interesting to compare our tool to DroidRA. -DroidRA is a tool that compute reflection information using static analysis and patch the application to add those calls to the application? +DroidRA is a tool that computes reflection information using static analysis and patches the application to add the corresponding calls. Beyond the classic comparison static vs dynamic, DroidRA has a similar goal and strategy to ours. Two notable comparison criteria would be the failure rate and the number of edges added to an application call graph. The first criterion indicate how much the results can be used by other tools, while the second indicate how effective the approaches are.
diff --git a/5_theseus/6_conclusion.typ b/5_theseus/7_conclusion.typ similarity index 75% rename from 5_theseus/6_conclusion.typ rename to 5_theseus/7_conclusion.typ index dc79d42..5184b1c 100644 --- a/5_theseus/6_conclusion.typ +++ b/5_theseus/7_conclusion.typ @@ -10,11 +10,11 @@ When comparing the success rate of the tools of @sec:rasta on the applications b We also showed that our transformation indeed allow static analysis tools to access and process those runtime information in their analysis. However, a more in-depth look at the results of our dynamic analysis showed that our code coverage is lacking, and that the great majority of dynamically loaded code we intercepted is from generic advertisement and telemetry libraries. -#v(1.5em) +#v(2em) #align(center, highlight-block(inset: 15pt, width: 75%, breakable: false, block(align(left)[ #pb3: #pb3-text #v(0.75em) - - #todo[Revoir la problématique] + We showed that instrumentation can be used to add direct calls to the methods initially called through reflection, which, combined with injecting the dynamically loaded bytecode into the application, allows generic static analysis tools to access previously unavailable code. + However, we also found that the dynamic analysis can be a significant bottleneck in this approach. ]))) diff --git a/5_theseus/main.typ b/5_theseus/main.typ index ab9f14b..9ceeb45 100644 --- a/5_theseus/main.typ +++ b/5_theseus/main.typ @@ -9,12 +9,10 @@ #todo[Abstract for @sec:th] ]))) - -#todo[better title for theseus chapter title for @sec:th] - #include("1_introduction.typ") -#include("2_static_transformation.typ") -#include("3_dynamic_data_collection.typ") -#include("4_results.typ") -#include("5_limits.typ") -#include("6_conclusion.typ") +#include("2_overview.typ") +#include("3_static_transformation.typ") +#include("4_dynamic_data_collection.typ") +#include("5_results.typ") +#include("6_limits.typ") +#include("7_conclusion.typ")
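The results hunk describes the patched `Main.main()`: the method called in the loop is either `Malicious.get_data()`, `Malicious.send_data()` or `Method.invoke()`, selected through glue methods such as `T.check_is_Xxx_xxx(Method)`. The Java sketch below illustrates that shape of a rewritten reflective call site under stated assumptions. It is not the code the tool generates. `Malicious`, the glue class `T`, its `check_is_*` predicates, `PatchedCall.invokePatched` and `fallbackCount` are all hypothetical, and the sketch also assumes that a `Method` is matched by declaring class, name and parameter types. Each target observed during dynamic analysis gets a guard followed by a direct call, and the original reflective call is kept as a fallback for targets that were never observed.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

// Hypothetical stand-in for the class recovered from dynamically loaded bytecode
// and injected into the patched application.
class Malicious {
    static String lastSent = null;

    public static String get_data() {
        return "secret"; // stands in for a call to Utils.source(...)
    }

    public static void send_data(String data) {
        lastSent = data; // stands in for a call to Utils.sink(...)
    }
}

// Hypothetical glue class: one predicate per reflective target observed at runtime.
class T {
    static boolean check_is_Malicious_get_data(Method m) {
        return m.getDeclaringClass() == Malicious.class
                && m.getName().equals("get_data")
                && m.getParameterCount() == 0;
    }

    static boolean check_is_Malicious_send_data(Method m) {
        return m.getDeclaringClass() == Malicious.class
                && m.getName().equals("send_data")
                && Arrays.equals(m.getParameterTypes(), new Class<?>[] {String.class});
    }
}

public class PatchedCall {
    // Counts how often the original reflective call is still used.
    static int fallbackCount = 0;

    // Rewritten call site: known targets get a direct, statically visible call;
    // the original `m.invoke(null, args)` is kept for anything else.
    static Object invokePatched(Method m, Object... args) throws ReflectiveOperationException {
        if (T.check_is_Malicious_get_data(m)) {
            return Malicious.get_data();
        } else if (T.check_is_Malicious_send_data(m)) {
            Malicious.send_data((String) args[0]);
            return null;
        }
        fallbackCount++;
        return m.invoke(null, args); // target not seen during dynamic analysis
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Method get = Malicious.class.getMethod("get_data");
        Method send = Malicious.class.getMethod("send_data", String.class);
        invokePatched(send, invokePatched(get));
        System.out.println(Malicious.lastSent); // prints "secret"
    }
}
```

Running `main` prints `secret` without ever reaching the fallback branch. Because the calls to `get_data()` and `send_data()` now appear as ordinary invocations, a static call graph builder can see those edges, whereas in the original reflective version they were hidden. Keeping `Method.invoke()` as the last branch preserves the behaviour of the application for targets that the dynamic analysis did not observe.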