pass chapter 5

2025-09-30 03:05:07 +02:00
commit d7df45b206
parent f309dd55b8
8 changed files with 64 additions and 56 deletions
--- a/5_theseus/4_dynamic_data_collection.typ
+++ b/5_theseus/4_dynamic_data_collection.typ
@@ -5,14 +5,14 @@
 To perform the transformations described in @sec:th-trans, we need information like the name and signature of the method called with reflection, or the actual bytecode loaded dynamically.
 We decided to collect that information through dynamic analysis.
 We saw in @sec:bg different contributions that collect this kind of information.
-In the end, we decided to keep the analysis as simple as possible, so we avoided using a custom Android build like DexHunter, and instead used Frida to instrument the application and intercept calls of the methods of interest.
-@sec:th-fr-dcl present our approach to collect dynamically loaded bytecode, and @sec:th-fr-ref present our approach to collect the reflection data.
-Because using dynamic analysis raises the concern of coverage, we also need some interaction with the application during the analysis.
+In the end, we decided to keep the analysis as simple as possible, so we avoided using a custom Android build like DexHunter and instead used Frida to instrument the application and intercept calls to the methods of interest.
+@sec:th-fr-dcl presents our approach to collect dynamically loaded bytecode, and @sec:th-fr-ref presents our approach to collect the reflection data.
+Because using dynamic analysis raises the concern of coverage, we also need some interaction with the graphical user interface of the application during the analysis.
 Ideally, a reverse engineer would do the interaction.
 Because we wanted to analyse many applications in a reasonable time, we replaced this engineer with an automated runner that simulates the interactions.
 We discuss this option in @sec:th-grod.

-=== Collecting Bytecode Dynamically Loaded <sec:th-fr-dcl>
+=== Collecting the Dynamically Loaded Bytecode <sec:th-fr-dcl>

 Initially, we considered instrumenting the constructor methods of the class loaders of the Android #SDK.
 However, this is a significant number of methods to instrument, and looking at older applications, we realised that we missed the `DexFile` class.
@@ -23,7 +23,7 @@ We found that all those calls are from under either `DexFile.openInMemoryDexFile
 As a reference, in 2015, DexHunter~@zhang2015dexhunter already noticed `DexFile.openDexFileNative(..)` (although in the end DexHunter instruments another function, `DefineClass(..)`).
 `DefineClass(..)` is still a good function to instrument, but it is a C++ native method that does not have a Java interface, making it harder to work with using Frida, and we want to avoid patching the source code of the #ART like DexHunter did.
 For this reason, we decided to hook `DexFile.openInMemoryDexFilesNative(..)` and `DexFile.openDexFileNative(..)` instead.
-Those methods take as argument a list of Android code files, either in the form of in-memory byte arrays or file paths, and a reference to the classloader associated with the code.
+Those methods take a list of Android code files as an argument, either in the form of in-memory byte arrays or file paths, and a reference to the class loader associated with the code.
 Instrumenting those methods allows us to collect all the code files loaded by the #ART and associate them with their class loaders.

 === Collecting Reflection Data <sec:th-fr-ref>
@@ -39,10 +39,10 @@ This information is more difficult to collect than one would expect.
 It is stored in the stack, but before the #SDK 34, the stack was not directly accessible programmatically.
 Historically, when a reverse engineer needed to access the stack, they would trigger and catch an exception and get the stack from that exception.
 The issue with this approach is that data stored in exceptions is meant for debugging.
-In particullar, the location of the call in the bytecode has a different meaning depending on the debug information encoded in the bytecode.
+In particular, the location of the call in the bytecode has a different meaning depending on the debug information encoded in the bytecode.
 It can either be the address of the bytecode instruction invoking the callee method in the instruction array of the caller method, or the line number of the original source code that calls the callee method.
 Fortunately, in the #SDK 34, Android introduced the `StackWalker` #API.
-This #API allow to programatically travel the current stack and retrieve information from it, including the bytecode address of the instruction calling the callee methods.
+This #API allows us to programmatically walk the current stack and retrieve information from it, including the bytecode address of the instruction calling the callee method.
 Considering that the line number is not reliable information, we chose to use the new #API, despite the restrictions that come with choosing such a recent Android version (it was released in October 2023, around 2 years ago, and less than 50% of the current Android market share supports this #API today#footnote[https://gs.statcounter.com/android-version-market-share/mobile-tablet/worldwide/#monthly-202401-202508]).

 === Application Execution <sec:th-grod>
@@ -50,9 +50,9 @@ Considering that the line number is not a reliable information, we chose to use
 Dynamic analysis requires actually running the application.
 In order to test multiple applications automatically, we needed to simulate human interactions with the applications.
 In @sec:bg, we presented a few solutions to explore an application dynamically.
-We first eliminated Sapienz, as it relies on an application instrumentation library called ELLA, which has not been updated for 9 years.
+We first eliminated Sapienz~@mao_sapienz_2016, as it relies on an application instrumentation library called ELLA, which has not been updated for 9 years.
 We also chose to avoid the Monkey because we noticed that it often triggers events that close the application (events like pressing the 'home' button, or opening the general settings drop-down menu at the top of the screen).
-Stoat and GroddDroid use UI Automator to interact with the application.
+Stoat~@su_guided_2017 and GroddDroid~@abraham_grodddroid_2015 use UI Automator to interact with the application.
 UI Automator is a standard Android #API intended for automatic testing.
 Both Stoat and GroddDroid perform additional analysis on the application to improve the exploration.
 In the end, we elected to use the most basic execution mode of GroddDroid that does not need this additional analysis.
@@ -72,7 +72,7 @@ Then we run the application for five minutes with GroddRunner, and at the end of
 If at some point an emulator stops responding for too long, we terminate it and restart it.

 As we will see in @sec:th-dyn-failure, our experimental setup is quite naive and still requires improvement. #todo(strike(stroke: green)[How do we say politely that it is all broken?])
-For example, it does not implement any anti-evasion techniques, which can be a significant issue when analysing malware.
+For example, we do not implement any anti-evasion techniques, which can be a significant issue when analysing malware.
 Nonetheless, the benefit of our implementation is that it only requires an #ADB connection to a phone with a rooted Android system to work.
 Of course, to analyse a specific application, a reverse engineer could use an actual smartphone and explore the application manually.
 It would be a lot more stable than our automated batch analysis setup.
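
As an illustration of the `StackWalker` approach described under `<sec:th-fr-ref>`, the following is a minimal Java sketch, not the instrumentation used in the chapter (which relies on Frida). It assumes a runtime that provides `java.lang.StackWalker` (Java 9 or later on a desktop JVM, #SDK 34 on Android). The names `CallSiteDemo`, `CallSite`, `hookedCallee` and `caller` are hypothetical and exist only for this example; `hookedCallee` stands in for a method of interest such as a reflective call.

```java
public class CallSiteDemo {
    /** Where a call came from: caller method name, bytecode index, and source line. */
    public static final class CallSite {
        public final String method;
        public final int bytecodeIndex;
        public final int lineNumber;

        CallSite(String method, int bytecodeIndex, int lineNumber) {
            this.method = method;
            this.bytecodeIndex = bytecodeIndex;
            this.lineNumber = lineNumber;
        }

        @Override
        public String toString() {
            return method + " @bci=" + bytecodeIndex + " (line " + lineNumber + ")";
        }
    }

    /** Hypothetical stand-in for an instrumented method of interest. */
    public static CallSite hookedCallee() {
        // The frame stream starts at the method that called walk(..), here hookedCallee,
        // so skipping one frame yields the method that invoked hookedCallee.
        return StackWalker.getInstance().walk(frames -> frames
                .skip(1)
                .findFirst()
                .map(f -> new CallSite(f.getMethodName(), f.getByteCodeIndex(), f.getLineNumber()))
                .orElseThrow(IllegalStateException::new));
    }

    /** Hypothetical caller whose call site we want to locate. */
    public static CallSite caller() {
        return hookedCallee();
    }

    public static void main(String[] args) {
        System.out.println(caller());
    }
}
```

The bytecode index is read from the frame itself, so it does not depend on the debug information that determines what a stack trace taken from an exception reports as the call location.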