@@ -1,4 +1,4 @@
#import "../lib.typ": todo, APK, DEX, JAR, OAT, SDK, eg, ART, jm-note, jfl-note
#import "../lib.typ": todo, APK, APKs, DEX, JAR, OAT, SDK, eg, ART, jm-note, jfl-note
== Code Transformation <sec:th-trans>
@@ -225,6 +225,26 @@ The pseudo-code in @lst:renaming-algo shows the three steps of this algorithm:
* #todo[interrupting try blocks: catch blocks might expect temporary registers to still store the saved value] ?
*/
=== Implementation Details
Most of the contributions we surveyed that perform instrumentation rely on Soot.
Soot works on an intermediate representation, Jimple, which is easier to manipulate than Dalvik bytecode.
However, Soot can be cumbersome to set up and use, and we initially wanted better control over the modified bytecode.
Our initial idea was to use Apktool, but in @sec:rasta we found that many errors raised by tools were due to incorrect parsing of Smali.
So, rather than parsing, modifying, and regenerating Smali text files, we decided to write our own instrumentation library from scratch.
This was not as difficult as one might expect, thanks to Google's clear documentation of the Dalvik format#footnote[https://source.android.com/docs/core/runtime/dex-format].
In addition, when we had doubts about the specification, we could check the implementation used by Apktool#footnote[https://github.com/JesusFreke/smali], or the code used by Android to verify the integrity of #DEX files#footnote[https://cs.android.com/android/platform/superproject/main/+/main:art/libdexfile/dex/dex_file_verifier.cc;drc=11bd0da6cfa3fa40bc61deae0ad1e6ba230b0954].
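As an illustration of how approachable the format is, the minimal Python sketch below (independent of our actual implementation) verifies the two integrity fields of a #DEX header as described in this documentation; both fields must be recomputed after any modification of the file:

```python
import hashlib
import struct
import zlib

def check_dex_header(dex: bytes) -> None:
    # Header layout: 8-byte magic, 4-byte adler32 checksum,
    # 20-byte SHA-1 signature, then file_size, header_size and
    # endian_tag as little-endian 32-bit integers.
    assert dex[:4] == b"dex\n", "not a DEX file"

    # The checksum covers everything except the magic and itself.
    (checksum,) = struct.unpack_from("<I", dex, 8)
    assert zlib.adler32(dex[12:]) == checksum, "bad adler32 checksum"

    # The signature covers everything after the signature field.
    assert hashlib.sha1(dex[32:]).digest() == dex[12:32], "bad SHA-1 signature"

    file_size, header_size, endian_tag = struct.unpack_from("<III", dex, 32)
    assert file_size == len(dex) and header_size == 0x70
    assert endian_tag == 0x12345678  # ENDIAN_CONSTANT
```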
One thing we noticed when manually instrumenting applications with Apktool is that the repackaged applications sometimes cannot be installed or run because some files are stored incorrectly in the new #APK (#eg native library files must not be compressed).
We also found that some applications deliberately store files with names that will crash the zip library used by Apktool.
For this reason, we also used our own library to modify the #APK files.
We take special care to process as few files as possible in the #APKs: we only strip the #DEX files and the signature, before appending the new, modified #DEX files at the end of the archive.
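The Python sketch below illustrates this strategy (the function name is hypothetical, and it relies on the standard `zipfile` module, whose parsing weaknesses our library avoids by working directly on the raw archive structures):

```python
import zipfile

def rewrite_apk(original: str, patched: str, new_dex: dict[str, bytes]) -> None:
    # `new_dex` maps entry names like "classes.dex" to modified bytecode.
    with zipfile.ZipFile(original) as src, zipfile.ZipFile(patched, "w") as dst:
        for info in src.infolist():
            name = info.filename
            # Strip the old bytecode and the now-invalid v1 signature.
            if name.startswith("classes") and name.endswith(".dex"):
                continue
            if name.startswith("META-INF/") and name.endswith(
                    (".MF", ".SF", ".RSA", ".DSA")):
                continue
            # Reuse the original ZipInfo so the compression mode is kept:
            # native libraries must stay stored (uncompressed).
            dst.writestr(info, src.read(info))
        for name, code in new_dex.items():
            dst.writestr(name, code)  # appended at the end of the archive
    # The APK must then be re-signed (v2/v3) before installation.
```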
Unfortunately, we did not have time to compare the robustness of our solution to that of existing tools like Apktool and Soot.
In hindsight, we probably should have taken the time to find a way to use smali/baksmali (the backend of Apktool) as a library, or SootUp, to do the instrumentation, but neither option documents how to instrument applications.
Androguard might also become an option for modifying #DEX files in the future, but at the time of writing, this feature is still under development.
Nevertheless, we published our instrumentation library, Androscalpel, for anyone who wants to use it. #todo[ref to code]
#v(2em)
Now that we have seen the transformations we want to perform, we know which runtime information is needed to apply them.

@@ -7,8 +7,6 @@
== Results <sec:th-res>
#todo[better section name for @sec:th-res]
To study the impact of our transformation on analysis tools, we reused applications from the dataset we sampled in @sec:rasta/*-dataset*/.
Because we are running the applications on a recent version of Android (#SDK 34), we only took the most recent applications: the ones collected in 2023.
This represents #num(5000) applications out of the #NBTOTALSTRING applications of the initial dataset.
@@ -35,12 +33,15 @@ In some cases, the application was just broken -- for instance, an application w
In other cases, Frida is to blame: we found situations where calling a method from Frida can confuse the #ART.
`protected` methods cannot be called from a class other than the one that defined the method or one of its children.
The issue is that the #ART may consider the call made by Frida to come from another class, leading the #ART to abort the application.
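As an illustration, the sketch below uses Frida's Python bindings to make this kind of call (the package and class names are hypothetical): invoking the `protected` method `onResume()` of an activity from the Frida runtime can trigger the abort described above.

```python
import frida

# Hypothetical application: package "com.example.app" with an
# activity class "com.example.app.MainActivity".
SCRIPT = """
Java.perform(function () {
    Java.choose("com.example.app.MainActivity", {
        onMatch: function (activity) {
            // onResume() is protected: when invoked from Frida's
            // runtime, the ART may not attribute the call to the
            // activity class and abort the application.
            activity.onResume();
        },
        onComplete: function () {}
    });
});
"""

session = frida.get_usb_device().attach("com.example.app")
script = session.create_script(SCRIPT)
script.load()
```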
#todo[jfl was supposed to test a few other apps #emoji.eyes]
@tab:th-dyn-visited shows the number of applications that we analysed, whether we managed to start at least one activity, and whether we intercepted code loading or reflection.
It also shows the average number of activities visited (when at least one activity was started).
This average is slightly higher than 1, which seems reasonable: many applications do not need more than one activity, but some do, and we did manage to explore at least some of those additional activities.
As shown in the table, even when an application fails to start an activity, it sometimes still loads external code or uses reflection.
We later tested the applications on a real phone (a Nothing Phone (2a) running Android 15), without Frida but still using GroddRunner.
This time, we managed to visit at least one activity for #num(2130) applications, three times more than in our main experiment.
This shows that our setup does break some applications, but also that another issue remains that we did not identify: more than half of the tested applications did not display any activity at all.
#figure({
let nb_col = 7
table(

@@ -1,4 +1,4 @@
#import "../lib.typ": paragraph, ART, DEX, APK, eg, SDK
#import "../lib.typ": paragraph, ART, DEX, APK, eg, SDK, APKs
#import "../lib.typ": todo, jfl-note, jm-note
== Limitations and Future Works <sec:th-limits>
@@ -70,10 +70,14 @@ In any case, statically, because we remove neither the calls to the function tha
This could mean better exploration techniques, such as the ones implemented by Stoat and GroddDroid, or more intrusive approaches, such as forced execution.
]
-=== Comparison with DroidRA
+=== Comparison with DroidRA and Other Tools
It would be very interesting to compare our tool to DroidRA.
DroidRA is a tool that computes reflection information using static analysis and patches the application to add the resolved calls explicitly.
Beyond the classic comparison of static versus dynamic, DroidRA has a similar goal and strategy to ours.
Two notable comparison criteria would be the failure rate and the number of edges added to an application's call graph.
The first criterion indicates how much the results can be used by other tools, while the second indicates how effective the approaches are.
Because we elected to write our own software to modify the bytecode of #APKs, it would be insightful to compare the completion rate and performance of simple transformations made with our tool to the same transformations made with Apktool, Soot, or SootUp.
An example of a transformation to test would be to log each method call and its return value.
More than finding which solution is best for instrumenting an application, this would allow us to compare the weaknesses of each tool and determine whether recurring issues in one tool can be solved using a technical solution implemented by another (#eg some applications deliberately include files with names that crash the standard Java zip library).