pass chapter 5

2025-09-30 03:05:07 +02:00 · 2025-09-30 03:05:07 +02:00 · d7df45b206
commit d7df45b206
parent f309dd55b8
8 changed files with 64 additions and 56 deletions
--- a/5_theseus/3_static_transformation.typ
+++ b/5_theseus/3_static_transformation.typ
@@ -6,7 +6,6 @@ In this section, we will see how we can transform the application code to make d

 === Transforming Reflection <sec:th-trans-ref>

-
 In Android, reflection allows applications to instantiate a class or call a method without having this class or method appear in the bytecode.
 Instead, the bytecode uses the generic classes `Class`, `Method` and `Constructor`, which represent any existing class, method or constructor.
 Reflection often starts by retrieving the `Class` object representing the class to use.
@@ -17,6 +16,7 @@ Similarly, to call a method, the `Method` object must be retrieved, then called

 Although the process seems to differ between class instantiation and method call from the Java standpoint, the runtime operations are very similar.
 When instantiating an object with `Object obj = cst.newInstance("Hello Void")`, the constructor method `<init>(Ljava/lang/String;)V`, represented by the `Constructor` `cst`, is called on the object `obj`.
+Thus, even for instantiation, a method is called at some point.

 #figure(
  ```java
@@ -44,11 +44,10 @@ When instantiating an object with `Object obj = cst.newInstance("Hello Void")`,
  caption: [Calling a method using reflection]
 ) <lst:-th-expl-cl-call>

-One of the main reasons to use reflection is to access classes that are not present in the application bytecode, nor are platform classes.
-Indeed, the application will crash if the #ART encounters references to a class that cannot be found by the current class loader.
-This is often the case when dealing with classes from bytecode loaded dynamically.
+One of the main reasons to use reflection is to access classes that are neither platform classes nor in the application bytecode, as is often the case when dealing with classes from dynamically loaded bytecode.
+Indeed, if the #ART were to encounter an instruction referencing a class that cannot be loaded by the current class loader, it would crash the application.

-To allow static analysis tools to analyse an application that uses reflection, we want to replace the reflection call with the bytecode that actually calls the method.
+To allow static analysis tools to analyse an application that uses reflection, we want to replace the reflection call with bytecode that calls the method directly and that any static analysis tool can analyse.
 In @sec:th-trans-cl, we deal with the issue of dynamic code loading so that the classes used are, in fact, present in the application.

 A notable issue is that a specific reflection call can call different methods.
@@ -65,11 +64,12 @@ In addition, the method we propose in @sec:th-dyn is a best effort approach to c
  caption: [A reflection call that can call any method]
 ) <lst:th-worst-case-ref>

-To handle those situations, instead of entirely removing the reflection call, we can modify the application code to test if the `Method` (or `Constructor`) object matches any expected method, and if yes, directly call the method.
+To handle those situations, instead of entirely removing the reflection call, we can modify the application code to test if the `Method` (or `Constructor`) object matches any of the methods observed dynamically, and if so, directly call the method.
 If the object does not match any expected method, the code can fall back to the original reflection call.
 DroidRA~@li_droidra_2016 has a similar solution, except that reflective calls are always evaluated, and the static equivalent follows just after, guarded behind an opaque predicate that is always false at runtime.
-@lst:-th-expl-cl-call-trans demonstrate this transformation on @lst:-th-expl-cl-call:
-at line 25, the `Method` object `mth` is checked using a method we generated and injected in the application (defined at line 2 in the listing).
+@lst:-th-expl-cl-call-trans demonstrates this transformation for the code originally in @lst:-th-expl-cl-call.
+Suppose that, while monitoring the execution of the code of @lst:-th-expl-cl-call, we dynamically observed a call to the method `Reflectee.myMethod(String)` at line 3.
+In @lst:-th-expl-cl-call-trans, at line 25, the `Method` object `mth` is checked using a method we generated and injected into the application (defined at line 2 in the listing).
 This method checks if the method name (line 5), its parameters (lines 6-9), its return type (lines 10-11) and its declaring class (lines 13-14) match the expected method.
 If it is the case, the method is used directly (line 26) after casting the arguments and associated object into the types/classes we just checked.
 If the check at line 25 does not pass, the original reflective call is made (line 28).
@@ -82,12 +82,12 @@ If we were to expect other possible methods to be called in addition to `myMetho
 ] #todo[Ref to list of common tools?] reformated for readability.
 */

-The method check is done in a separate method injected inside the application to avoid cluttering the application too much.
+The `Method` value is checked in a separate method injected into the application, to avoid cluttering the application code too much.
 Because Java (and thus Android) allows method overloading, checking the method name and its class is not enough: we must also check the whole method signature.
 We chose to limit the transformation to the specific instruction that calls `Method.invoke(..)`.
 This drastically reduces the risks of breaking the application, but leads to a lot of type casting.
 Indeed, the reflection call uses the generic `Object` class, but actual methods usually use specific classes (#eg `String`, `Context`, `Reflectee`) or scalar types (#eg `int`, `long`, `boolean`).
-This means that the method parameters and object on which the method is called must be downcast to their actual type before calling the method, then the returned value must be upcast back to an `Object`.
+This means that the method parameters and the object on which the method is called must be downcast to their actual types before calling the method, and the returned value must then be upcast back to an `Object`.
 Scalar types require special attention.
 Java (and Android) distinguish between scalar types and classes, and they cannot be mixed: a scalar cannot be cast into an `Object`.
 However, each scalar type has an associated class that can be used when doing reflection.
@@ -137,7 +137,6 @@ In those cases, the parameters could be used directly without the detour inside
  caption: [@lst:-th-expl-cl-call after the de-reflection transformation]
 ) <lst:-th-expl-cl-call-trans>

-
 === Transforming Code Loading (or Not) <sec:th-trans-cl>

 #jfl-note[Here I expected to read how the code that loads code is transformed, but instead I am told about multi-dex]
@@ -150,7 +149,7 @@ This means that we only need to find a way to integrate #DEX files into the appl
 We saw in @sec:cl the class loading model of Android.
 When doing dynamic code loading, an application defines a new `ClassLoader` that handles the new bytecode, and starts accessing its classes using reflection.
 We also saw in @sec:cl that Android now uses the multi-dex format, allowing it to handle any number of #DEX files in one class loader.
-Therefore, the simpler way to give access to the dynamically loaded code to static analysis tools is to add the dex files to the application.
+Therefore, the simplest way to give static analysis tools access to the dynamically loaded code is to add the #DEX files to the application as additional multi-dex bytecode files.
 This should not impact the class loading model as long as there is no class collision (we will explore this in @sec:th-class-collision) and as long as the original application did not try to access inaccessible classes (we will develop this issue in @sec:th-limits). 

 #figure(
@@ -162,12 +161,15 @@ This should not impact the class loading model as long as there is no class coll
  caption: [Inserting #DEX files inside an #APK]
 ) <fig:th-inserting-dex>

-In the end, we decided to *not* modify the original code that loads the bytecode.
-We already added the bytecode loaded dynamically, and most tools already ignore dynamic code loading.
+In the end, we decided *not* to modify the original code that loads the bytecode.
+Most tools ignore dynamic code loading anyway, and since the dynamically loaded bytecode is added using the multi-dex format, they already have access to it.
 At runtime, although the bytecode is already present in the application, the application will still dynamically load the code.
 This ensures that the application keeps working as intended, even if the transformation we applied is incomplete.
 Specifically, an application needs reflection to call dynamically loaded code; we saw in @sec:th-trans-ref that these reflection calls must be kept, and keeping them working requires the class loader created when loading the bytecode.

+To summarise, we do not modify the existing bytecode.
+Instead, we add the intercepted bytecode to the application as additional #DEX files using the multi-dex format, as represented in @fig:th-inserting-dex.
+
 === Class Collisions <sec:th-class-collision>

 We saw in @sec:cl/*-obfuscation*/ that having several classes with the same name in the same application can be problematic.
@@ -177,10 +179,11 @@ The developer may have reused a helper class in both the dynamically loaded byte
 When loaded dynamically, the classes are in a different class loader, and class references are resolved at runtime, as we saw in @sec:cl-loading.
 We decided to restrict our scope to the use of class loaders from the Android #SDK.
 In the absence of class collision, those class loaders behave seamlessly and adding the classes to the application maintains the behaviour.
+#jfl-note[An example would help understanding \ jm: all the ones I have take 3 pages of listing]

 When we detect a collision, we rename one of the colliding classes in order to be able to differentiate between classes.
 To avoid breaking the application, we then need to rename all references to this specific class and be careful not to modify references to the other class.
-To do so, we regroup each class by the class loaders defining them.
+To do so, we regroup each class by the class loaders that define them.
 Then, for each colliding class name and each class loader, we check the actual class used by the class loader.
 If the class has been renamed, we rename all references to this class in the classes defined by this class loader.
 To find the class used by a class loader, we reproduce the behaviour of the different class loaders of the Android #SDK.
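The guarded direct call described in the `Transforming Reflection` hunk of this commit can be sketched in plain Java. This is a minimal illustration and not the code generated by the tool. `Reflectee`, `DeReflection`, `isReflecteeMyMethod` and `call` are hypothetical names. The sketch assumes that the only method observed dynamically is `String Reflectee.myMethod(String)`. The check compares the method name, parameter types, return type and declaring class. If they all match, the code downcasts and calls the method directly. Otherwise it falls back to `Method.invoke`.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical class whose method is only reached through reflection.
class Reflectee {
    public String myMethod(String who) {
        return "Hello " + who;
    }
}

class DeReflection {
    // Hypothetical injected check: true only if `mth` is exactly
    // String Reflectee.myMethod(String).
    static boolean isReflecteeMyMethod(Method mth) {
        if (!mth.getName().equals("myMethod")) {
            return false;
        }
        Class<?>[] params = mth.getParameterTypes();
        if (params.length != 1 || params[0] != String.class) {
            return false;
        }
        if (mth.getReturnType() != String.class) {
            return false;
        }
        return mth.getDeclaringClass() == Reflectee.class;
    }

    // Replacement for the original `mth.invoke(obj, args)` instruction.
    static Object call(Method mth, Object obj, Object[] args)
            throws IllegalAccessException, InvocationTargetException {
        if (isReflecteeMyMethod(mth)) {
            // Downcast the receiver and the argument to the checked types; the
            // String result is already an Object, so it needs no conversion.
            return ((Reflectee) obj).myMethod((String) args[0]);
        }
        // Not an observed method: keep the original reflective call.
        return mth.invoke(obj, args);
    }
}
```

A static analysis tool now sees an ordinary call from `call` to `Reflectee.myMethod`. The `invoke` fallback preserves the original behaviour for any method that was not observed.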
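The scalar-type issue raised in the `Transforming Reflection` hunk can be illustrated with a second hypothetical sketch. `ScalarDemo`, `twice`, `takesScalarInt` and `callTwice` are illustrative names only. Reflection describes an `int` parameter with `int.class`, which is a different `Class` object from the wrapper `Integer.class`. A de-reflected call must therefore unbox the `Object` argument and box the scalar result explicitly. These are the same `intValue` and `Integer.valueOf` calls that `javac` emits for autoboxing.

```java
import java.lang.reflect.Method;

class ScalarDemo {
    // Target method with a scalar parameter and a scalar return type.
    static int twice(int x) {
        return 2 * x;
    }

    // Reflection describes the parameter with int.class (the scalar type),
    // which is a different Class object than the wrapper Integer.class.
    static boolean takesScalarInt(Method m) {
        Class<?>[] params = m.getParameterTypes();
        return params.length == 1 && params[0] == int.class;
    }

    // Hypothetical de-reflected replacement for `mth.invoke(null, args)` once the
    // check has matched twice(int): unbox the argument, call directly, box the result.
    static Object callTwice(Object[] args) {
        int x = ((Integer) args[0]).intValue(); // Object -> Integer (downcast) -> int (unboxing)
        int result = twice(x);
        return Integer.valueOf(result); // int -> Integer (boxing), usable as an Object
    }
}
```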
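The multi-dex insertion described in the `Transforming Code Loading (or Not)` hunk gives each new DEX file an entry name that continues the multi-dex naming scheme (`classes.dex`, `classes2.dex`, `classes3.dex`, and so on). The sketch below only computes those names. `MultiDexNaming` and its methods are hypothetical. The sketch assumes the runtime stops loading at the first missing index, which is why the numbering must have no gaps. Writing the entries into the APK and re-signing it are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class MultiDexNaming {
    // Multi-dex entry names: index 1 is "classes.dex", then "classes2.dex", "classes3.dex", ...
    static String dexName(int index) {
        return index == 1 ? "classes.dex" : "classes" + index + ".dex";
    }

    // Picks names for `count` additional dex files so that they continue the existing
    // sequence without gaps (assumption: the runtime stops at the first missing index).
    static List<String> namesForNewDex(Set<String> apkEntries, int count) {
        int next = 1;
        while (apkEntries.contains(dexName(next))) {
            next++;
        }
        List<String> names = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            names.add(dexName(next + i));
        }
        return names;
    }
}
```

For example, an APK that already contains `classes.dex` and `classes2.dex` would receive two intercepted files as `classes3.dex` and `classes4.dex`.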
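The per-class-loader resolution described in the `Class Collisions` hunk can be modelled with parent-first delegation. `java.lang.ClassLoader.loadClass` implements this delegation, and `PathClassLoader` and `DexClassLoader` inherit it. The sketch is a simplified, hypothetical model: `LoaderModel`, `resolve` and the `com.example` class names are illustrative. It ignores classes that are already loaded. A child-first loader such as `DelegateLastClassLoader` would need a different lookup order.

```java
import java.util.Set;

final class LoaderModel {
    final String name;
    final LoaderModel parent; // null stands for the root (boot) loader
    final Set<String> definedClasses; // classes found in this loader's own dex files

    LoaderModel(String name, LoaderModel parent, Set<String> definedClasses) {
        this.name = name;
        this.parent = parent;
        this.definedClasses = definedClasses;
    }

    // Parent-first delegation: the parent chain is asked first, and this loader only
    // uses its own definition if no ancestor defines the class.
    // Returns the loader whose definition wins, or null if the class cannot be found.
    LoaderModel resolve(String className) {
        if (parent != null) {
            LoaderModel fromParent = parent.resolve(className);
            if (fromParent != null) {
                return fromParent;
            }
        }
        return definedClasses.contains(className) ? this : null;
    }
}
```

Suppose a dynamic loader `dyn` is created with the application loader `app` as its parent, and both define `com.example.Helper`. In that case `dyn` resolves the name to the application's copy. If the dynamically loaded copy is the one renamed, the references inside the classes defined by `dyn` should therefore be left untouched.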