wip

Jean-Marie Mineau 2025-09-01 18:57:48 +02:00
parent 98cf4fbf6a
commit ba7130160e
Signed by: histausse
GPG key ID: B66AEEDA9B645AD2

@ -1,11 +1,5 @@
#import "../lib.typ": todo, APK, DEX, JAR, OAT, eg, ART, paragraph, jm-note, jfl-note
/*
* Discuss dex lego and the paper that encodes the results of anger in jimple
* argggg https://dl.acm.org/doi/10.1145/2931037.2931044 is verrryyyyy close
*
*/
== Code Transformation <sec:th-trans>
#todo[Define code loading and reflection somewhere]
@ -16,10 +10,11 @@ In this section, we will see how we can transform the application code to make d
=== Transforming Reflection <sec:th-trans-ref>
In Android, reflection can be used to do two things: instantiate a class, or call a method.
Either way, reflection starts by retrieving the `Class` object representing the class to use.
This class is usually retrieved using a `ClassLoader` object, but can also be retrieved directly from the classloader of the class defining the calling method.
// elaborate? const-class dalvik instruction / MyClass.class in java?
In Android, reflection makes it possible to instantiate a class or call a method without this class or method appearing in the bytecode.
Instead, the bytecode uses the generic classes `Class`, `Method` and `Constructor`, which can represent any existing class, method or constructor.
Reflection often starts by retrieving the `Class` object representing the class to use.
This class is usually retrieved using a `ClassLoader` object (though there are other ways to get it).
Once the class is retrieved, it can be instantiated using the deprecated method `Class.newInstance()`, as shown in @lst:-th-expl-cl-new-instance, or a specific method can be retrieved.
The current approach to instantiate a class is to retrieve the specific `Constructor` object, then call `Constructor.newInstance(..)`, like in @lst:-th-expl-cl-cnstr.
Similarly, to call a method, the `Method` object must be retrieved, then called using `Method.invoke(..)`, as shown in @lst:-th-expl-cl-call.
@ -53,18 +48,17 @@ When instanciating an object with `Object obj = cst.newInstance("Hello Void")`,
caption: [Calling a method using reflection]
) <lst:-th-expl-cl-call>
To allow static analysis tools to analyse an application that uses reflection, we want to replace the reflection call with the bytecode that actually calls the method.
One of the main reasons to use reflection is to access classes that are neither present in the application bytecode nor platform classes.
Indeed, the application will crash if the #ART encounters a reference to a class that cannot be found by the current classloader.
This is often the case when dealing with classes from bytecode loaded dynamically.
#jfl-note[
One of the main reasons to use reflection is to access classes that are not from the application.
Although reflection allows the use of classes that do not exist in the application bytecode, the application will crash at runtime if those classes are not found in the current classloader.
Similarly, some analysis tools might have trouble analysing applications that call non-existent classes.
@sec:th-trans-cl deals with the issue of adding dynamically loaded bytecode to the application.
][#underline[unclear]]
To allow static analysis tools to analyse an application that uses reflection, we want to replace the reflection call with the bytecode that actually calls the method.
In @sec:th-trans-cl, we deal with the issue of dynamic code loading so that the classes used are in fact present in the application.
A notable issue is that a specific reflection call can call different methods.
@lst:th-worst-case-ref illustrates a worst case scenario where any method can be called at the same reflection call.
In those situations, #jfl-note[we cannot guarantee that we know all the methods][explain (are we going to collect the names on a best-effort basis?). Explain what we mean by "accessing a class that is not in the APK": if the application crashes, what is the point?] that can be called (#eg the name of the method called could be retrieved from a remote server).
In those situations, we cannot guarantee that we know all the methods that can be called (#eg the name of the method called could be retrieved from a remote server).
In addition, the method we propose in @sec:th-dyn is a best effort approach to collect reflection data: like any dynamic analysis, it is limited by its code coverage.
#figure(
```java
@ -76,13 +70,22 @@ In those situation, #jfl-note[we cannot garanty that we know all the methods][ex
) <lst:th-worst-case-ref>
To handle those situations, instead of entirely removing the reflection call, we can modify the application code to test whether the `Method` (or `Constructor`) object matches any expected method and, if so, call the method directly.
If the object does not match any expected method, #jfl-note[the code can fall back to the original reflection call.][like DroidRA? \ uhh, to be checked]
#jfl-note[@lst:-th-expl-cl-call-trans demonstrates this transformation on @lst:-th-expl-cl-call.][Explain the important lines of @lst:-th-expl-cl-call-trans]
If the object does not match any expected method, the code can fall back to the original reflection call.
DroidRA~@li_droidra_2016 has a similar solution, except that reflective calls are always evaluated, and the static equivalent follows just after, guarded behind an opaque predicate that is always false at runtime.
@lst:-th-expl-cl-call-trans demonstrates this transformation on @lst:-th-expl-cl-call:
at line 25, the `Method` object `mth` is checked using a method we generated and injected in the application (defined at line 2 in the listing).
This method checks whether the method name (line 5), its parameters (lines 6-9), its return type (lines 10-11) and its declaring class (lines 13-14) match the expected method.
If so, the method is called directly (line 26) after casting the arguments and the associated object into the types/classes we just checked.
If the check at line 25 fails, the original reflective call is made (line 28).
If we were to expect other possible methods to be called in addition to `myMethod`, we would add `else if` blocks between lines 26 and 27, with one check method for each potential method call.
/*
#jfl-note[It should be noted that we do the transformation at the bytecode level, the code in the listing corresponds to the output of JADX][
I would have made a separate section on "how these transformations are done concretely":
it is more pedagogical to describe the transformations without bytecode, then have a subsection that discusses
the ways to modify the bytecode (soot, apktool, etc.) and explains the limits, then say how you make the modifications
] #todo[Ref to list of common tools?] reformatted for readability.
*/
The method check is done in a separate method injected inside the application to avoid cluttering the application too much.
Because Java (and thus Android) uses polymorphic methods, checking the method name and its class is not enough: we must check the whole method signature.
We chose to limit the transformation to the specific instruction that calls `Method.invoke(..)`.
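
As a minimal sketch of why the whole signature must be checked, the following self-contained Java example uses a hypothetical class `Target` with two overloads of `myMethod`.
The names `MethodCheckSketch`, `Target`, `isMyMethodString` and `call` are assumptions made for this illustration only, and the sketch is written at the source level for readability; it is not the code generated by the transformation.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

class MethodCheckSketch {
    // Hypothetical target class: two overloads share the same name and class.
    public static class Target {
        public String myMethod(String a) { return "string:" + a; }
        public String myMethod(Integer a) { return "integer:" + a; }
    }

    // True only if mth is exactly `String Target.myMethod(String)`.
    static boolean isMyMethodString(Method mth) {
        return mth.getName().equals("myMethod")
            && Arrays.equals(mth.getParameterTypes(), new Class<?>[] { String.class })
            && mth.getReturnType() == String.class
            && mth.getDeclaringClass() == Target.class;
    }

    // Guarded call: direct call when the check passes, reflective fallback otherwise.
    static Object call(Method mth, Object obj, Object... args)
            throws ReflectiveOperationException {
        if (isMyMethodString(mth)) {
            return ((Target) obj).myMethod((String) args[0]);
        }
        return mth.invoke(obj, args);
    }

    public static void main(String[] argv) throws Exception {
        Method s = Target.class.getMethod("myMethod", String.class);
        Method i = Target.class.getMethod("myMethod", Integer.class);
        Target t = new Target();
        System.out.println(isMyMethodString(s)); // true
        System.out.println(isMyMethodString(i)); // false: same name and class, other parameters
        System.out.println(call(s, t, "x"));     // string:x (direct call)
        System.out.println(call(i, t, 42));      // integer:42 (reflective fallback)
    }
}
```

A check limited to the name and the declaring class would accept both overloads, and the cast of the argument to `String` would then fail for the `Integer` overload.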
@ -134,12 +137,12 @@ In those cases, the parameters could be used directly whithout the detour inside
objRet = mth.invoke(obj, args);
}
String retData = (String) objRet;
``` + todo[Add lines],
```,
caption: [@lst:-th-expl-cl-call after the de-reflection transformation]
) <lst:-th-expl-cl-call-trans>
=== Transforming Code Loading <sec:th-trans-cl>
=== Transforming Code Loading (or Not) <sec:th-trans-cl>
#jfl-note[Here I expected to read about how the code that loads code is transformed, but instead I am told about multi-dex]
@ -148,8 +151,13 @@ Because it is an internal, platform dependant format, we elected to ignore the #
Practically, #JAR and #APK files are zip files containing #DEX files.
This means that we only need to find a way to integrate #DEX files to the application.
We elected to simply add the dex files to the application, using the multi-dex feature introduced by SDK 21 and now used by all applications, as shown in @fig:th-inserting-dex. #jfl-note[already discussed in @sec:cl]
This gives static analysis tools access to the dynamically loaded code.
We saw in @sec:cl the class loading model of Android.
When doing dynamic code loading, an application defines a new `ClassLoader` that handles the new bytecode, and starts accessing its classes using reflection.
We also saw in @sec:cl that Android now uses the multi-dex format, allowing it to handle any number of #DEX files in one classloader.
Therefore, the simplest way to give static analysis tools access to the dynamically loaded code is to add the #DEX files to the application.
This should not impact the classloading model as long as there is no class collision (we will explore this in @sec:th-class-collision) and as long as the original application did not try to access inaccessible classes.
#jm-note[explain? maybe ref to section limitation]
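
As a rough illustration of the naming side of this step, the sketch below picks the entry name under which an additional #DEX file could be stored, following the multi-dex convention `classes.dex`, `classes2.dex`, `classes3.dex`, and so on.
The class `DexNamingSketch` and the method `nextDexName` are hypothetical names introduced here; the sketch assumes the list of entry names already present at the root of the archive is given, and it does not cover repackaging or signing.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DexNamingSketch {
    // Matches classes.dex, classes2.dex, classes3.dex, ...
    private static final Pattern DEX_NAME = Pattern.compile("classes(\\d*)\\.dex");

    // Returns the entry name to use for the next DEX file added to the archive.
    static String nextDexName(List<String> entryNames) {
        int max = 0; // 0 means that no DEX file was found
        for (String name : entryNames) {
            Matcher m = DEX_NAME.matcher(name);
            if (m.matches()) {
                // classes.dex has no number and counts as index 1
                int index = m.group(1).isEmpty() ? 1 : Integer.parseInt(m.group(1));
                max = Math.max(max, index);
            }
        }
        return max == 0 ? "classes.dex" : "classes" + (max + 1) + ".dex";
    }

    public static void main(String[] args) {
        List<String> entries = List.of("AndroidManifest.xml", "classes.dex", "classes2.dex");
        System.out.println(nextDexName(entries)); // classes3.dex
    }
}
```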
#figure(
image(
@ -160,16 +168,18 @@ This gives access to the dynamically loaded code to static analysis tool.
caption: [Inserting #DEX files inside an #APK]
) <fig:th-inserting-dex>
We decided to leave untouched the original code that loads the bytecode.
In the end, we decided to *not* modify the original code that loads the bytecode.
Statically, we already added the bytecode loaded dynamically, and most tools already ignore dynamic code loading.
At runtime, although the bytecode is already present in the application, the application will still dynamically load the code.
This ensures that the application keeps working as intended even if the transformations we applied are incomplete.
Specifically, to call dynamically loaded code, an application needs to use reflection, and we saw in @sec:th-trans-ref that we need to keep reflecton calls, and in order to keep reflection calls, we need the classloader created when loading bytecode.
Specifically, to call dynamically loaded code, an application needs to use reflection.
We saw in @sec:th-trans-ref that we need to keep reflection calls, and to keep them, we need the classloader created when loading the bytecode.
=== Class Collisions <sec:th-class-collision>
We saw in @sec:cl/*-obfuscation*/ that having several classes with the same name in the same application can be problematic.
In @sec:th-trans-cl, we are adding new code.
#jfl-note[By doing so, we increase the probability of having class collisions.][A small example of a collision would be useful: it is hard to understand where the collision comes from, since we are the ones adding classes]
By doing so, we increase the probability of having class collisions:
the developer may have reused a helper class in both the dynamically loaded bytecode and the application, or an obfuscation process may have renamed classes without checking for overlap between the two sources of bytecode.
When loaded dynamically, the classes are in a different classloader, and class resolution is performed at runtime, as we saw in @sec:cl-loading.
We decided to restrict our scope to the class loaders from the Android SDK.
In the absence of class collisions, those class loaders behave seamlessly, and adding the classes to the application preserves its behavior.
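
To make the notion of collision concrete, here is a small sketch that lists the class names defined both in the application and in the dynamically loaded bytecode.
It assumes the class names of each source have already been extracted into sets; `CollisionSketch`, `collisions` and the example class names are hypothetical and only serve the illustration.

```java
import java.util.Set;
import java.util.TreeSet;

class CollisionSketch {
    // Returns the class names defined in both sources of bytecode.
    static Set<String> collisions(Set<String> appClasses, Set<String> loadedClasses) {
        Set<String> common = new TreeSet<>(appClasses);
        common.retainAll(loadedClasses); // keep only the names present in both sets
        return common;
    }

    public static void main(String[] args) {
        Set<String> app = Set.of("Lcom/example/Main;", "Lcom/example/Utils;");
        Set<String> loaded = Set.of("Lcom/example/Utils;", "Lcom/example/Plugin;");
        System.out.println(collisions(app, loaded)); // [Lcom/example/Utils;]
    }
}
```

An empty result corresponds to the case without collision described above; a non-empty result only says that two classes share a name, not whether their implementations differ.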