#import "../lib.typ": pb1, pb1-text, pb2, pb2-text, pb3, pb3-text, ART
#import "../lib.typ": todo

== Problems of the Reverse Engineer <sec:bg-probl>

In this section, we elaborate on some issues encountered by reverse engineers and link them to our problem statements.

In the previous section, we listed some limitations of static analysis.
Some of these limitations have been known for some time, and many contributions have been made to overcome them.
These contributions often introduce new tools that implement solutions to the corresponding issues.
Depending on the situation, a reverse engineer might want to use those tools or build another tool on top of one of them.
Unfortunately, these tools can be hard to use.
Moreover, as mentioned previously, the fast evolution of Android can be a significant obstacle.
The combination of those two points can lead a reverse engineer to spend a lot of time trying to use a tool without realising that the tool no longer works.
Our first problem statement, #pb1, focuses on this issue: #pb1-text
Determining which tools are still usable today is a first step, and understanding why a tool stops working might also help in writing more resilient tools in the future.

We also presented dynamic code loading as an obstacle for static analysis.
Code loading is performed by class loader objects, which is why class loaders are generally associated with dynamic code loading.
However, class loading plays a much more important role in the #ART.
Class loading originates from the Java ecosystem and was ported to Android so that developers could keep writing applications in Java.
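To illustrate with standard Java rather than the #ART, whose behaviour may differ, the following minimal sketch shows that class loaders are involved both when a class is loaded explicitly by name and when it is simply referenced in the code. The class `ClassLoaderDemo` and its helper `loaderOf` are hypothetical names introduced for this illustration.

```java
public class ClassLoaderDemo {
    // Names the class loader that defined the given class.
    // On a standard JVM, getClassLoader() returns null for classes
    // defined by the bootstrap loader (for example java.lang.String).
    public static String loaderOf(Class<?> cls) {
        ClassLoader loader = cls.getClassLoader();
        return loader == null ? "bootstrap" : loader.getClass().getName();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Explicit loading by name: the form usually associated
        // with dynamic code loading.
        Class<?> list = ClassLoader.getSystemClassLoader()
                .loadClass("java.util.ArrayList");
        System.out.println(list.getName() + " <- " + loaderOf(list));

        // Implicit loading: an ordinary class reference was also
        // resolved through a class loader, without any explicit call.
        System.out.println(ClassLoaderDemo.class.getName()
                + " <- " + loaderOf(ClassLoaderDemo.class));
    }
}
```

On a standard JVM, core classes such as `java.lang.String` are defined by the bootstrap loader, while application classes are defined by another loader further down the delegation chain.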
Despite this Java heritage, Android made many changes to the original Java classes and did not document them.
Because static analysis generally overlooks class loading, relegating it to dynamic analysis, and because the actual behaviour of the #ART lacks documentation, the impact of the class loading algorithm on static analysis remains an open question.
Our second problem statement, #pb2, aims to answer this question: #pb2-text

Circling back to known limitations of static analysis, dynamic code loading and reflection are often used to obfuscate applications.
Dynamic code loading allows hiding bytecode from static analysis with relatively low effort.
The bytecode can be downloaded at runtime, stored encrypted inside the application, hidden inside other files, generated at runtime, etc.
In a way, reflection can do the same thing, but for specific method calls: instead of the actual call, static analysis will see a call to the generic `Method.invoke()` method.
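As a minimal sketch of this effect in plain Java, the example below calls a method whose name is only assembled at runtime. The class `ReflectionDemo` and the method `callByName` are hypothetical names chosen for illustration, and reversing a string stands in for whatever obfuscation an application might actually use.

```java
import java.lang.reflect.Method;

public class ReflectionDemo {
    // Invokes a public no-argument method of String whose name is
    // only known at runtime. Reflection failures are rethrown unchecked.
    public static Object callByName(String methodName, String receiver) {
        try {
            Method method = String.class.getMethod(methodName);
            // Statically, the only visible call is Method.invoke().
            return method.invoke(receiver);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflective call failed", e);
        }
    }

    public static void main(String[] args) {
        // The target name never appears as a literal in the code.
        String name = new StringBuilder("esaCreppUot").reverse().toString();
        System.out.println(callByName(name, "hidden call"));
    }
}
```

A static analysis of `callByName` only sees the calls to `getMethod` and `Method.invoke()`. Recovering the actual target requires evaluating the string manipulation, which an obfuscator can make considerably harder than in this sketch.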
By contrast, it is relatively easy to find the name of the called method or to intercept dynamically loaded bytecode using dynamic tools like Frida.
The issue that then arises is what to do with the collected data.
Simply having it greatly helps manual analysis, but it cannot be used directly by tools that perform static analysis.
There is no standard representation for runtime information, and there is simply no way to give a list of reflection sites and the associated method calls as a new input for most static analysis tools.
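To make the missing piece concrete, the sketch below shows one possible shape for such a list of reflection sites. It is purely hypothetical: the class `ReflectionLog`, its fields, and the textual layout are assumptions made for illustration, not a format accepted by any existing tool.

```java
import java.util.ArrayList;
import java.util.List;

public class ReflectionLog {
    // Hypothetical description of one reflective call observed at runtime.
    public static final class Site {
        public final String caller; // method containing the Method.invoke() call
        public final int offset;    // position of that call inside the caller
        public final String target; // method actually invoked at runtime

        public Site(String caller, int offset, String target) {
            this.caller = caller;
            this.offset = offset;
            this.target = target;
        }

        // Serialises the site with an ad hoc "caller@offset -> target" layout.
        public String toLine() {
            return caller + "@" + offset + " -> " + target;
        }
    }

    private final List<Site> sites = new ArrayList<>();

    // Records one observation, for example from a dynamic instrumentation hook.
    public void record(String caller, int offset, String target) {
        sites.add(new Site(caller, offset, target));
    }

    public List<Site> sites() {
        return sites;
    }

    public static void main(String[] args) {
        ReflectionLog log = new ReflectionLog();
        // Made-up values standing in for data collected during an execution.
        log.record("com.example.Main.run()", 12, "java.lang.String.toUpperCase()");
        for (Site site : log.sites()) {
            System.out.println(site.toLine());
        }
    }
}
```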
This means that in most cases, when a reverse engineer wants to improve static analysis with dynamic analysis, they need to modify the static tools to accept the additional runtime data.
Doing so requires both time and knowledge of the internals of the tools used.
Our third problem statement, #pb3, explores an alternative approach that modifies the application instead of the tool: #pb3-text

We now review the state of the art for contributions related to our problem statements.