
#import "../lib.typ": pb1, pb1-text, pb2, pb2-text, pb3, pb3-text, ART
#import "../lib.typ": todo

== Problems of the Reverse Engineer <sec:bg-probl>

In this section, we elaborate on some issues encountered by reverse engineers and link them to our problem statements.

In the previous section, we listed some limitations of static analysis.
Some of these limitations have long been known, and many contributions have been made to overcome them.
Those contributions often introduce new tools that implement solutions to these issues.
Depending on the situation, a reverse engineer might want to use those tools or build another tool on top of one.
Unfortunately, these tools can be hard to use.
As noted previously, the fast evolution of Android can also be a significant obstacle.
Combined, these two points can lead a reverse engineer to spend a lot of time trying to use a tool before realising that it no longer works.
Our first problem statement, #pb1, focuses on this issue: #pb1-text
Determining which tools are still usable today is a first step, but understanding why tools stop working may help write more resilient tools in the future.

We also presented dynamic code loading as an obstacle for static analysis.
Code loading is performed by class loader objects, so class loaders are generally associated with dynamic code loading.
However, class loading plays a much more important role in the #ART.
Class loading originates from the Java ecosystem and was ported to Android so that developers could keep writing applications in Java.
Yet Android made many changes to the original Java classes and did not document them.
Since static analysis generally overlooks class loading, relegating it to dynamic analysis, and the actual behaviour of the #ART is poorly documented, the impact of the class loading algorithm on static analysis remains an open question.
Our second problem statement, #pb2, aims to answer this question: #pb2-text

Circling back to known limitations of static analysis, dynamic code loading and reflection are often used to obfuscate applications.
Dynamic code loading allows hiding bytecode from static analysis with relatively low effort.
The bytecode can be downloaded at runtime, stored encrypted inside the application, hidden inside other files, generated at runtime, etc.
In a way, reflection can do the same thing, but for specific method calls: instead of the actual call, static analysis will see a call to the generic `Method.invoke()` method.
By contrast, it is relatively easy to find the name of the method called or to intercept dynamically loaded bytecode using dynamic tools like Frida.
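
To make the reflection case concrete, the following minimal sketch shows a call whose target is only known at runtime.
It is an illustration written for plain Java rather than code taken from a real application: the class `ReflectionDemo` and its methods `secret` and `callByName` are hypothetical names introduced here.
The method name is assembled at runtime, so a tool that only reads the bytecode sees a call to `Method.invoke()` and not a direct call to `secret`.

```java
import java.lang.reflect.Method;

public class ReflectionDemo {
    // Hypothetical target method that the application wants to hide.
    public static String secret() {
        return "secret called";
    }

    // Calls a public static no-argument method of this class by name.
    // In the bytecode, the only visible call is Method.invoke().
    public static String callByName(String methodName) throws Exception {
        Method m = ReflectionDemo.class.getMethod(methodName);
        return (String) m.invoke(null); // null receiver: the method is static
    }

    public static void main(String[] args) throws Exception {
        // The name is built at runtime by reversing a string, so the
        // literal "secret" does not appear at the call site.
        String name = new StringBuilder("terces").reverse().toString();
        System.out.println(callByName(name));
    }
}
```

A dynamic tool that hooks `Method.invoke()` would observe the resolved `Method` object at runtime, and therefore the name of the method actually called, which is the information that static analysis is missing here.
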
The issue that then arises is what to do with the collected data.
Having this data greatly helps manual analysis, but static analysis tools cannot use it directly.
There is no standard representation for runtime information, and there is simply no way to give a list of reflection sites and the associated method calls as a new input for most static analysis tools.
This means that in most cases, when a reverse engineer wants to improve static analysis with dynamic analysis, they need to modify the static tools to receive the additional runtime data.
Doing so requires both time and knowledge of the internals of the tools used.
Our third problem statement, #pb3, explores an alternative approach that modifies the application instead of the tool: #pb3-text

We now review the state of the art related to our problem statements.