thesis/2_background/3_problem_statements.typ

#import "../lib.typ": pb1, pb1-text, pb2, pb2-text, pb3, pb3-text, ART
#import "../lib.typ": todo

== Problems of the Reverse Engineer <sec:bg-probl>

In this section, we discuss some of the issues encountered by reverse engineers and link them to our problem statements.

In the previous section, we listed some limitations of static analysis.
Some of these limitations have been known for some time, and many contributions have been made to overcome them.
Those contributions often introduce new tools that implement solutions to these issues.
Depending on the situation, a reverse engineer might want to use those tools, or build another tool on top of one of them.
Unfortunately, they can be hard to use.
As noted previously, the fast evolution of Android can also be a significant obstacle.
The combination of these two points can lead a reverse engineer to spend a lot of time trying to use a tool before realising that it no longer works.
Our first problem statement #pb1 focuses on this issue: #pb1-text
Determining which tools are still usable today is a first step, but identifying the reasons why a tool stops working might help in writing more resilient tools in the future.

We also presented dynamic code loading as an obstacle for static analysis.
Code loading is achieved using class loader objects, which is why class loaders are generally associated with dynamic code loading.
However, class loading plays a much more important role in the #ART.
Class loading originates from the Java ecosystem, and was ported to Android so that developers could keep writing applications in Java.
Despite that, Android made many changes to the original Java classes, and did not document those changes.
Between the general oversight of class loading by static analysis, which relegates it to dynamic analysis, and the lack of documentation of the actual behaviour of the #ART, the impact of the class loading algorithm on static analysis remains an open question.
Our second problem statement #pb2 aims to answer this question: #pb2-text
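
To illustrate why class loaders matter beyond dynamic code loading, the sketch below walks the class loader chain of two classes on a standard desktop JVM.
It does not reproduce the behaviour of the #ART, whose class loaders differ from the original Java ones; `LoaderChainDemo` and `loaderChain` are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class LoaderChainDemo {
    // Walks the parent chain of the loader that defined `cls`.
    // On a standard JVM, a null loader denotes the bootstrap loader.
    static List<String> loaderChain(Class<?> cls) {
        List<String> chain = new ArrayList<>();
        ClassLoader loader = cls.getClassLoader();
        while (loader != null) {
            chain.add(loader.getClass().getName());
            loader = loader.getParent();
        }
        chain.add("<bootstrap>");
        return chain;
    }

    public static void main(String[] args) {
        // A core class and an application class go through different loaders.
        System.out.println("String:          " + loaderChain(String.class));
        System.out.println("LoaderChainDemo: " + loaderChain(LoaderChainDemo.class));
    }
}
```

Even in a program that never loads code dynamically, every class is resolved through such a chain, so the lookup order can influence which definition of a class is actually used.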

Circling back to known limitations of static analysis, dynamic code loading and reflection are often used to obfuscate applications.
Dynamic code loading makes it possible to hide bytecode from static analysis with relatively low effort.
The bytecode can be downloaded at runtime, stored encrypted in the application, hidden inside other files, generated at runtime, etc.
In a way, reflection achieves the same thing for specific method calls: instead of the actual call, static analysis sees a call to the generic `Method.invoke()` method.
By contrast, it is relatively easy to recover the name of the method called, or to intercept dynamically loaded bytecode, using dynamic tools like Frida.
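
As a minimal sketch of this contrast, the following Java program, written for a desktop JVM rather than Android, hides a call to `String.toUpperCase()` behind reflection.
The class `ReflectionDemo` and its helpers `targetName`, `resolveTarget` and `callReflectively` are hypothetical names introduced for illustration only.
The method name is assembled at runtime, so it does not appear as a literal in the bytecode, yet printing the resolved `Method` object reveals it immediately, which is comparable in spirit to what a dynamic instrumentation tool observes.

```java
import java.lang.reflect.Method;

class ReflectionDemo {
    // Hypothetical obfuscation: the target name is built at runtime,
    // so the literal "toUpperCase" never appears in the bytecode.
    static String targetName() {
        return new StringBuilder("esaCreppUot").reverse().toString();
    }

    // Resolves the hidden target; it is only known once the program runs.
    static Method resolveTarget() {
        try {
            return String.class.getMethod(targetName());
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    static Object callReflectively(String receiver) {
        try {
            // Statically, this is only a call to Method.invoke().
            return resolveTarget().invoke(receiver);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Runtime view: the actual callee is directly observable.
        System.out.println("resolved: " + resolveTarget());
        System.out.println("result:   " + callReflectively("hidden"));
    }
}
```

A static analyser that does not model the string manipulation in `targetName` cannot connect the call site to `toUpperCase` on its own.
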
The issue that then arises is what to do with the collected data.
Simply having it greatly helps a manual analysis, but it cannot be used directly by tools that perform static analyses.
There is no standard representation for runtime information, and most tools offer no way to provide a list of reflection sites and their associated method calls.
This means that in most cases, when a reverse engineer wants to improve static analysis with dynamic analysis, they need to modify the static tools so that they accept the additional runtime data.
Doing so requires both time and knowledge of the internals of the tools used.
Our third problem statement, #pb3, explores an alternative approach that modifies the application instead of the tool: #pb3-text

We will now explore the current state of the art for contributions relevant to our problem statements.