thesis/2_background/8_instrumentation.typ

#import "../lib.typ": DEX, APK, ART, etal, eg, pb3, pb3-text, jm-note

== Improving Analysis with Instrumentation <sec:bg-instrumentation>

Usually, instrumentation refers to the practice of modifying the behavior of a program to collect information during its execution.
Frida is a good example of an instrumentation framework.
The term can also be used more generally to describe operations that modify the application code.
In this section, we focus on uses of instrumentation that make an application easier to analyse by other tools, instead of just collecting additional information at runtime.

In the previous section, we gave the example of AppSpear~@yang_appspear_2015, which reconstructs #DEX files intercepted at runtime and repackages the #APK with the new code in it.
DexLego~@dexlego uses a similar but much more aggressive technique.
It targets heavily obfuscated packers that decrypt and then re-encrypt the instructions of a method just in time.
To get the bytecode, DexLego logs each instruction executed by the #ART and reconstructs the methods, then the #DEX files, from this stream of instructions.
The main limitation of this technique is that it carries over the limitations of dynamic analysis to static analysis: the bytecode injected in the application is limited to the instructions executed during the dynamic analysis.
Nevertheless, it is an interesting way to encode the traces of a dynamic analysis in a form that can be used by any Android analysis tool.
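
To make this coverage limitation concrete, the following Java sketch rebuilds partial method bodies from a log of executed instructions.
It is only an illustration under simplifying assumptions, not DexLego's actual algorithm: the `Executed` record, the `reassemble` function, the method name and the instruction offsets are all hypothetical, and self-modifying code is ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class TraceReassembly {
    /** One logged event: the running method, the instruction offset, and the instruction text. */
    static final class Executed {
        final String method;
        final int offset;
        final String instruction;

        Executed(String method, int offset, String instruction) {
            this.method = method;
            this.offset = offset;
            this.instruction = instruction;
        }
    }

    /**
     * Groups the logged instructions by method and orders them by offset.
     * An instruction executed several times is stored once; an instruction
     * that was never executed is simply absent from the result.
     */
    static Map<String, SortedMap<Integer, String>> reassemble(List<Executed> trace) {
        Map<String, SortedMap<Integer, String>> methods = new TreeMap<>();
        for (Executed e : trace) {
            methods.computeIfAbsent(e.method, k -> new TreeMap<>())
                   .put(e.offset, e.instruction);
        }
        return methods;
    }

    /** Hypothetical run in which the branch at offset 1 is not taken. */
    static List<Executed> sampleTrace() {
        String m = "Lcom/example/Foo;->check()V"; // hypothetical method
        List<Executed> trace = new ArrayList<>();
        trace.add(new Executed(m, 0, "const/4 v0, 0x1"));
        trace.add(new Executed(m, 1, "if-eqz v0, :else"));
        trace.add(new Executed(m, 3, "invoke-static {}, Lcom/example/Foo;->a()V"));
        trace.add(new Executed(m, 6, "return-void"));
        // The instructions of the ":else" branch (offset 7 and above in this
        // made-up method) are never logged, so they cannot be reconstructed.
        return trace;
    }

    public static void main(String[] args) {
        reassemble(sampleTrace()).forEach((method, body) -> {
            System.out.println(method);
            body.forEach((offset, ins) -> System.out.println("  " + offset + ": " + ins));
        });
    }
}
```

In this sketch, a static analyser working on the reassembled body would never see the call made in the untaken branch, which is the limitation described above.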

The technique of IccTA~@liIccTADetectingInterComponent2015 is close to the idea of modifying the application to improve its analysis: it performs a first analysis to compute the potential inter-component communications of an application, then modifies the Jimple representation of this application before feeding it to FlowDroid to perform a taint analysis.
Jimple is the intermediate language used by Soot, so even if IccTA does not generate a new application, this modified representation can probably be used by any tool based on the Soot framework, or recompiled into a new application without too much effort.
Samhi #etal~@samhi_jucify_2022 followed this direction to unify the analysis of bytecode and native code.
Their tool, JuCify, uses Angr~@angrPeople to generate the call graph of the native code, and uses heuristics to encode this call graph into Jimple that can then be added to the Jimple generated by Soot from the bytecode of the application.
Like IccTA, they use FlowDroid to analyse this new augmented representation of the application, but it should be usable by any analysis tool relying on Soot.

Finally, DroidRA~@li_droidra_2016 uses the COAL~@octeauCompositeConstantPropagation2015 solver to statically compute reflection information.
The reflection calls are then transformed into direct calls inside the application using Soot.
Using COAL makes DroidRA quite good at solving the simpler cases, where the names of the classes and methods targeted by reflection are already present in the application.
Those cases are quite common, and being able to solve them without resorting to dynamic analysis is quite useful.
On the other hand, COAL struggles with cases involving complex string manipulation, and is simply not able to handle cases that rely on external data (#eg downloaded from the internet at runtime).
Likewise, DroidRA can only access dynamically loaded code if that code is present inside the application without any kind of obfuscation (#eg a #DEX file in the assets of the application can be analysed, but not if it is encrypted).
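
As an illustration of the kind of rewriting discussed here, the Java sketch below shows a reflective call whose class and method names are constant strings, next to the equivalent direct call that a static analyser can follow.
It is a hand-written example and not the output of DroidRA (which operates on Jimple through Soot); the class `ReflectionToDirect` and its method names are hypothetical, and a JDK class is used as the reflection target only to keep the example self-contained.

```java
import java.lang.reflect.Method;

class ReflectionToDirect {
    /**
     * Before: the target class and method are only named by string constants,
     * so a call-graph builder that ignores reflection sees no edge to reverse().
     */
    static String viaReflection(String input) {
        try {
            Class<?> cls = Class.forName("java.lang.StringBuilder");
            Object builder = cls.getConstructor(String.class).newInstance(input);
            Method reverse = cls.getMethod("reverse");
            return reverse.invoke(builder).toString();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /**
     * After: once the string constants are resolved, the same behavior can be
     * expressed as a direct call that any static analysis tool can follow.
     */
    static String direct(String input) {
        return new StringBuilder(input).reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(viaReflection("android"));
        System.out.println(direct("android"));
    }
}
```

If the string passed to `Class.forName` were instead built from data downloaded at runtime, there would be no constant for a static solver to propagate, which corresponds to the unsupported cases mentioned above.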


#v(2em)

Instrumenting applications to encode the result of an analysis as a unified representation has been explored before.
It has been used by tools like AppSpear and DexLego to expose heavily obfuscated bytecode collected dynamically.
Similarly, DroidRA computes reflection information statically and injects the actual method calls inside the application it returns.
However, AppSpear and DexLego focus primarily on specific obfuscation techniques, making their implementations difficult to port to more recent versions of Android, and DroidRA suffers from the limitations of static analysis.
We believe that instrumentation is a promising approach to encode this information.
#jm-note(side: right)[In particular, we think that using it to provide information collected by even a simple dynamic analysis could be significantly beneficial for many tools.][Urf, this is over promising considering the work done in @sec:th]

#jm-note(side: left)[#pb3: #pb3-text][Yeah no, this need a revision]