diff --git a/5_theseus/3_static_transformation.typ b/5_theseus/3_static_transformation.typ
index cc0812e..44ed06a 100644
--- a/5_theseus/3_static_transformation.typ
+++ b/5_theseus/3_static_transformation.typ
@@ -227,14 +227,38 @@ The pseudo-code in @lst:renaming-algo shows the three steps of this algorithm:
 
 === Implementation Details
 
-Most of the contributions we saw performing instrumentation rely on Soot.
+Most of the state-of-the-art contributions we saw that perform instrumentation rely on Soot.
 Soot works on an intermediate representation, Jimple, that is easier to manipulate.
 However, Soot can be cumbersome to set up and use, and we initially wanted better control over the modified bytecode.
 Our initial idea was to use Apktool, but in @sec:rasta, we found that many errors raised by tools were due to trying to parse Smali incorrectly.
-So, rather than parsing, modifying and regenerating the Smali text files, we decided to make our own instrumentation library from scratch.
+In addition, tools based on Soot tended to consume a lot of memory and fail with unclear errors, although this may be because they perform more complex analyses; this reinforced our decision to avoid Soot.
+For these reasons, we decided to make our own instrumentation library from scratch.
+Such a library must be able to parse, modify and generate valid #DEX files.
 It was not as difficult as one would expect, thanks to the clear documentation of the Dalvik format from Google#footnote[https://source.android.com/docs/core/runtime/dex-format].
 In addition, when we had doubts about the specification, we had the option to check the implementation used by Apktool#footnote[https://github.com/JesusFreke/smali], or the code used by Android to check the integrity of the #DEX files#footnote[https://cs.android.com/android/platform/superproject/main/+/main:art/libdexfile/dex/dex_file_verifier.cc;drc=11bd0da6cfa3fa40bc61deae0ad1e6ba230b0954].
 
+We chose to use Rust to implement this library.
+It offers both good performance and good ergonomics.
+For instance, we could parallelise the parsing and generation of #DEX files without much effort.
+Because we are not using a high-level intermediate language like Jimple (used by Soot), the management of registers has to be done manually (by the user of the library), the same way it has to be done when using Apktool.
+This poses a few challenges.
+
+A method declares a number of internal registers it will use (let's call this number $n$), and has access to an additional number of registers used to store the parameters (let's call this number $p$).
+Each register is referred to by a number from $0$ to $65535$.
+The internal registers are numbered from $0$ to $n-1$, and the parameter registers from $n$ to $n+p-1$.
+This means that when instrumentation adds new registers to a method (let's say $k$ registers), the new registers are numbered from $n$ to $n+k-1$, and the parameter registers are renumbered from $[|n, n+p[|$ to $[|n+k, n+k+p[|$.
+In general, this is not an issue, but some instructions can only operate on some registers (#eg `array-length`, which stores the length of an array in a register, only works on registers numbered between $0$ and $16$ excluded).
+This means that adding registers to a method can be enough to break it.
+We solved this by adding instructions at the start of the method that move the content of registers $[|n+k, n+k+p[|$ to the registers $[|n, n+p[|$, and keeping the original register numbers ($[|n, n+p[|$) for the parameters in the rest of the body of the method.
+
+The next challenge arises when we need to use one of the new registers with an instruction that only accepts registers lower than $n+p$.
+In such cases, a lower register must be used, and its content is temporarily saved in one of the new registers.
+This is not as easy as it seems: the Dalvik instructions differ depending on whether the register stores a reference or a scalar value, and Android does check that the register types match the instructions.
+The type of the register can be computed from the control flow graph of the method (we added the computation of such a graph, with the type of each register, as a feature in our library).
+An edge case that must not be overlooked is that each instruction inside a `try` block may branch to each of the `catch` blocks.
+This is a problem: it prevents us from restoring the registers to their original values before entering the `catch` blocks (or, if we restore the values at the beginning of the `catch` blocks and an exception is raised before the value is saved, the register will be overwritten by an invalid value).
+This means that when modifying the content of a `try` block, the block must be split into several blocks to prevent unexpected branching.
+
 One thing we noticed when manually instrumenting applications with Apktool is that sometimes the repackaged applications cannot be installed or run due to some files being stored incorrectly in the new application (#eg native library files must not be compressed).
 We also found that some applications deliberately store files with names that will crash the zip library used by Apktool.
 For this reason, we also used our own library to modify the #APK files.
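
The register renumbering described in the added paragraphs can be illustrated with a minimal Rust sketch. It is not the library's actual API: `ParamKind`, `param_restore_prologue` and `scratch_registers` are hypothetical names introduced here, and the instructions are emitted as Smali-like text rather than encoded bytecode. It assumes the parameter types are known from the method signature, and that `long` and `double` parameters occupy two consecutive registers.

```rust
use std::ops::Range;

/// Kind of value held by a parameter (hypothetical model for this sketch).
#[derive(Clone, Copy, Debug, PartialEq)]
enum ParamKind {
    Scalar, // int, float, boolean, ...: one register
    Wide,   // long or double: two consecutive registers
    Object, // any reference: one register
}

/// Moves to prepend to a method after adding `k` registers to its `n`
/// internal registers. The parameters then arrive in [n+k, n+k+p) and are
/// copied back to [n, n+p), so the original body keeps its numbering.
fn param_restore_prologue(n: u32, k: u32, params: &[ParamKind]) -> Vec<String> {
    let mut moves = Vec::new();
    if k == 0 {
        return moves; // nothing was shifted, no prologue needed
    }
    let mut offset = 0;
    for kind in params {
        // The `/16` variants accept 16-bit register numbers on both sides.
        // The opcode must match the register type, or verification fails.
        let (opcode, width) = match kind {
            ParamKind::Scalar => ("move/16", 1),
            ParamKind::Wide => ("move-wide/16", 2),
            ParamKind::Object => ("move-object/16", 1),
        };
        // Ascending order is safe: a destination is either untouched by
        // any source, or a source that an earlier move has already read.
        moves.push(format!("{} v{}, v{}", opcode, n + offset, n + k + offset));
        offset += width;
    }
    moves
}

/// Registers free for instrumentation code once the prologue has run:
/// under this scheme, the `k` registers just above the restored parameters.
fn scratch_registers(n: u32, k: u32, p: u32) -> Range<u32> {
    (n + p)..(n + p + k)
}
```

For example, with $n = 3$, $k = 2$ and parameters (object, long, int), so $p = 4$, this sketch emits `move-object/16 v3, v5`, `move-wide/16 v4, v6` and `move/16 v6, v8`, and leaves `v7` and `v8` available as scratch registers.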