From 6a43784496ce2e9ecc8a7ae75328a92b37450bfa Mon Sep 17 00:00:00 2001 From: Jean-Marie 'Histausse' Mineau Date: Fri, 3 Oct 2025 04:36:10 +0200 Subject: [PATCH] finished, maybe, yes? --- 5_theseus/3_static_transformation.typ | 6 +++--- 6_conclusion/2_futur.typ | 27 +++++++++++---------------- X_appendices/released_software.typ | 27 ++++++++++++++++++++++++--- 3 files changed, 38 insertions(+), 22 deletions(-) diff --git a/5_theseus/3_static_transformation.typ b/5_theseus/3_static_transformation.typ index 9c204eb..70ed379 100644 --- a/5_theseus/3_static_transformation.typ +++ b/5_theseus/3_static_transformation.typ @@ -231,17 +231,17 @@ Most of the contributions we saw performing instrumentation rely on Soot. Soot works on an intermediate representation, Jimple, that is easier to manipulate. However, Soot can be cumbersome to set up and use, and we initially wanted better control over the modified bytecode. Our initial idea was to use Apktool, but in @sec:rasta, we found that many errors raised by tools were due to trying to parse Smali incorrectly. -So, rather than parsing, modifying and regenerating the Smali text file, we decided to make our own instrumentation library from scratch. +So, rather than parsing, modifying and regenerating the Smali text files, we decided to make our own instrumentation library from scratch. It was not as difficult as one would expect, thanks to the clear documentation of the Dalvik format from Google#footnote[https://source.android.com/docs/core/runtime/dex-format]. In addition, when we had doubts about the specification, we had the option to check the implementation used by Apktool#footnote[https://github.com/JesusFreke/smali], or the code used by Android to check the integrity of the #DEX files#footnote[https://cs.android.com/android/platform/superproject/main/+/main:art/libdexfile/dex/dex_file_verifier.cc;drc=11bd0da6cfa3fa40bc61deae0ad1e6ba230b0954]. 
One thing we noticed when manually instrumenting applications with Apktool is that sometimes the repackaged applications cannot be installed or run due to some files being stored incorrectly in the new application (#eg native library files must not be compressed). We also found that some applications deliberately store files with names that will crash the zip library used by Apktool. For this reason, we also used our own library to modify the #APK files. -We take special care to process the least possible files in the #APKs, and only strip the #DEX files and signatures, before adding the new modified #DEX files at the end. +We took special care to process as few files as possible in the #APKs: we only strip the #DEX files and signatures, then add the new modified #DEX files at the end. Unfortunately, we did not have time to compare the robustness of our solution to existing tools like Apktool and Soot. -In hindsight, we probably should have taken the time to find a way to use smali/backsamli (the backend of Apktool) as a library or SootUp to do the instrumentation, but neither option has documentation to instrument applications. +In hindsight, we probably should have taken the time to find a way to use smali/baksmali (the backend of Apktool) as a library, or to use SootUp, to do the instrumentation, but neither option is documented for instrumenting applications this way. At the time of writing, the feature is still being developed, but in the future, Androguard might also become an option to modify #DEX files. Nevertheless, we published our instrumentation library, Androscalpel, for anyone who wants to use it.
#todo[ref to code] diff --git a/6_conclusion/2_futur.typ b/6_conclusion/2_futur.typ index 9e14732..d64e5ac 100644 --- a/6_conclusion/2_futur.typ +++ b/6_conclusion/2_futur.typ @@ -1,14 +1,10 @@ -#import "../lib.typ": todo, AOSP +#import "../lib.typ": todo, AOSP, eg == Perspectives for Future Work -#todo[ - Intro - In this section, we will discuss avenues of work raised by this thesis ? - The work presented in this thesis revealed avenues to improve ??. The following section will present those new avenues. -] +In this section, we present what, in light of this thesis, we believe to be worthwhile avenues of work to improve the Android reverse engineering ecosystem. -The main issue that appeared in all our work is an engineering one. +The main issue that arose in all our work appears to be an engineering one. The error we analysed in @sec:rasta showed that even something that should be basic, reading the content of an application, can be challenging. @sec:cl also showed that reproducing the exact behaviour of Android is more difficult than it seems. As long as those issues are not solved, we cannot build robust analysis tools. @@ -19,16 +15,15 @@ Dynamic analysis relying on patched versions of the #AOSP showed that it is diff Doing this would require limiting the modifications to the actual source code of Android to lower the changes needed at each update of Android. Another obstacle to overcome is to decouple the compilation of the tool from the rest of #AOSP: it is a massive dependence that needs a lot of resources to build. Having such a dependency would be a barrier to entry, preventing others from modifying or improving the tool. +Should those issues be solved, directly using the code from #AOSP would allow such a tool to keep up with each new version of Android and to limit invalid assumptions about the behaviour of Android.
- -#todo[ - Ideas: - - Standard Lib to interact with dalvik (dev by google?), with *STABLE* API and *ROBUST*: Today there is Apktool, Soot and Androguard - Apktool don't have a documented API, and by default do a lot of things that might work or not - Soot defaults are baaaadddd, maybe the new version? - Androguard is not bad, but not write capabilities (yet, it's a wip, maybe one day?) - Robust default, close to Android: the java zip parser is often targeted, there is something to be done here -] +An orthogonal solution to this problem is to create a new benchmark. +Benchmarks are usually targeted at some specific technique (#eg taint tracking), and accordingly, test for issues specific to the targeted technique (#eg accurately tracking data that passes through an array). +This one should test the capacity of a tool to handle real-life applications. +We suggest using a similar method to what we did in @sec:rasta to keep the benchmark independent from the tested tools. +Instead of checking the correctness of the tools, this benchmark should test if the tool is able to finish its analysis. +Applications in this benchmark could either be real-life applications that proved difficult to analyse (for instance, applications that crashed many of the tested tools in @sec:rasta), or hand-crafted applications reproducing corner cases or anti-reverse techniques encountered while analysing obfuscated applications (for instance, an application with gibberish binary file names inside `META-INF/` that can crash Jadx zip reader). +The main challenge with such a benchmark is that it would need frequent updates to follow Android evolutions, and be diverse enough to encompass a large spectrum of possible issues. #todo[web-base? flutter? wasm?] 
diff --git a/X_appendices/released_software.typ b/X_appendices/released_software.typ index 02e34c4..8b53843 100644 --- a/X_appendices/released_software.typ +++ b/X_appendices/released_software.typ @@ -1,6 +1,6 @@ -#import "../lib.typ": etal +#import "../lib.typ": etal, SDKs -= Released Software += Released Software and Artifacts In @sec:rasta, we mentioned that we had some difficulties finding some software listed by Li #etal following the disappearance of the original websites hosting it. To limit the risk of having the same issue, we hosted the different pieces of software we released for this thesis in several locations. @@ -10,12 +10,13 @@ This appendix lists the software we released as well as the different places the The code used in @sec:rasta is available at those locations: -- The author's personal git: https://git.mineau.eu/these-android-re/rasta - The research team Gitlab: https://gitlab.inria.fr/pirat/android/rasta +- The author's personal git: https://git.mineau.eu/these-android-re/rasta - Github: https://github.com/histausse/rasta - Zenodo: https://doi.org/10.5281/zenodo.10137904 The exact version of the code used in @sec:rasta is tagged as `icsr2024` in the git repositories and corresponds to the one stored in Zenodo. +The results of our experiment are also available in the Zenodo archive. 
The container images used to run the different tools are available on Zenodo at https://doi.org/10.5281/zenodo.10980349 as Singularity images, and on Dockerhub under the names: @@ -38,3 +39,23 @@ The container images used to run the different tools are available on Zenodo at - #link("https://hub.docker.com/r/histausse/rasta-redexer")[`histausse/rasta-redexer:icsr2024`] - #link("https://hub.docker.com/r/histausse/rasta-saaf")[`histausse/rasta-saaf:icsr2024`] - #link("https://hub.docker.com/r/histausse/rasta-wognsen")[`histausse/rasta-wognsen:icsr2024`] + +== Shadow Attack Survey Dataset + +The list of applications we scanned in @sec:cl and the lists of platform classes, fields, and methods we extracted from the emulators for Android #SDKs 32, 33, and 34 are stored on Zenodo at https://doi.org/10.5281/zenodo.15846481. + +== Theseus + +The scripts we used for dynamic analysis and the code implementing the transformations described in @sec:th are available at the following locations: + +- https://gitlab.inria.fr/pirat/android/android-of-theseus +- https://git.mineau.eu/these-android-re/android_of_theseus +- https://github.com/histausse/android_of_theseus + +The application transformations rely on Androscalpel, the crate we developed to manipulate Dalvik bytecode. +Androscalpel can be found at the following locations: + +- https://gitlab.inria.fr/pirat/android/androscalpel +- https://git.mineau.eu/these-android-re/androscalpel +- https://github.com/histausse/androscalpel +
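The repackaging strategy described in the 5_theseus hunk of this patch (strip only the #DEX files and signatures, keep every other entry as it was stored, then append the new #DEX files at the end) can be sketched with Python's standard `zipfile` module. This is a minimal illustration under our own assumptions, not Androscalpel's actual implementation: the function names `is_stripped` and `repackage`, the regular expressions used to recognise #DEX and v1 signature files, and the `new_dex` mapping are all hypothetical. The sketch does not zipalign uncompressed native libraries, does not re-sign the output, and inherits the limitations of Python's own zip parser.

```python
import re
import zipfile

# Hypothetical patterns: top-level DEX files (classes.dex, classes2.dex, ...)
# and v1 (JAR) signature files directly under META-INF/.
DEX_RE = re.compile(r"^classes\d*\.dex$")
SIG_RE = re.compile(r"^META-INF/(MANIFEST\.MF|[^/]+\.(SF|RSA|DSA|EC))$")


def is_stripped(name: str) -> bool:
    """Return True for entries that must not be copied to the new APK."""
    return bool(DEX_RE.match(name) or SIG_RE.match(name))


def repackage(src_apk, dst_apk, new_dex: dict[str, bytes]) -> None:
    """Copy src_apk to dst_apk, replacing DEX files and dropping v1 signatures.

    src_apk/dst_apk: paths or binary file objects.
    new_dex: maps entry names such as "classes.dex" to the new bytecode.
    """
    with zipfile.ZipFile(src_apk) as src, zipfile.ZipFile(dst_apk, "w") as dst:
        for info in src.infolist():
            if is_stripped(info.filename):
                continue
            # Reusing the original ZipInfo keeps the entry's name, timestamp,
            # and compression method (e.g. ZIP_STORED for native libraries).
            dst.writestr(info, src.read(info))
        # New DEX files are appended after every other entry.
        for name, data in new_dex.items():
            dst.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
```

Reusing each entry's original `ZipInfo` is what keeps native libraries uncompressed when the original application stored them that way. The APK Signing Block used by v2 and later signature schemes is not a zip entry, so rewriting the archive drops it as a side effect; the output must be re-signed (for instance with `apksigner`) before it can be installed.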