This commit is contained in:
parent
19286eba61
commit
eb35d092ac
10 changed files with 170 additions and 43 deletions
|
@ -1,3 +1,28 @@
|
|||
#import "../lib.typ": todo
|
||||
#import "../lib.typ": API, ie, todo
|
||||
|
||||
#todo[Intro: we are talking about reversing apps, so we talk about apps, reverse engineering techniques, tools, datasets, and justify why we talk about them]
|
||||
== Introduction
|
||||
|
||||
In order to understand the challenges of reverse engineering Android applications, we first need to understand some key concepts and specificities of Android.
|
||||
In particular, the format in which applications are distributed, as well as the runtime environment that runs those applications, are very specific to Android.
|
||||
To handle those specificities, a reverse engineer must use appropriate tools.
|
||||
Some of those tools are used recurrently, either by the reverse engineer themself, or as a basis for other, more complex tools that implement more advanced analysis techniques.
|
||||
|
||||
Among those techniques, the ones that do not require running the application are called static analysis.
|
||||
Over time, many static analysis tools have been released.
|
||||
To compare those tools, several benchmarks have been proposed, highlighting the strengths and weaknesses of each tool.
|
||||
|
||||
Unfortunately static analysis has its limits.
|
||||
One such limit is that it cannot analyse what is not inside the application.
|
||||
Platform classes are classes that are present directly on the smartphone, and not in the application.
|
||||
Some of those classes are well known and taken into account by analysis tools, but the rest of those classes, often called _hidden #API;_, are not.
|
||||
In addition to platform classes, classes that are loaded dynamically (#ie at runtime) are also not always available to static analysis.
|
||||
This led static analysis tools to disregard the class loading process altogether, leaving the subject relatively unexplored.
|
||||
|
||||
When static analysis fails, for instance because of dynamic class loading, the reverse engineer will fall back on dynamic analysis.
|
||||
Dynamic analysis is the counterpart of static analysis: it is based on observing the execution of the application.
|
||||
Depending on the context, the reverse engineer will then alternate between different techniques, using previous results to improve the next iteration.
|
||||
Regrettably, analysis tools mostly return results in an ad hoc format, making it difficult to make other tools aware of the retrieved information.
|
||||
Some tools, however, encode their results in the form of a new augmented Android application.
|
||||
The idea is that any Android analysis tool must be able to handle an Android application in the first place, so it will have access to this new information.
|
||||
|
||||
In this section, we explore in more detail those different aspects of Android reverse engineering.
|
||||
|
|
|
@ -16,7 +16,7 @@ Among the notable tools in the #SDK, they are:
|
|||
|
||||
- `emulator`: an Android emulator.
|
||||
This tool allows running an emulated Android phone on a computer.
|
||||
Although very usefull, Android emulator has several limitation.
|
||||
Although very useful, the Android emulator has several limitations.
|
||||
For one, it cannot emulate another architecture.
|
||||
An x86_64 computer cannot emulate an ARM smartphone.
|
||||
This can be an issue because a majority of smartphones run on ARM processors.
|
||||
|
@ -55,7 +55,7 @@ In addition, it can perform additionnal analysis, like computing a call graph or
|
|||
|
||||
Jadx#footnote[https://github.com/skylot/jadx] is an application decompiler.
|
||||
It converts #DEX files to Java source code.
|
||||
It is not always capable of decompiling all classes of an application, so it cannot be used to recompile a new application, but the code generated can be verry helpfull to reverse an application.
|
||||
It is not always capable of decompiling all classes of an application, so it cannot be used to recompile a new application, but the code generated can be very helpful to reverse an application.
|
||||
In addition to decompiling #DEX files, Jadx can also decode Android manifests and application resources.
|
||||
|
||||
=== Soot <sec:bg-soot>
|
||||
|
@ -83,7 +83,7 @@ Malware might implement countermeasures that avoid running malicious payload in
|
|||
|
||||
#v(2em)
|
||||
|
||||
Those tools are quite usefull for manual operations.
|
||||
Those tools are quite useful for manual operations.
|
||||
However, considering the complexity of modern Android applications, it might take a lot of work for a reverse engineer to analyse one application.
|
||||
In the next section, we will see more advanced techniques that have been developed to analyse Android applications.
|
||||
|
||||
|
|
|
@ -130,7 +130,7 @@ Data-flow analysis is the subject of many contribution~@weiAmandroidPreciseGener
|
|||
|
||||
#todo[Describe the different contributions in relations to the issues they tackle, be more critical]
|
||||
|
||||
Static analysis is powerfull as it allows to detects unwanted behavior in an application even is the behavior does not manifest itself when running the application.
|
||||
Static analysis is powerful as it can detect unwanted behavior in an application even if the behavior does not manifest itself when running the application.
|
||||
However, static analysis tools must overcome many challenges when analysing Android applications:
|
||||
/ the Java object-oriented paradigm: A call to a method can in fact correspond to a call to any method overriding the original method in subclasses.
|
||||
/ the multiplicity of entry points: Each component of an application can be an entry point for the application.
|
||||
|
|
|
@ -77,7 +77,7 @@ luoTaintBenchAutomaticRealworld2022 (TaintBench):
|
|||
- provide a dataset framework for taint analysis on top of reprodroid
|
||||
- /!\ compare current and previously evaluated version of AmAndroid and Flowdroid:
|
||||
-> Up to date version of both tools are less accurate than predecessor <-
|
||||
- timeout 20min: AmAndroid 11 apps, unsuccessfull exits 9
|
||||
- timeout 20min: AmAndroid 11 apps, unsuccessful exits 9
|
||||
|
||||
pauckAndroidTaintAnalysis2018 (ReproDroid):
|
||||
- Introduce AQL (Android app analysis query language): standard language to describe input
|
||||
|
|
|
@ -44,5 +44,3 @@ In the next section, we will explore further the contributions that take this ap
|
|||
|
||||
//#todo[RealDroid sandbox bases on modified ART?]
|
||||
//#todo[force execution?]
|
||||
|
||||
#todo[Positioning]
|
||||
|
|
42
2_background/8_instrumentation.typ
Normal file
|
@ -0,0 +1,42 @@
|
|||
#import "../lib.typ": DEX, APK, ART, etal, eg, pb3, pb3-text, jm-note
|
||||
|
||||
== Improving Analysis with Instrumentation <sec:bg-instrumentation>
|
||||
|
||||
Usually, instrumentation refers to the practice of modifying the behavior of a program to collect information during its execution.
|
||||
Frida is a good example of an instrumentation framework.
|
||||
The term can also be used more generally to describe operations that modify the application code.
|
||||
In this section, we will focus on uses of instrumentation that make an application easier to analyse by other tools, rather than merely collecting additional information at runtime.
|
||||
|
||||
In the previous section, we gave the example of AppSpear~@yang_appspear_2015, which reconstructs #DEX files intercepted at runtime and repackages the #APK with the new code in it.
|
||||
DexLego~@dexlego uses a similar but much more aggressive technique.
|
||||
It targets heavily obfuscating packers that decrypt, then re-encrypt, the instructions of methods just in time.
|
||||
To get the bytecode, DexLego logs each instruction executed by the #ART, and reconstructs the methods, then the #DEX files, from this stream of instructions.
|
||||
The main limitation of this technique is that it carries the limitations of dynamic analysis over to static analysis: the bytecode injected in the application is limited to the instructions executed during the dynamic analysis.
|
||||
Nevertheless, it is an interesting way to encode the traces of a dynamic analysis in a form that can be used by any Android analysis tool.
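
The reassembly idea, and its limitation, can be illustrated with a toy model. The sketch below is not DexLego's implementation: the class `TraceReassembly`, the `Executed` event type and the sample trace are hypothetical names introduced only for this example, and the instructions are kept as plain strings instead of real Dalvik encodings. It assumes a trace that records, for each executed instruction, the method it belongs to and its offset in that method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model, not DexLego's implementation: all names here are hypothetical.
class TraceReassembly {
    // One logged event: which method was running, at which offset, and
    // the textual form of the instruction that was executed.
    static final class Executed {
        final String method;
        final int offset;
        final String instruction;

        Executed(String method, int offset, String instruction) {
            this.method = method;
            this.offset = offset;
            this.instruction = instruction;
        }
    }

    // Group the executed instructions by method and order them by offset.
    // An instruction executed several times is stored only once.
    static Map<String, TreeMap<Integer, String>> reassemble(List<Executed> trace) {
        Map<String, TreeMap<Integer, String>> methods = new TreeMap<>();
        for (Executed event : trace) {
            methods.computeIfAbsent(event.method, name -> new TreeMap<>())
                   .put(event.offset, event.instruction);
        }
        return methods;
    }

    // Trace of two calls to a five-instruction method whose branch at
    // offset 0 was taken both times: offsets 2 and 3 were never executed.
    static List<Executed> sampleTrace() {
        String check = "LMain;->check(I)Z";
        List<Executed> trace = new ArrayList<>();
        for (int call = 0; call < 2; call++) {
            trace.add(new Executed(check, 0, "if-eqz p1, :cond_4"));
            trace.add(new Executed(check, 4, "const/4 v0, 0x1"));
            trace.add(new Executed(check, 5, "return v0"));
        }
        return trace;
    }

    public static void main(String[] args) {
        Map<String, TreeMap<Integer, String>> methods = reassemble(sampleTrace());
        methods.forEach((name, body) -> {
            System.out.println(name);
            body.forEach((offset, insn) -> System.out.println("  " + offset + ": " + insn));
        });
    }
}
```

In this sketch, the reassembled method contains only the three instructions that were observed: the branch that was never taken is absent from the result, which is the limitation inherited from dynamic analysis.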
|
||||
|
||||
The technique of IccTA~@liIccTADetectingInterComponent2015 is close to the idea of modifying the application to improve its analysis: it performs a first analysis to compute the potential inter-component communications of an application, then modifies the Jimple representation of this application before feeding it to Flowdroid to perform a taint analysis.
|
||||
Jimple is the intermediate language used by Soot, so even if IccTA does not generate a new application, this modified representation can probably be used by any tool based on the Soot framework, or recompiled into a new application without too much effort.
|
||||
Samhi #etal~@samhi_jucify_2022 followed this direction to unify the analysis of bytecode and native code.
|
||||
Their tool, JuCify, uses Angr~@angrPeople to generate the call graph of the native code, and uses heuristics to encode this call graph into Jimple that can then be added to the Jimple generated by Soot from the bytecode of the application.
|
||||
Like IccTA, they use Flowdroid to analyse this new augmented representation of the application, but it should be usable by any analysis tool relying on Soot.
|
||||
|
||||
Finally, DroidRA~@li_droidra_2016 uses the COAL~@octeauCompositeConstantPropagation2015 solver to statically compute the reflection information.
|
||||
The reflection calls are transformed into direct calls inside the application using Soot.
|
||||
Using COAL makes DroidRA quite good at solving the simpler cases, where the names of the classes and methods targeted by reflection are already present in the application.
|
||||
Those cases are quite common, and being able to solve them without resorting to dynamic analysis is quite useful.
|
||||
On the other hand, COAL will struggle to solve cases with complex string manipulation and is simply not able to handle cases that rely on external data (#eg downloaded from the internet at runtime).
|
||||
Likewise, DroidRA can only access dynamically loaded code if that code is present inside the application without any kind of obfuscation (#eg a #DEX file in the assets of the application can be analysed, but not if it is encrypted).
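
As an illustration of the transformation from a reflective call to a direct call, the sketch below shows the same call written both ways in plain Java. It is not DroidRA's output: the class `ReflectionRewrite` and both method names are hypothetical, and the target (`String.length`) was chosen only because it exists in any Java runtime. It covers the simple case described above, where the class and method names are string constants.

```java
import java.lang.reflect.Method;

// Illustration only: this is not DroidRA's output, and the names are hypothetical.
class ReflectionRewrite {
    // Before: the target is reached through reflection. Both names are
    // string constants, the simple case a static solver can resolve.
    static int reflectiveCall(String value) {
        try {
            Class<?> cls = Class.forName("java.lang.String");
            Method method = cls.getMethod("length");
            return (Integer) method.invoke(value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // After: the equivalent direct call, visible in any call graph.
    static int directCall(String value) {
        return value.length();
    }

    public static void main(String[] args) {
        System.out.println(reflectiveCall("theseus") == directCall("theseus"));
    }
}
```

If the string passed to `Class.forName` were instead built from data received at runtime, no constant would be available for a static solver to propagate, which corresponds to the cases that require dynamic analysis.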
|
||||
|
||||
|
||||
#v(2em)
|
||||
|
||||
Instrumenting applications to encode the result of an analysis as a unified representation has been explored before.
|
||||
It has been used by tools like AppSpear and DexLego to expose heavily obfuscated bytecode collected dynamically.
|
||||
Similarly, DroidRA computes reflection information statically and injects the actual method calls into the application it returns.
|
||||
However, AppSpear and DexLego focus primarily on specific obfuscation techniques, making their implementations difficult to port to more recent versions of Android, and DroidRA suffers from the limitations of static analysis.
|
||||
We believe that instrumentation is a promising approach to encode this information.
|
||||
#jm-note(side: right)[Especially, we think that using it to provide information collected by even a simple dynamic analysis could be significantly beneficial for many tools.][Urf, this is over promising considering the work done in @sec:th]
|
||||
|
||||
#jm-note(side: left)[#pb3: #pb3-text][Yeah no, this need a revision]
|
||||
|
||||
|
22
2_background/9_conclusion.typ
Normal file
|
@ -0,0 +1,22 @@
|
|||
#import "../lib.typ": APK, pb1, pb2, pb3, pb1-text, pb2-text, pb3-text
|
||||
|
||||
== Conclusion
|
||||
|
||||
In this chapter, we looked at the specificities of Android and the usual tools used as a basis for reverse engineering applications.
|
||||
Many contributions have been made to static analysis, and benchmarks have been proposed to compare the different tools that resulted from those contributions.
|
||||
Those benchmarks raised questions about the reusability of those tools and their capacity to handle real-world applications.
|
||||
We then looked at platform classes and class loading, two commonly recognised limitations of static analysis.
|
||||
Because of that, the issue is generally relegated to dynamic analysis, leaving the details of the class loading mechanisms of Android unexplored.
|
||||
To complement static analysis, we continued by looking at dynamic analysis.
|
||||
A variety of approaches have been proposed, balancing ease of use, maintainability and stealthiness.
|
||||
The results of those analyses are often in an ad hoc format, making them difficult to reuse with other tools.
|
||||
A few exceptions, as well as some static analysis tools, proposed an interesting solution to this issue:
|
||||
instrumenting the analysed application to encode the results of the analysis in the form of a valid #APK, a format any Android analysis tool should be able to read.
|
||||
We liked this solution and believe it should be studied further.
|
||||
This process led us to our problem statements:
|
||||
|
||||
/ #pb1: #pb1-text
|
||||
/ #pb2: #pb2-text
|
||||
/ #pb3: #pb3-text
|
||||
|
||||
In the next chapters, we will endeavor to contribute to the Android reverse engineering field by answering those problem statements.
|
|
@ -12,35 +12,5 @@
|
|||
#include("5_platform_classes.typ")
|
||||
#include("6_classloading.typ")
|
||||
#include("7_dynamic_analysis.typ")
|
||||
|
||||
/*
|
||||
* Generic course on Android
* present apk tool, jadx, androguard and flowdroid
* static analysis
* tools with datasets that are a bit too lenient
*
* dynamic analysis
*
* the reverse engineer's process
*
* Keep the details of class loading and reflection for the associated chapters?
*
* Dynamic analysis
|
||||
*/
|
||||
|
||||
|
||||
#jfl-note[
|
||||
The background chapter is very technical and descriptive: it says "there is such and such a tool".
At this stage, and before chapter 3, we would like to read:
- The overall objectives of the thesis
- What a reverse engineer typically does with an app and what their problems are
- Then the state of the art, to say which past contributions tried to help this reverse engineer

For example, the reverse engineer wants to know whether the app leaks geolocation data.
In that case, one can use taintdroid to compute statically whether it does, and discuss the limits.
Same for the contributions in dynamic analysis.
At the end, we would like to have a clearer idea of the limits, illustrated with different reverse engineering tasks. Limits summarised:
- The tools crash a lot
- Dynamic loading is a real pain
- A dissected app cannot be analysed
|
||||
][todo]
|
||||
#include("8_instrumentation.typ")
|
||||
#include("9_conclusion.typ")
|
||||
|
|
|
@ -1,8 +1,15 @@
|
|||
#import "../lib.typ": todo
|
||||
#import "../lib.typ": epigraph, highlight-block, todo
|
||||
|
||||
= Theseus <sec:th>
|
||||
|
||||
#todo[theseus chapter title for @sec:th]
|
||||
#epigraph("Plutarch, Life of Theseus 23.1")[The ship wherein Theseus and the youth of Athens returned from Crete had thirty oars, and was preserved by the Athenians \[...\] for they took away the old planks as they decayed, putting in new and strong timber in their places]
|
||||
|
||||
#align(center, highlight-block(inset: 15pt, width: 75%, block(align(left)[
|
||||
#todo[Abstract for @sec:th]
|
||||
])))
|
||||
|
||||
|
||||
#todo[better title for theseus chapter title for @sec:th]
|
||||
|
||||
#include("1_static_transformation.typ")
|
||||
#include("2_dynamic_data_collection.typ")
|
||||
|
|
|
@ -1234,3 +1234,66 @@ month = aug
|
|||
file = {Full Text PDF:/home/histausse/Zotero/storage/I6H4B9IU/Mayrhofer et al. - 2021 - The Android Platform Security Model.pdf:application/pdf},
|
||||
}
|
||||
|
||||
@article{dexlego,
|
||||
title = {{DexLego}: {Reassembleable} bytecode extraction for aiding static analysis},
|
||||
doi = {10.1109/DSN.2018.00075},
|
||||
abstract = {The scale of Android applications in the market is growing rapidly. To efficiently detect the malicious behavior in these applications, an array of static analysis tools are proposed. However, static analysis tools suffer from code hiding techniques like packing, dynamic loading, self modifying, and reflection. In this paper, we thus present DexLego, a novel system that performs a reassembleable bytecode extraction for aiding static analysis tools to reveal the malicious behavior of Android applications. DexLego leverages just-in-time collection to extract data and bytecode from an application at runtime, and reassembles them to a new Dalvik Executable (DEX) file offline. The experiments on DroidBench and real-world applications show that DexLego correctly reconstructs the behavior of an application in the reassembled DEX file, and significantly improves analysis result of the existing static analysis systems.},
|
||||
journal = {Proceedings - 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2018},
|
||||
author = {Ning, Zhenyu and Zhang, Fengwei},
|
||||
year = {2018},
|
||||
note = {Publisher: IEEE
|
||||
ISBN: 9781538655955},
|
||||
keywords = {★, Android, application analysis, dynamic analysis, self modifying code, static analysis, unpacking},
|
||||
pages = {690--701},
|
||||
file = {PDF:/home/histausse/Zotero/storage/2ZHQJGWG/Ning, Zhang - 2018 - DexLego Reassembleable bytecode extraction for aiding static analysis.pdf:application/pdf},
|
||||
}
|
||||
|
||||
@inproceedings{samhi_jucify_2022,
|
||||
address = {New York, NY, USA},
|
||||
series = {{ICSE} '22},
|
||||
title = {{JuCify}: a step towards {Android} code unification for enhanced static analysis},
|
||||
isbn = {978-1-4503-9221-1},
|
||||
shorttitle = {{JuCify}},
|
||||
url = {https://dl.acm.org/doi/10.1145/3510003.3512766},
|
||||
doi = {10.1145/3510003.3512766},
|
||||
abstract = {Native code is now commonplace within Android app packages where it co-exists and interacts with Dex bytecode through the Java Native Interface to deliver rich app functionalities. Yet, state-of-the-art static analysis approaches have mostly overlooked the presence of such native code, which, however, may implement some key sensitive, or even malicious, parts of the app behavior. This limitation of the state of the art is a severe threat to validity in a large range of static analyses that do not have a complete view of the executable code in apps. To address this issue, we propose a new advance in the ambitious research direction of building a unified model of all code in Android apps. The JuCify approach presented in this paper is a significant step towards such a model, where we extract and merge call graphs of native code and bytecode to make the final model readily-usable by a common Android analysis framework: in our implementation, JuCify builds on the Soot internal intermediate representation. We performed empirical investigations to highlight how, without the unified model, a significant amount of Java methods called from the native code are "unreachable" in apps' call-graphs, both in goodware and malware. Using JuCify, we were able to enable static analyzers to reveal cases where malware relied on native code to hide invocation of payment library code or of other sensitive code in the Android framework. Additionally, JuCify's model enables state-of-the-art tools to achieve better precision and recall in detecting data leaks through native code. Finally, we show that by using JuCify we can find sensitive data leaks that pass through native code.},
|
||||
urldate = {2023-03-27},
|
||||
booktitle = {Proceedings of the 44th {International} {Conference} on {Software} {Engineering}},
|
||||
publisher = {Association for Computing Machinery},
|
||||
author = {Samhi, Jordan and Gao, Jun and Daoudi, Nadia and Graux, Pierre and Hoyez, Henri and Sun, Xiaoyu and Allix, Kevin and Bissyandé, Tegawendé F. and Klein, Jacques},
|
||||
month = jul,
|
||||
year = {2022},
|
||||
pages = {1232--1244},
|
||||
file = {Samhi et al. - 2022 - JuCify a step towards Android code unification fo.pdf:/home/histausse/Zotero/storage/ML7EEFWX/Samhi et al. - 2022 - JuCify a step towards Android code unification fo.pdf:application/pdf},
|
||||
}
|
||||
|
||||
@INPROCEEDINGS{angrPeople,
|
||||
author={Shoshitaishvili, Yan and Wang, Ruoyu and Salls, Christopher and Stephens, Nick and Polino, Mario and Dutcher, Andrew and Grosen, John and Feng, Siji and Hauser, Christophe and Kruegel, Christopher and Vigna, Giovanni},
|
||||
booktitle={2016 IEEE Symposium on Security and Privacy (SP)},
|
||||
title={SOK: (State of) The Art of War: Offensive Techniques in Binary Analysis},
|
||||
year={2016},
|
||||
volume={},
|
||||
number={},
|
||||
pages={138-157},
|
||||
keywords={Computer bugs;Semantics;Security;Binary codes;Engines;Operating systems;attacks and defenses;security architectures;system security},
|
||||
doi={10.1109/SP.2016.17}
|
||||
}
|
||||
|
||||
@inproceedings{li_droidra_2016,
|
||||
address = {New York, NY, USA},
|
||||
series = {{ISSTA} 2016},
|
||||
title = {{DroidRA}: taming reflection to support whole-program analysis of {Android} apps},
|
||||
isbn = {978-1-4503-4390-9},
|
||||
shorttitle = {{DroidRA}},
|
||||
url = {https://doi.org/10.1145/2931037.2931044},
|
||||
doi = {10.1145/2931037.2931044},
|
||||
abstract = {Android developers heavily use reflection in their apps for legitimate reasons, but also significantly for hiding malicious actions. Unfortunately, current state-of-the-art static analysis tools for Android are challenged by the presence of reflective calls which they usually ignore. Thus, the results of their security analysis, e.g., for private data leaks, are inconsistent given the measures taken by malware writers to elude static detection. We propose the DroidRA instrumentation-based approach to address this issue in a non-invasive way. With DroidRA, we reduce the resolution of reflective calls to a composite constant propagation problem. We leverage the COAL solver to infer the values of reflection targets and app, and we eventually instrument this app to include the corresponding traditional Java call for each reflective call. Our approach allows to boost an app so that it can be immediately analyzable, including by such static analyzers that were not reflection-aware. We evaluate DroidRA on benchmark apps as well as on real-world apps, and demonstrate that it can allow state-of-the-art tools to provide more sound and complete analysis results.},
|
||||
urldate = {2025-07-22},
|
||||
booktitle = {Proceedings of the 25th {International} {Symposium} on {Software} {Testing} and {Analysis}},
|
||||
publisher = {Association for Computing Machinery},
|
||||
author = {Li, Li and Bissyandé, Tegawendé F. and Octeau, Damien and Klein, Jacques},
|
||||
month = jul,
|
||||
year = {2016},
|
||||
pages = {318--329},
|
||||
file = {Submitted Version:/home/histausse/Zotero/storage/RPJ5UCTI/Li et al. - 2016 - DroidRA taming reflection to support whole-program analysis of Android apps.pdf:application/pdf},
|
||||
}
|
||||
|
|