Jean-Marie 'Histausse' Mineau 2025-07-29 16:23:42 +02:00
parent 243b9df134
commit c060e88996
Signed by: histausse
GPG key ID: B66AEEDA9B645AD2
17 changed files with 264 additions and 96 deletions

@ -57,10 +57,10 @@ In addition to decompilling #DEX files, Jadx can also decode Android manifests a
=== Soot <sec:bg-soot>
Soot#footnote[https://github.com/soot-oss/soot]~@Arzt2013 is a Java optimization framework.
It can lift Java bytecode to other intermediate representations, which can be used to perform optimizations before being converted back to bytecode.
Because Dalvik bytecode is close to Java bytecode and can be translated into the same intermediate representations, support for Android was added to Soot, and Soot's features are now leveraged to analyse Android applications.
One of the best-known examples of Soot usage for Android analysis is Flowdroid~@Arzt2014a, a tool that computes data flows in an application.
A new version of Soot, SootUp#footnote[https://github.com/soot-oss/SootUp], is currently being developed.
Compared to Soot, it has a modernized interface and architecture, but it is not yet feature-complete, and some tools like Flowdroid still use Soot.

@ -6,7 +6,7 @@
#todo[swap with tool section ?]
In the past fifteen years, the research community released many tools to detect or analyze malicious behaviors in applications.
Two main approaches can be distinguished: static and dynamic analysis~@Li2017.
Dynamic analysis requires running the application in a controlled environment to observe runtime values and/or interactions with the operating system.
For example, an Android emulator with a patched kernel can capture these interactions, but applying such modifications is not a trivial task.
Such an approach is also limited by execution time: only a limited part of the application can be executed, with no guarantee on the obtained code coverage.
@ -18,7 +18,7 @@ For malware, dynamic analysis is also limited by evading techniques that may pre
Static analysis programs examine an #APK file without executing it in order to extract information from it.
Basic static analysis can include extracting information from the `AndroidManifest.xml` file or decompiling bytecode to Java code.
More advanced analyses consist in computing the control flow and the data flow of an application~@Li2017.
The most basic form of control-flow analysis is to build a call graph.
A call graph is a graph where the nodes represent the methods of the application, and the edges represent calls from one method to another.
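
As a minimal sketch, assuming the call graph has already been extracted (tools such as Soot build it from the bytecode), it can be stored as an adjacency list mapping each method to the methods it calls. A typical first query is the set of methods reachable from the entry points. The method names below are hypothetical.

```python
from collections import deque

# A call graph as an adjacency list: caller -> list of callees.
CallGraph = dict[str, list[str]]


def reachable_methods(graph: CallGraph, entry_points: list[str]) -> set[str]:
    """Return every method reachable from the entry points (breadth-first traversal)."""
    seen: set[str] = set(entry_points)
    queue = deque(entry_points)
    while queue:
        method = queue.popleft()
        for callee in graph.get(method, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


if __name__ == "__main__":
    # Hypothetical method names.
    graph: CallGraph = {
        "MainActivity.onCreate": ["Utils.getId", "Net.send"],
        "Utils.getId": ["TelephonyManager.getImei"],
        "Net.send": ["UrlRequest.start"],
        "Legacy.unused": ["Net.send"],  # never called: unreachable from onCreate
    }
    print(sorted(reachable_methods(graph, ["MainActivity.onCreate"])))
```

Methods absent from the result, like `Legacy.unused`, cannot be reached from the chosen entry point. On Android, the choice of entry points matters: an application has no single `main` method, and lifecycle callbacks such as `onCreate` play that role.
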
@ -121,7 +121,7 @@ This can be used to identify the user can cannot be changed if compromised.
This makes `TelephonyManager.getImei()` a good candidate for a taint source.
On the other hand, `UrlRequest.start()` sends a request to an external server, making it a taint sink.
If a data flow is found linking `TelephonyManager.getImei()` to `UrlRequest.start()`, the application is potentially leaking critical information to an external entity, a behavior that is probably not wanted by the user.
Data-flow analysis is the subject of many contributions~@weiAmandroidPreciseGeneral2014 @titzeAppareciumRevealingData2015 @bosuCollusiveDataLeak2017 @klieberAndroidTaintFlow2014 @DBLPconfndssGordonKPGNR15 @octeauCompositeConstantPropagation2015 @liIccTADetectingInterComponent2015, the most notable being Flowdroid~@Arzt2014a.
#todo[Describe the different contributions in relations to the issues they tackle]
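
The source/sink reasoning above can be sketched as a forward taint propagation. This is a deliberately simplified, intraprocedural model over straight-line code, assuming a hypothetical three-address form where each statement is `target = method(args)`. It is not how Flowdroid or the other cited tools are implemented, since they also handle control flow, aliasing, and calls across methods and components.

```python
from dataclasses import dataclass
from typing import Optional

SOURCES = {"TelephonyManager.getImei"}
SINKS = {"UrlRequest.start"}


@dataclass(frozen=True)
class Stmt:
    """Hypothetical three-address statement: target = method(args)."""
    target: Optional[str]        # variable written, None if the result is discarded
    method: str                  # called method, e.g. "TelephonyManager.getImei"
    args: tuple[str, ...] = ()   # variables passed to the call (receiver included)


def find_leaks(stmts: list[Stmt]) -> list[tuple[str, str]]:
    """Forward taint propagation over straight-line code; returns (source, sink) pairs."""
    taint: dict[str, set[str]] = {}  # variable -> sources whose data it may hold
    leaks: list[tuple[str, str]] = []
    for s in stmts:
        # Coarse over-approximation: a call's result depends on all of its arguments.
        incoming = set().union(*(taint.get(a, set()) for a in s.args))
        if s.method in SINKS:
            leaks.extend((src, s.method) for src in sorted(incoming))
        if s.method in SOURCES:
            incoming.add(s.method)
        if s.target is not None:
            taint[s.target] = incoming  # strong update: overwrite previous taint
    return leaks


if __name__ == "__main__":
    program = [
        Stmt("imei", "TelephonyManager.getImei"),
        Stmt("msg", "String.concat", ("prefix", "imei")),
        Stmt("req", "UrlRequestBuilder.build", ("msg",)),
        Stmt(None, "UrlRequest.start", ("req",)),
    ]
    print(find_leaks(program))  # [('TelephonyManager.getImei', 'UrlRequest.start')]
```

Overwriting a tainted variable with a constant clears its taint (strong update). This is only sound for straight-line code like this example.
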
@ -143,44 +143,31 @@ The most notable user of Soot is Flowdroid. #todo[formulation]
The alternative to static analysis is dynamic analysis.
With dynamic analysis, the application is actually executed.
The simplest strategies consist in running the application and observing its behavior.
For instance, Bernardi #etal~@bernardi_dynamic_2019 use the logs generated by `strace` to list the system calls made in response to an event, in order to determine whether an application is malicious.
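
A possible first step for this kind of approach is to turn a raw `strace` log into per-system-call counts that a classifier can consume. The sketch below only assumes the usual `strace` line format (`name(args) = result`, optionally prefixed by a PID as with `-f`, or by a timestamp as with `-t`). It illustrates the idea and is not the feature extraction used by Bernardi #etal in their work.

```python
import re
from collections import Counter
from typing import Iterable

# Optional "[pid N]" or "N " prefix (strace -f), optional timestamp (-t / -tt),
# then the system call name immediately followed by "(".
SYSCALL_LINE = re.compile(
    r"^(?:\[pid\s+\d+\]\s*|\d+\s+)?"
    r"(?:\d{2}:\d{2}:\d{2}(?:\.\d+)?\s+)?"
    r"([a-z_][a-z0-9_]*)\("
)


def syscall_histogram(lines: Iterable[str]) -> Counter[str]:
    """Count system calls; signals, exit notices and '<... resumed>' lines are skipped."""
    counts: Counter[str] = Counter()
    for line in lines:
        match = SYSCALL_LINE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        'openat(AT_FDCWD, "/data/local/tmp/x", O_RDONLY) = 3',
        '[pid  4242] read(3, "abc", 3) = 3',
        '--- SIGCHLD {si_signo=SIGCHLD} ---',
        '+++ exited with 0 +++',
    ]
    print(syscall_histogram(sample).most_common())
```
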
More advanced methods are more intrusive and require modifying the #APK, the Android framework, the runtime, or the kernel.
TaintDroid~@Enck2010, for example, modifies the Dalvik Virtual Machine (the predecessor of the #ART) to track the data flow of an application at runtime, while AndroBlare~@Andriatsimandefitra2012 @andriatsimandefitra_detection_2015 computes taint flows by hooking system calls through a Linux Security Module.
DexHunter~@zhang2015dexhunter and AppSpear~@yang_appspear_2015 also patch the Dalvik Virtual Machine/#ART, this time to collect bytecode loaded dynamically.
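
The principle of runtime taint tracking in the style of TaintDroid can be illustrated by attaching a set of labels to each value, propagating the labels through every operation that combines values, and checking them when data reaches a sink. TaintDroid does this on Dalvik registers inside the modified virtual machine. The Python wrapper below is only a hypothetical language-level analogue, and `TaintedStr`, `get_imei` and `network_sink` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass(frozen=True)
class TaintedStr:
    """Hypothetical string wrapper carrying the taint labels of the data it depends on."""
    value: str
    labels: frozenset[str] = field(default_factory=frozenset)

    def __add__(self, other: Union["TaintedStr", str]) -> "TaintedStr":
        # Concatenation depends on both operands: their labels are united.
        if isinstance(other, TaintedStr):
            return TaintedStr(self.value + other.value, self.labels | other.labels)
        return TaintedStr(self.value + other, self.labels)

    def __radd__(self, other: str) -> "TaintedStr":
        # Handles plain str + TaintedStr: the label survives the concatenation.
        return TaintedStr(other + self.value, self.labels)


def get_imei() -> TaintedStr:
    """Hypothetical source: the value is labelled when it is produced (fake IMEI)."""
    return TaintedStr("000000000000000", frozenset({"IMEI"}))


def network_sink(payload: TaintedStr) -> frozenset[str]:
    """Hypothetical sink check: return the labels reaching the network (empty if clean)."""
    return payload.labels


if __name__ == "__main__":
    message = "id=" + get_imei()
    print(message.value, sorted(network_sink(message)))
```

Only flows that actually happen during the execution are observed, so anything the run does not reach goes unseen.
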
Modifying the Android framework, runtime, or kernel is possible because the Android project is open source; however, this is a delicate operation.
As a result, a common issue faced by tools that take this approach is that they are stuck with a specific version of Android.
Some sandboxes limit this issue by relying on dynamic binary instrumentation, like DroidHook~@cui_droidhook_2023, based on the Xposed framework, or CamoDroid~@faghihi_camodroid_2022, based on Frida.
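
Both Xposed and Frida rely on hooking. A target method is replaced at runtime by a wrapper that can observe or alter its arguments and return value before or after delegating to the original code. Xposed modules are written in Java and Frida hooks are usually written in JavaScript. The Python snippet below merely mimics the idea on a plain Python class, and `DeviceInfo` and `install_hook` are hypothetical names.

```python
import functools
from typing import Any, Callable


class DeviceInfo:
    """Hypothetical stand-in for a framework class whose method we want to observe."""

    def get_imei(self) -> str:
        return "000000000000000"  # fake value


def install_hook(cls: type, name: str, log: list[str]) -> Callable[[], None]:
    """Replace cls.<name> by a logging wrapper; return a function that restores it."""
    original = getattr(cls, name)

    @functools.wraps(original)
    def hooked(*args: Any, **kwargs: Any) -> Any:
        result = original(*args, **kwargs)  # run the original implementation
        log.append(f"{cls.__name__}.{name} -> {result!r}")  # observe its return value
        return result  # a hook could also alter the arguments or the result here

    setattr(cls, name, hooked)
    return lambda: setattr(cls, name, original)


if __name__ == "__main__":
    calls: list[str] = []
    unhook = install_hook(DeviceInfo, "get_imei", calls)
    DeviceInfo().get_imei()
    unhook()
    DeviceInfo().get_imei()  # not logged: the hook was removed
    print(calls)
```

The original implementation is left untouched and can be restored, which is what makes hooking less invasive than patching the framework itself.
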
Another known challenge when analysing an application dynamically is code coverage: if some part of the application is not executed, it cannot be analysed.
Considering that Android applications are meant to interact with a user, this can become problematic for automatic analysis.
The Monkey tool developed by Google is one of the most used solutions~@sutter_dynamic_2024.
It sends a random stream of events to the phone without tracking the state of the application.
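
For reference, Monkey is started through `adb`, for instance `adb shell monkey -p com.example.app -v 500` to send 500 pseudo-random events to a single package (the package name is a placeholder). The helper below only assembles such a command line, using the documented `-p`, `-v`, `-s` (seed) and `--throttle` options. The function names are hypothetical, and actually running the command requires a connected device or emulator.

```python
import subprocess
from typing import Optional


def monkey_command(package: str, events: int, seed: Optional[int] = None,
                   throttle_ms: Optional[int] = None) -> list[str]:
    """Build an `adb shell monkey` command restricted to a single package."""
    cmd = ["adb", "shell", "monkey", "-p", package, "-v"]
    if seed is not None:
        cmd += ["-s", str(seed)]  # same seed -> same pseudo-random event sequence
    if throttle_ms is not None:
        cmd += ["--throttle", str(throttle_ms)]  # pause between events, in milliseconds
    cmd.append(str(events))  # the event count comes last
    return cmd


def run_monkey(package: str, events: int, **options: Optional[int]) -> subprocess.CompletedProcess:
    """Execute the command; needs `adb` on the PATH and a connected device or emulator."""
    return subprocess.run(monkey_command(package, events, **options),
                          capture_output=True, text=True, check=False)


if __name__ == "__main__":
    print(" ".join(monkey_command("com.example.app", 500, seed=42, throttle_ms=100)))
```

Fixing the seed replays the same event sequence, which helps when comparing analyses, but the exploration remains blind to the application state.
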
More advanced tools statically analyse the application to build a model of it, in order to improve the exploration.
Sapienz~@mao_sapienz_2016 and Stoat~@su_guided_2017 use this technique to improve application testing.
GroddDroid~@abraham_grodddroid_2015 follows the same approach, but statically detects suspicious sections of code and then interacts with the application to trigger those sections.
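
Once a model of the user interface is available, reaching a screen that contains targeted code becomes a graph search. As an illustration only (GroddDroid, Sapienz and Stoat each build and exploit their models differently), the sketch below assumes a hypothetical model mapping each screen to the events available on it and to the screen each event leads to. It returns the shortest event sequence to a target screen.

```python
from collections import deque
from typing import Optional

# Hypothetical GUI model: screen -> {event: next screen}.
GuiModel = dict[str, dict[str, str]]


def events_to_reach(model: GuiModel, start: str, target: str) -> Optional[list[str]]:
    """Breadth-first search for the shortest sequence of events from start to target."""
    parent: dict[str, tuple[str, str]] = {}  # screen -> (previous screen, event)
    visited = {start}
    queue = deque([start])
    while queue:
        screen = queue.popleft()
        if screen == target:
            path: list[str] = []
            while screen != start:  # walk back to the start, collecting events
                screen, event = parent[screen]
                path.append(event)
            return path[::-1]
        for event, nxt in model.get(screen, {}).items():
            if nxt not in visited:
                visited.add(nxt)
                parent[nxt] = (screen, event)
                queue.append(nxt)
    return None  # target not reachable in the model


if __name__ == "__main__":
    # Hypothetical model: the targeted code runs when the "Sync" screen is shown.
    model: GuiModel = {
        "Main": {"click:settings": "Settings", "click:about": "About"},
        "Settings": {"toggle:sync": "Sync", "back": "Main"},
    }
    print(events_to_reach(model, "Main", "Sync"))  # ['click:settings', 'toggle:sync']
```
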
Unfortunately, exploring the application entirely is not always possible, as some applications try to detect whether they are running in a sandbox environment (#eg in an emulator, or with Frida present in memory) and refuse to run some sections of code if this is the case.
Ruggia #etal~@ruggia_unmasking_2024 compile a list of evasion techniques.
They propose a new sandbox, DroidDungeon, which, unlike other sandboxes such as DroidScope~@droidscope180237 or CopperDroid~@Tam2015, strongly emphasizes resilience against evasion mechanisms.
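
To make the evasion problem concrete, the sketch below reproduces two checks of the kind mentioned above: emulator-specific system properties and Frida libraries in the process memory map. The property values (`ro.kernel.qemu`, the `goldfish` and `ranchu` hardware names of the Android emulator), the `frida-agent` and `frida-gadget` library names, and the function itself are illustrative assumptions. Real applications query these values through Android APIs or native code and combine many more checks.

```python
def evasion_indicators(props: dict[str, str], proc_maps: str) -> list[str]:
    """List sandbox artefacts an evasive app could notice (a few illustrative checks only)."""
    found: list[str] = []
    # System properties typical of the Android emulator (assumed values, see lead-in).
    if props.get("ro.kernel.qemu") == "1":
        found.append("emulator: ro.kernel.qemu=1")
    if props.get("ro.hardware") in {"goldfish", "ranchu"}:
        found.append(f"emulator: ro.hardware={props['ro.hardware']}")
    # /proc/self/maps lists the files mapped into the process, including injected libraries.
    for line in proc_maps.splitlines():
        if "frida-agent" in line or "frida-gadget" in line:
            found.append("frida: " + line.split()[-1])
            break
    return found


if __name__ == "__main__":
    props = {"ro.kernel.qemu": "1", "ro.hardware": "ranchu"}
    maps = "7f1c000000-7f1c100000 r-xp 00000000 fd:00 4242 /data/local/tmp/frida-agent-64.so\n"
    print(evasion_indicators(props, maps))
```

A sandbox that aims to resist evasion has to make every check of this kind come back empty.
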
#todo[RealDroid sandbox bases on modified ART?]
#todo[force execution?]
=== Hybrid Analysis <sec:bg-hybrid>
#todo[merge with other section?]
- #todo[DyDroid, audit of Dynamic Code Loading~@qu_dydroid_2017]

@ -4,19 +4,19 @@
Determining whether an application contains a possible information flow is an example of a static analysis goal.
Some datasets have been built specifically to evaluate tools that compute information flows inside Android applications.
One of the first well-known datasets is DroidBench, which was released with the tool Flowdroid~@Arzt2014a.
Later, the dataset ICC-Bench was introduced with the tool Amandroid~@weiAmandroidPreciseGeneral2014 to complement DroidBench by introducing applications using Inter-Component data flows.
These datasets contain carefully crafted applications containing flows that the tools should be able to detect.
These hand-crafted applications can also be used for testing purposes or to detect any regression when the software code evolves.
Unlike real-world applications, the behavior of these hand-crafted applications is known in advance, thus providing the ground truth that the tools try to compute.
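
Since the expected flows are known, a tool's output can be scored directly against them. Assuming each flow is represented as a hypothetical (source, sink) pair, precision and recall, the metrics commonly used for this comparison, can be computed as follows.

```python
Flow = tuple[str, str]  # hypothetical (source, sink) representation


def score(expected: set[Flow], reported: set[Flow]) -> tuple[float, float]:
    """Return (precision, recall) of the reported flows against the known ground truth."""
    true_positives = len(expected & reported)
    # Conventions for empty sets: nothing reported -> no false alarm (precision 1.0);
    # nothing expected -> nothing missed (recall 1.0).
    precision = true_positives / len(reported) if reported else 1.0
    recall = true_positives / len(expected) if expected else 1.0
    return precision, recall


if __name__ == "__main__":
    ground_truth = {("getImei", "sendTextMessage"), ("getLastKnownLocation", "Log.d")}
    tool_output = {("getImei", "sendTextMessage"), ("getDeviceId", "Log.d")}
    print(score(ground_truth, tool_output))  # (0.5, 0.5)
```

A tool that reports nothing obtains a perfect precision but a recall of zero, which is why both values are needed.
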
However, these datasets are not representative of real-world applications~@Pendlebury2018 and the obtained results can be misleading.
Unlike DroidBench and ICC-Bench, some approaches use real-world applications.
Bosu #etal~@bosuCollusiveDataLeak2017 use DIALDroid to perform a threat analysis of inter-application communication and publish DIALDroid-Bench, an associated dataset.
Similarly, Luo #etal released TaintBench~@luoTaintBenchAutomaticRealworld2022, a real-world dataset, together with recommendations for building such a dataset.
These datasets are useful for carefully spotting missing taint flows, but contain only a few dozen applications.
In addition to those datasets, Androzoo~@allixAndroZooCollectingMillions2016 collects applications from several marketplaces, including the Google Play store (the official Google application store), Anzhi and AppChina (two Chinese stores), and FDroid (a store dedicated to free and open-source applications).
Currently, Androzoo contains more than 25 million applications, which researchers can download using the SHA256 hash of each application.
Androzoo provides additional information about the applications, such as the date the application was first detected by Androzoo or the number of antivirus engines on VirusTotal that flagged the application as malicious.
In addition to giving researchers easy access to real-world applications, Androzoo makes it much easier to share datasets for reproducibility: instead of sharing hundreds of #APK files, sharing the list of their SHA256 hashes is enough.
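
As an illustration, the sketch below selects applications from the AndroZoo list and builds their download URLs from the SHA256 hashes. It assumes the column names of the CSV list published by AndroZoo (`sha256`, `vt_detection`, `markets`), a `|` separator between markets, and the `https://androzoo.uni.lu/api/download?apikey=...&sha256=...` endpoint. These should be checked against the current AndroZoo documentation, and the hashes and API key below are placeholders.

```python
import csv
import io
from typing import Iterable
from urllib.parse import urlencode

# Assumed endpoint, following AndroZoo's documented download API.
DOWNLOAD_ENDPOINT = "https://androzoo.uni.lu/api/download"


def select_sha256(rows: Iterable[dict[str, str]], market: str, min_detections: int) -> list[str]:
    """Keep the SHA256 of apps seen on `market` and flagged by >= min_detections engines."""
    selected: list[str] = []
    for row in rows:
        detections = row.get("vt_detection") or ""  # empty when not scanned
        markets = (row.get("markets") or "").split("|")  # assumed separator
        if detections.isdigit() and int(detections) >= min_detections and market in markets:
            selected.append(row["sha256"])
    return selected


def download_url(api_key: str, sha256: str) -> str:
    """Build the download URL for one application (the API key is personal)."""
    return f"{DOWNLOAD_ENDPOINT}?{urlencode({'apikey': api_key, 'sha256': sha256})}"


if __name__ == "__main__":
    # Tiny in-memory extract with placeholder hashes; the real list is a large CSV file.
    sample = io.StringIO(
        "sha256,pkg_name,vt_detection,markets\n"
        "AAA1,com.example.one,12,play.google.com\n"
        "BBB2,com.example.two,0,play.google.com\n"
        "CCC3,com.example.three,,anzhi\n"
    )
    hashes = select_sha256(csv.DictReader(sample), "play.google.com", min_detections=5)
    print(hashes)
    print(download_url("YOUR_API_KEY", hashes[0]))
```

The resulting list of hashes is what would be published alongside a paper; anyone with an AndroZoo API key can then download the same #APK files.
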