integrate bg of rasta in bg section
All checks were successful
/ test_checkout (push) Successful in 1m15s
parent
94d26973d3
commit
5e512b585a
11 changed files with 170 additions and 107 deletions
@ -2,7 +2,7 @@

== Android <sec:bg-android>

Android is the smartphone operating system developed by Google.
It is based on a long-term support (LTS) Linux kernel, to which patches developed by the Android community are added.
On top of the kernel, Android redeveloped many of the usual components used by Linux-based operating systems, and added new ones.
These changes make Android a very distinctive operating system.
@ -1,19 +1,21 @@
#import "../lib.typ": todo, APK, etal, ART, SDK, eg, jm-note, jfl-note
#import "../lib.typ": APK, etal, ART, SDK, DEX, eg,
#import "../lib.typ": todo, jm-note, jfl-note
#import "@preview/diagraph:0.3.3": raw-render

== Android Reverse Engineering Techniques <sec:bg-techniques>
//== Android Reverse Engineering Techniques <sec:bg-techniques>

//#todo[swap with tool section ?]


== Static Analysis <sec:bg-static>

In the past fifteen years, the research community has released many tools to detect or analyze malicious behaviors in applications.
Two main approaches can be distinguished: static and dynamic analysis~@Li2017.
Dynamic analysis requires running the application in a controlled environment to observe runtime values and/or interactions with the operating system.
For example, an Android emulator with a patched kernel can capture these interactions, but applying the required modifications is not a trivial task.
Such an approach is limited by the time required to execute even a limited part of the application, with no guarantee on the resulting code coverage.
For malware, dynamic analysis is also limited by evasion techniques that may prevent the execution of malicious parts of the code.
//As a consequence, much effort has been put into static approaches, which is the focus of this paper.

=== Static Analysis <sec:bg-static>
Dynamic analysis is also limited by evasion techniques that may prevent the execution of malicious parts of the code.
As a consequence, much effort has been put into static approaches. //, which is the focus of this paper.

Static analysis programs examine an #APK file without executing it to extract information from it.
Basic static analysis can include extracting information from the `AndroidManifest.xml` file or decompiling bytecode to Java code.
@ -123,7 +125,7 @@ On the other hand, `UrlRequest.start()` send a request to an external server, ma
If a data-flow is found linking `TelephonyManager.getImei()` to `UrlRequest.start()`, this means the application is potentially leaking critical information to an external entity, a behavior that is probably not wanted by the user.
Data-flow analysis is the subject of many contributions~@weiAmandroidPreciseGeneral2014 @titzeAppareciumRevealingData2015 @bosuCollusiveDataLeak2017 @klieberAndroidTaintFlow2014 @DBLPconfndssGordonKPGNR15 @octeauCompositeConstantPropagation2015 @liIccTADetectingInterComponent2015, the most notable tool being Flowdroid~@Arzt2014a.
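
As an illustration of the source-to-sink reasoning above, the following Python sketch reduces leak detection to reachability in a graph of flow edges. It is a toy model and not how Flowdroid or the other cited tools work internally. The edges in `FLOWS` are hypothetical and written by hand, whereas real analyses derive them from the bytecode and must handle call contexts, object fields, callbacks and inter-component communication.

```python
from collections import deque

# Hypothetical flow edges: "a": ["b"] means data held in a may end up in b.
# Node names are illustrative, not the output of a real analysis.
FLOWS = {
    "TelephonyManager.getImei()": ["imei"],
    "imei": ["payload"],
    "payload": ["UrlRequest.start()"],
    "location": ["log_buffer"],
}

SOURCES = {"TelephonyManager.getImei()"}
SINKS = {"UrlRequest.start()"}


def find_leaks(flows, sources, sinks):
    """Return (source, sink) pairs such that the sink is reachable from the source."""
    leaks = []
    for source in sources:
        seen = {source}
        queue = deque([source])  # breadth-first traversal of the flow graph
        while queue:
            node = queue.popleft()
            if node in sinks:
                leaks.append((source, node))
            for succ in flows.get(node, []):
                if succ not in seen:
                    seen.add(succ)
                    queue.append(succ)
    return leaks


print(find_leaks(FLOWS, SOURCES, SINKS))
# [('TelephonyManager.getImei()', 'UrlRequest.start()')]
```

A leak is reported because the value returned by `getImei()` reaches `UrlRequest.start()` through two intermediate variables. The `location` value only reaches a local buffer, so it is not reported.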

#todo[Describe the different contributions in relation to the issues they tackle]
#todo[Describe the different contributions in relation to the issues they tackle, be more critical]

Static analysis is powerful as it can detect unwanted behavior in an application even if the behavior does not manifest itself when running the application.
However, static analysis tools must overcome many challenges when analysing Android applications:
@ -137,36 +139,13 @@ Hovewer, static analysis tools must overcom many challenges when analysing Andro
For instance, the multi-dex feature presented in @sec:bg-android-code-format was introduced in Android #SDK 21.
Tools unaware of this feature only analyse the `classes.dex` file and ignore all other `classes<n>.dex` files.
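
Since an #APK is a ZIP archive, the multi-dex layout can be inspected with the Python standard library alone. The sketch below uses a hypothetical helper, `list_dex_files`, which returns all root-level `classes.dex`, `classes2.dex`, ... entries sorted by their number. A tool unaware of multi-dex behaves as if it only kept the first element of this list.

```python
import re
import zipfile

# Root-level bytecode files: classes.dex, classes2.dex, classes3.dex, ...
DEX_NAME = re.compile(r"^classes(\d*)\.dex$")


def dex_index(name):
    """Position of a dex file in the multi-dex sequence (classes.dex is 1)."""
    suffix = DEX_NAME.match(name).group(1)
    return int(suffix) if suffix else 1


def list_dex_files(apk):
    """Return the classes<n>.dex entries of an APK (path or file object), in order."""
    with zipfile.ZipFile(apk) as archive:
        names = [name for name in archive.namelist() if DEX_NAME.match(name)]
    # Numeric sort: a plain string sort would put classes10.dex before classes2.dex.
    return sorted(names, key=dex_index)
```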

#jfl-note[The tools can share the backend used to interact with the bytecode.
For example, Apktool is often called in a subprocess to extract the bytecode, and the Soot framework is commonly used both to analyse and to modify bytecode.
The most notable user of Soot is Flowdroid. #todo[formulation]][move this earlier]
Many of these more advanced tools rely on common tools to interact with Android applications and #DEX bytecode~@Li2017.
Recurring examples of such support tools are Apktool (#eg Amandroid~@weiAmandroidPreciseGeneral2014, Blueseal~@shenInformationFlowsPermission2014, SAAF~@hoffmannSlicingDroidsProgram2013), Androguard (#eg Adagio~@gasconStructuralDetectionAndroid2013, Apparecium~@titzeAppareciumRevealingData2015, Mallodroid~@fahlWhyEveMallory2012) or Soot (#eg Blueseal~@shenInformationFlowsPermission2014, DroidSafe~@DBLPconfndssGordonKPGNR15, Flowdroid~@Arzt2014a).
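
The pattern of delegating bytecode extraction to Apktool in a subprocess can be sketched in Python as follows. The command uses Apktool's standard `apktool d` (decode) subcommand with its `-f` (overwrite) and `-o` (output directory) options. The wrapper functions are hypothetical and assume that `apktool` is installed and on the `PATH`.

```python
import shutil
import subprocess
from pathlib import Path


def apktool_decode_cmd(apk, out_dir):
    """Command decoding an APK into smali code and resources with Apktool."""
    # d: decode, -f: overwrite out_dir if it exists, -o: output directory.
    return ["apktool", "d", "-f", "-o", str(out_dir), str(apk)]


def decode_apk(apk, out_dir):
    """Run Apktool in a subprocess and return the directory holding its output."""
    if shutil.which("apktool") is None:
        raise FileNotFoundError("apktool was not found on the PATH")
    # check=True turns a non-zero exit status (for instance a crash on a
    # malformed APK) into a CalledProcessError instead of a silent failure.
    subprocess.run(apktool_decode_cmd(apk, out_dir), check=True, capture_output=True)
    return Path(out_dir)
```

With this design, the analysis tool inherits the behavior, and the limitations, of whichever Apktool version it invokes.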

=== Dynamic Analysis <sec:bg-dynamic>

The alternative to static analysis is dynamic analysis.
With dynamic analysis, the application is actually executed.
The simplest strategies consist of running the application and examining its behavior.
For instance, Bernardi #etal~@bernardi_dynamic_2019 use the log generated by `strace` to list the system calls generated in response to an event, in order to determine whether an application is malicious.

More advanced methods are more intrusive and require modifying either the #APK, the Android framework, the runtime, or the kernel.
TaintDroid~@Enck2010, for example, modifies the Dalvik Virtual Machine (the predecessor of the #ART) to track the data flow of an application at runtime, while AndroBlare~@Andriatsimandefitra2012 @andriatsimandefitra_detection_2015 tries to compute the taint flow by hooking system calls using a Linux Security Module.
DexHunter~@zhang2015dexhunter and AppSpear~@yang_appspear_2015 also patch the Dalvik Virtual Machine/#ART, this time to collect bytecode loaded dynamically.
Modifying the Android framework, runtime or kernel is possible because the Android project is open source; however, this is a delicate operation.
Thus, a common issue faced by tools that take this approach is that they are stuck with a specific version of Android.
Some sandboxes limit this issue by using dynamic binary instrumentation, like DroidHook~@cui_droidhook_2023, based on the Xposed framework, or CamoDroid~@faghihi_camodroid_2022, based on Frida.

Another known challenge when analysing an application dynamically is code coverage: if some part of the application is not executed, it cannot be analysed.
Considering that Android applications are meant to interact with a user, this can become problematic for automatic analysis.
The Monkey tool developed by Google is one of the most used solutions~@sutter_dynamic_2024.
It sends a random stream of events to the phone without tracking the state of the application.
More advanced tools statically analyse the application to build a model of it, in order to improve the exploration.
Sapienz~@mao_sapienz_2016 and Stoat~@su_guided_2017 use this technique to improve application testing.
GroddDroid~@abraham_grodddroid_2015 takes the same approach, but statically detects suspicious sections of code and then interacts with the application to reach those sections.

Unfortunately, exploring the application entirely is not always possible, as some applications try to detect whether they are in a sandbox environment (#eg whether they run in an emulator, or whether Frida is present in memory) and refuse to run some sections of code if this is the case.
Ruggia #etal~@ruggia_unmasking_2024 compile a list of evasion techniques.
They propose a new sandbox, DroidDungeon, that, contrary to other sandboxes like DroidScope~@droidscope180237 or CopperDroid~@Tam2015, strongly emphasizes resilience against evasion mechanisms.

#todo[RealDroid sandbox based on modified ART?]
#todo[force execution?]
#todo[DyDroid, audit of Dynamic Code Loading~@qu_dydroid_2017]
The number of publications related to static analysis can make it difficult to find the right tool for the right task.
Li #etal~@Li2017 published a systematic literature review for Android static analysis before May 2015.
They analyzed 92 publications and classified them by goal, method used to solve the problem and underlying technical solution for handling the bytecode when performing the static analysis.
In particular, they listed 27 approaches with an open-source implementation available.
Nevertheless, experiments to evaluate the reusability of the identified software were not performed.
#jfl-note[We believe that the effort of reviewing the literature for making a comprehensive overview of available approaches should be pushed further: a published approach whose software cannot be used for technical reasons endangers both the reproducibility and the reusability of research.][Highlight this?]
In the next section, we will look at the work that has been done to evaluate different analysis tools.
@ -1,23 +0,0 @@
#import "../lib.typ": todo, etal, APK

== Application Datasets <sec:bg-datasets>

Determining whether an application contains a possible information flow is an example of a static analysis goal.
Some datasets have been built specifically for evaluating tools that compute information flows inside Android applications.
One of the first well-known datasets is DroidBench, which was released with the tool Flowdroid~@Arzt2014a.
Later, the dataset ICC-Bench was introduced with the tool Amandroid~@weiAmandroidPreciseGeneral2014 to complement DroidBench by introducing applications using Inter-Component data flows.
These datasets contain carefully crafted applications containing flows that the tools should be able to detect.
These hand-crafted applications can also be used for testing purposes or to detect any regression when the software code evolves.
Unlike real-world applications, the behavior of these hand-crafted applications is known in advance, thus providing the ground truth that the tools try to compute.
However, these datasets are not representative of real-world applications~@Pendlebury2018 and the obtained results can be misleading.

Unlike DroidBench and ICC-Bench, some approaches use real-world applications.
Bosu #etal~@bosuCollusiveDataLeak2017 use DIALDroid to perform a threat analysis of Inter-Application communication and published DIALDroid-Bench, an associated dataset.
Similarly, Luo #etal released TaintBench~@luoTaintBenchAutomaticRealworld2022, a real-world dataset, along with recommendations for building such a dataset.
These datasets are useful for carefully spotting missing taint flows, but contain only a few dozen applications.

In addition to those datasets, Androzoo~@allixAndroZooCollectingMillions2016 collects applications from several application marketplaces, including the Google Play store (the official Google application store), Anzhi and AppChina (two Chinese stores), or F-Droid (a store dedicated to free and open source applications).
Currently, Androzoo contains more than 25 million applications, which researchers can download using the SHA256 hash of the application.
Androzoo provides additional information about the applications, like the date the application was first detected by Androzoo or the number of antivirus engines from VirusTotal that flagged the application as malicious.
In addition to providing researchers with easy access to real-world applications, Androzoo makes it much easier to share datasets for reproducibility: instead of sharing hundreds of #APK files, the list of SHA256 hashes is enough.
@ -1,37 +1,51 @@
#import "../lib.typ": etal, eg, ie, jfl-note
#import "X_var.typ": *
#import "../lib.typ": etal, eg, ie, jfl-note, jm-note
// #import "X_var.typ": *

== Related Work <sec:rasta-soa>
#import "../lib.typ": todo, etal, APK

// Research contributions often rely on existing datasets or provide new ones in order to evaluate the developed software.
// Raw datasets such as Drebin@Arp2014 contain little information about the provided applications.
// As a consequence, dataset suites have been developed to provide, in addition to the applications, meta information about the expected results.
// For example, taint analysis datasets should provide the source and expected sink of a taint.
// In some cases, the datasets are provided with additional software for automating part of the analysis.
// Thus,
#jfl-note[We review in this section the existing contributions related to the reusability of static analysis tools.][link to chapter 2]

Several papers have reviewed Android analysis tools produced by researchers.
Li #etal~@Li2017 published a systematic literature review for Android static analysis before May 2015.
They analyzed 92 publications and classified them by goal, method used to solve the problem and underlying technical solution for handling the bytecode when performing the static analysis.
In particular, they listed 27 approaches with an open-source implementation available.
Nevertheless, experiments to evaluate the reusability of the identified software were not performed.
#jfl-note[We believe that the effort of reviewing the literature for making a comprehensive overview of available approaches should be pushed further: a published approach whose software cannot be used for technical reasons endangers both the reproducibility and the reusability of research.][Move this earlier?]
== Evaluating Static Analysis Tools <sec:bg-eval-tools>

Works that benchmark tools follow a similar method.
They select a set of tools with similar goals.
They start by selecting a set of tools with similar goals.
Usually, those contributions compare existing tools to their own, but some contributions do not introduce a new tool and focus on surveying the state of the art for some technique.
As we saw in @sec:bg-datasets, the need for a ground truth to compare the results of the tools means that test datasets are often handcrafted.
Some studies select a few real-world applications instead, and manually reverse engineer those applications to obtain the ground truth.
They then select a dataset of applications to analyse.
We will see in @sec:bg-datasets that those datasets are often handcrafted, even if some studies select a few real-world applications that they manually reverse engineer to obtain a ground truth to compare with the tools' results.
Once the tools and test dataset are selected, the tools are run on the application dataset, and the results of the tools are compared to the ground truth to determine the accuracy of each tool.
Several factors are then considered to compare the results of the tools.
These can be the number of false positives, false negatives, or even the time it took to finish the analysis.
Several factors can be considered to compare the results of the tools:
the number of false positives, false negatives, or even the time it took to finish the analysis.
Occasionally, the number of applications a tool simply failed to analyse is also compared.
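
The comparison against the ground truth can be illustrated with a minimal sketch. It assumes that each tool reports a set of (source, sink) flows per application and that a missing result marks a crash or a timeout. Counting failed analyses as false negatives instead would be an equally valid methodological choice; the hypothetical `evaluate` function below keeps them separate.

```python
def evaluate(results, ground_truth):
    """Aggregate true/false positives and false negatives over a dataset.

    ground_truth maps each application to its expected flows; results maps it
    to the flows reported by the tool, and has no entry when the analysis failed.
    """
    totals = {"tp": 0, "fp": 0, "fn": 0, "failed": 0}
    for app, expected in ground_truth.items():
        reported = results.get(app)
        if reported is None:
            totals["failed"] += 1  # crash, timeout or missing output
            continue
        reported, expected = set(reported), set(expected)
        totals["tp"] += len(reported & expected)  # expected flows that were found
        totals["fp"] += len(reported - expected)  # reported flows that do not exist
        totals["fn"] += len(expected - reported)  # expected flows that were missed
    return totals


truth = {
    "app1": {("getImei", "UrlRequest.start")},
    "app2": {("getLastKnownLocation", "sendTextMessage")},
    "app3": {("getDeviceId", "Log.d")},
}
reported = {
    "app1": {("getImei", "UrlRequest.start"), ("getLine1Number", "Log.d")},
    "app2": set(),
    # app3: the tool timed out, so there is no entry
}
print(evaluate(reported, truth))
# {'tp': 1, 'fp': 1, 'fn': 1, 'failed': 1}
```

From these counts, precision `tp / (tp + fp)` and recall `tp / (tp + fn)` both equal 0.5 in this example, computed on the two applications that were actually analysed.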

In @sec:bg-datasets we will look at the datasets used in the community to compare analysis tools, and in @sec:rasta-soa we will go through the contributions that benchmarked those tools #jm-note[to see if they can be used as an indication as to which tools can still be used today.][Highlight this]

=== Application Datasets <sec:bg-datasets>

Research contributions often rely on existing datasets or provide new ones in order to evaluate the developed software.
Raw datasets such as Drebin~@Arp2014 contain little information about the provided applications.
As a consequence, dataset suites have been developed to provide, in addition to the applications, meta information about the expected results.
For example, taint analysis datasets should provide the source and expected sink of a taint.
In some cases, the datasets are provided with additional software for automating part of the analysis.
One such dataset is DroidBench, which was released with the tool Flowdroid~@Arzt2014a.
Later, the dataset ICC-Bench was introduced with the tool Amandroid~@weiAmandroidPreciseGeneral2014 to complement DroidBench by introducing applications using Inter-Component data flows.
These datasets contain carefully crafted applications containing flows that the tools should be able to detect.
These hand-crafted applications can also be used for testing purposes or to detect any regression when the software code evolves.
The drawback to using hand-crafted applications is that these datasets are not representative of real-world applications~@Pendlebury2018 and the obtained results can be misleading.

Unlike DroidBench and ICC-Bench, some approaches use real-world applications.
Bosu #etal~@bosuCollusiveDataLeak2017 use DIALDroid to perform a threat analysis of Inter-Application communication and published DIALDroid-Bench, an associated dataset.
Similarly, Luo #etal released TaintBench~@luoTaintBenchAutomaticRealworld2022, a real-world dataset, along with recommendations for building such a dataset.
These datasets are useful for carefully spotting missing taint flows, but contain only a few dozen applications.

In addition to those datasets, Androzoo~@allixAndroZooCollectingMillions2016 collects applications from several application marketplaces, including the Google Play store (the official Google application store), Anzhi and AppChina (two Chinese stores), or F-Droid (a store dedicated to free and open source applications).
Currently, Androzoo contains more than 25 million applications, which researchers can download using the SHA256 hash of the application.
Androzoo also provides additional information about the applications, like the date the application was first detected by Androzoo or the number of antivirus engines from VirusTotal that flagged the application as malicious.
In addition to providing researchers with easy access to real-world applications, Androzoo makes it much easier to share datasets for reproducibility: instead of sharing hundreds of #APK files, the list of SHA256 hashes is enough.
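
Sharing a dataset as a list of hashes can be sketched with the Python standard library: compute the SHA256 of each local #APK, then rebuild download URLs from that list. The endpoint and its `apikey` and `sha256` parameters follow Androzoo's public download API as we understand it; access requires an API key granted by the Androzoo maintainers. The helper names are hypothetical.

```python
import hashlib
from urllib.parse import urlencode

# Assumed Androzoo download endpoint (see the lead-in).
ANDROZOO_DOWNLOAD = "https://androzoo.uni.lu/api/download"


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA256 digest of a file, read in chunks to handle large APKs."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_url(api_key, sha256):
    """URL fetching one APK from Androzoo, identified by its SHA256."""
    return ANDROZOO_DOWNLOAD + "?" + urlencode({"apikey": api_key, "sha256": sha256})


def dataset_manifest(apk_paths):
    """Shareable form of a dataset: one SHA256 per line instead of the APK files."""
    return "\n".join(sha256_of(p) for p in apk_paths) + "\n"
```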

=== Benchmarking <sec:rasta-soa>

The few datasets composed of real-world applications confirmed that some tools such as Amandroid~@weiAmandroidPreciseGeneral2014 and Flowdroid~@Arzt2014a perform worse on real-world applications~@bosuCollusiveDataLeak2017 @luoTaintBenchAutomaticRealworld2022.
Unfortunately, those real-world application datasets are rather small, and a larger number of applications would be more suitable for our goal, #ie evaluating the reusability of a variety of static analysis tools.

Pauck #etal~@pauckAndroidTaintAnalysis2018 used DroidBench~@Arzt2014a, ICC-Bench~@weiAmandroidPreciseGeneral2014 and DIALDroid-Bench~@bosuCollusiveDataLeak2017 to compare Amandroid~@weiAmandroidPreciseGeneral2014, DIAL-Droid~@bosuCollusiveDataLeak2017, DidFail~@klieberAndroidTaintFlow2014, DroidSafe~@DBLPconfndssGordonKPGNR15, FlowDroid~@Arzt2014a and IccTA~@liIccTADetectingInterComponent2015 -- all these tools will also be compared in this chapter.
Pauck #etal~@pauckAndroidTaintAnalysis2018 used DroidBench~@Arzt2014a, ICC-Bench~@weiAmandroidPreciseGeneral2014 and DIALDroid-Bench~@bosuCollusiveDataLeak2017 to compare Amandroid~@weiAmandroidPreciseGeneral2014, DIAL-Droid~@bosuCollusiveDataLeak2017, DidFail~@klieberAndroidTaintFlow2014, DroidSafe~@DBLPconfndssGordonKPGNR15, FlowDroid~@Arzt2014a and IccTA~@liIccTADetectingInterComponent2015. //-- all these tools will also be compared in this chapter.
To perform their comparison, they introduced the AQL (Android App Analysis Query Language) format.
AQL can be used as a common language to describe the computed taint flow as well as the expected result for the datasets.
Notably, all the tested tools timed out at least once on real-world applications, and Amandroid~@weiAmandroidPreciseGeneral2014, DidFail~@klieberAndroidTaintFlow2014, DroidSafe~@DBLPconfndssGordonKPGNR15, IccTA~@liIccTADetectingInterComponent2015 and ApkCombiner~@liApkCombinerCombiningMultiple2015 (a tool used to combine applications) all failed to run on applications built for Android API 26.
@ -47,15 +61,15 @@ The auditors reported that most of the tools require a significant amount of tim
Reaves #etal propose to solve these issues by distributing a Virtual Machine with a functional build of the tool in addition to the source code.
Regrettably, these Virtual Machines were not made available, preventing future researchers from taking advantage of the work done by the auditors.
Reaves #etal also report that real-world applications are more challenging to analyze, with tools obtaining lower scores and taking more time and memory to run, sometimes to the point of not being able to complete the analysis.
We will confirm and expand this result in this chapter with a dataset much larger than their 16 real-world applications.
// Indeed, a more diverse dataset would assess the results and give more insight about the factors impacting the performances of the tools.
This result is worrying, considering that it was observed on a dataset of only 16 real-world applications.
A more diverse dataset would be needed to better assess the extent of the issue and give more insight into the factors impacting the performance of the tools.
//We will confirm and expand this result in @sec:rasta with a larger dataset than only 16 real-world applications.

Finally, our approach is similar to the methodology employed by Mauthe #etal for decompilers~@mauthe_large-scale_2021.
To assess the robustness of Android decompilers, Mauthe #etal used 4 decompilers on a dataset of 40 000 applications.
Mauthe #etal present an interesting methodology to assess the robustness of Android decompilers~@mauthe_large-scale_2021.
They used 4 decompilers on a dataset of 40 000 applications.
The error messages of the decompilers were parsed to list the methods that failed to decompile, and this information was used to estimate the main causes of failure.
They found that the failure rate is correlated with the size of the method, and that a substantial share of failures come from third-party libraries rather than from the core code of the application.
They also concluded that malware samples are more often decompiled entirely, but have a higher failure rate, meaning that the ones that are hard to decompile are substantially harder to decompile than goodware.
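
Once the failing methods have been extracted from the decompiler logs, the size analysis can be approximated by grouping methods by size and computing a failure rate per group. The sketch below assumes that the extraction step is already done. Its input is a hypothetical list of (size, failed) pairs, and both the size unit and the bucket width are arbitrary choices, not those of the original study.

```python
from collections import defaultdict


def failure_rate_by_size(methods, bucket_size=100):
    """Map each size bucket (lower bound) to the fraction of failed methods.

    methods is an iterable of (size, failed) pairs, size being for instance a
    number of bytecode instructions.
    """
    counts = defaultdict(lambda: [0, 0])  # bucket -> [failures, total]
    for size, failed in methods:
        bucket = (size // bucket_size) * bucket_size
        counts[bucket][0] += int(failed)
        counts[bucket][1] += 1
    return {bucket: fails / total for bucket, (fails, total) in sorted(counts.items())}


methods = [(10, False), (50, True), (150, True), (180, True), (250, False)]
print(failure_rate_by_size(methods))
# {0: 0.5, 100: 1.0, 200: 0.0}
```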

/*
luoTaintBenchAutomaticRealworld2022 (TaintBench):

@ -107,3 +121,10 @@ ReproDroid@pauckAndroidTaintAnalysis2018
*droid@reaves_droid_2016
DroidBench@Arzt2014a
*/

#v(2em)

Reaves #etal raised two major concerns about the use of Android static analysis tools.
First, they can be quite difficult to set up, and second, they appear to struggle when analysing real-world applications.
This is problematic for a reverse engineer: not only do they need to invest a significant amount of work to set up a tool properly, but they also have no guarantee that the tool will actually manage to analyse the application they are investigating.
#todo[Ref to pb1 and rasta.]
34
2_background/X_dynamic_analysis.typ
Normal file
@ -0,0 +1,34 @@
#import "../lib.typ": todo, APK, etal, ART, SDK, eg, jm-note, jfl-note
#import "@preview/diagraph:0.3.3": raw-render

=== Dynamic Analysis <sec:bg-dynamic>

#todo[include properly]

The alternative to static analysis is dynamic analysis.
With dynamic analysis, the application is actually executed.
The simplest strategies consist of running the application and examining its behavior.
For instance, Bernardi #etal~@bernardi_dynamic_2019 use the log generated by `strace` to list the system calls generated in response to an event, in order to determine whether an application is malicious.
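
The first step of such an approach, turning a raw `strace` log into per-system-call counts, can be sketched in Python. The sketch assumes plain-text output such as the one written by `strace -f -o trace.txt -p PID`, with lines optionally prefixed by a process identifier or a timestamp. It does not reproduce how Bernardi #etal relate these calls to events or classify applications.

```python
import re
from collections import Counter

# Optional "[pid N]" prefix (strace -f without -o), then optional PID or
# timestamp columns (strace -f -o FILE, -t, -tt), then the call name and "(".
SYSCALL = re.compile(r"^(?:\[pid\s+\d+\]\s+)?(?:[\d:.]+\s+)*([a-z_][a-z0-9_]*)\(")


def count_syscalls(lines):
    """Count system calls in strace output, skipping signals, exits and resumed calls."""
    counts = Counter()
    for line in lines:
        match = SYSCALL.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


trace = [
    'openat(AT_FDCWD, "/proc/self/maps", O_RDONLY) = 3',
    '[pid  4242] connect(12, {sa_family=AF_INET, sin_port=htons(443)}, 16) = 0',
    '4242  12:00:01.123456 sendto(12, "...", 64, 0, NULL, 0) = 64',
    '--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED} ---',
    '<... read resumed>"x", 1) = 1',
    '+++ exited with 0 +++',
]
print(count_syscalls(trace))
```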

More advanced methods are more intrusive and require modifying either the #APK, the Android framework, the runtime, or the kernel.
TaintDroid~@Enck2010, for example, modifies the Dalvik Virtual Machine (the predecessor of the #ART) to track the data flow of an application at runtime, while AndroBlare~@Andriatsimandefitra2012 @andriatsimandefitra_detection_2015 tries to compute the taint flow by hooking system calls using a Linux Security Module.
DexHunter~@zhang2015dexhunter and AppSpear~@yang_appspear_2015 also patch the Dalvik Virtual Machine/#ART, this time to collect bytecode loaded dynamically.
Modifying the Android framework, runtime or kernel is possible because the Android project is open source; however, this is a delicate operation.
Thus, a common issue faced by tools that take this approach is that they are stuck with a specific version of Android.
Some sandboxes limit this issue by using dynamic binary instrumentation, like DroidHook~@cui_droidhook_2023, based on the Xposed framework, or CamoDroid~@faghihi_camodroid_2022, based on Frida.

Another known challenge when analysing an application dynamically is code coverage: if some part of the application is not executed, it cannot be analysed.
Considering that Android applications are meant to interact with a user, this can become problematic for automatic analysis.
The Monkey tool developed by Google is one of the most used solutions~@sutter_dynamic_2024.
It sends a random stream of events to the phone without tracking the state of the application.
More advanced tools statically analyse the application to build a model of it, in order to improve the exploration.
Sapienz~@mao_sapienz_2016 and Stoat~@su_guided_2017 use this technique to improve application testing.
GroddDroid~@abraham_grodddroid_2015 takes the same approach, but statically detects suspicious sections of code and then interacts with the application to reach those sections.
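
The Monkey is invoked through `adb`. The Python sketch below only builds that command line and wraps it in a subprocess call; the helper names are hypothetical. The command uses the Monkey's `-p` (restrict events to one package), `-s` (random seed), `--throttle` (delay in milliseconds between events) and `-v` (verbosity) options, followed by the number of events to inject. None of these parameters refers to the state of the application, which is the limitation the more advanced tools above try to address.

```python
import subprocess


def monkey_cmd(package, events=500, seed=None, throttle_ms=0):
    """adb command sending a number of pseudo-random UI events to one package."""
    cmd = ["adb", "shell", "monkey", "-p", package]
    if seed is not None:
        cmd += ["-s", str(seed)]  # same seed, same event sequence: reproducible runs
    if throttle_ms:
        cmd += ["--throttle", str(throttle_ms)]  # pause between events, in ms
    cmd += ["-v", str(events)]  # -v: verbose log; last argument: event count
    return cmd


def run_monkey(package, **options):
    """Run the Monkey on a connected device or emulator and return its log."""
    result = subprocess.run(monkey_cmd(package, **options), capture_output=True, text=True)
    return result.stdout
```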

Unfortunately, exploring the application entirely is not always possible, as some applications try to detect whether they are in a sandbox environment (#eg whether they run in an emulator, or whether Frida is present in memory) and refuse to run some sections of code if this is the case.
Ruggia #etal~@ruggia_unmasking_2024 compile a list of evasion techniques.
They propose a new sandbox, DroidDungeon, that, contrary to other sandboxes like DroidScope~@droidscope180237 or CopperDroid~@Tam2015, strongly emphasizes resilience against evasion mechanisms.
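
One frequently described heuristic for the Frida case is to look for Frida's agent library among the memory mappings of the process, as listed in `/proc/self/maps`. The Python sketch below applies this check to the text of that file. Real applications implement such checks in Java or native code, often obfuscated, and combine many of them. This is therefore a simplified illustration, not one of the techniques catalogued by Ruggia #etal. The marker strings are assumptions based on Frida's default file names.

```python
# Substrings of file names commonly associated with Frida (assumed defaults).
FRIDA_MARKERS = ("frida-agent", "frida-gadget")


def frida_mapped(maps_text):
    """Naive check: does any mapping in /proc/PID/maps come from a Frida library?"""
    for line in maps_text.splitlines():
        fields = line.split()
        # Columns: address, perms, offset, dev, inode, then the optional path.
        path = " ".join(fields[5:])
        if any(marker in path.lower() for marker in FRIDA_MARKERS):
            return True
    return False
```

Renaming the agent library defeats this particular check, which is why such heuristics are usually combined.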

#todo[RealDroid sandbox based on modified ART?]
#todo[force execution?]
#todo[DyDroid, audit of Dynamic Code Loading~@qu_dydroid_2017]
@ -1,14 +1,15 @@
#import "../lib.typ": todo, epigraph, jfl-note

= Background <sec:bg>
= Background and Motivation <sec:bg>

#epigraph("Alexis \"Lex\" Murphy, Jurassic Park")[This is a Unix system. I know this.]

#include("0_intro.typ")
#include("1_android.typ")
#include("2_tools.typ")
#include("3_analysis_techniques.typ")
#include("4_datasets.typ")
#include("3_static_analysis.typ")
#include("4_datasets_and_benchmarking.typ")
#include("X_dynamic_analysis.typ")

/*
* Generic overview of Android
@ -30,9 +30,8 @@ As a summary, the contributions of this paper are the following:
*/

The chapter is structured as follows.
@sec:rasta-soa presents a summary of previous works dedicated to Android static analysis tools.
@sec:rasta-methodology presents the methodology employed to build our evaluation process and @sec:rasta-xp gives the associated experimental results.
// @sec:rasta-discussion investigates the reasons behind the observed failures of some of the tools.
@sec:rasta-discussion investigates the reasons behind the observed failures of some of the tools.
@sec:rasta-discussion discusses the limitations of this work and gives some takeaways for future contributions.
@sec:rasta-conclusion concludes the chapter.
@ -3,6 +3,7 @@

= Evaluating the Reusability of Android Static Analysis Tools <sec:rasta>

/*
#epigraph("Adira Tal and Sylvia Tilly, Star Trek: Discovery, \"People of Earth\"")[
#block[
#set align(left)
@ -10,7 +11,11 @@
"Okay, well, museums are cool, so..." \
"That's what someone who lives in a museum would say." \
]
]
]*/

// Maxine "Max" Caulfield officially, but she explicitly says never to use Maxine, and preferred names are important
#epigraph("Max Caulfield, Life is Strange \"Out of Time\"")[I keep going back in time.]
// This one is fun, but won't happen XD:
// #epigraph("T-Bug Cyberpunk 2077")[You Want Nice, Supportive? Call A Damn Helpline.]

#align(center, highlight(inset: 15pt, width: 75%, block(align(left)[
This chapter intends to explore the robustness of past software dedicated to static analysis of Android applications.
@ -20,8 +25,7 @@
])))


#include("0_intro.typ")
#include("1_related_work.typ")
#include("1_intro.typ")
#include("2_methodology.typ")
#include("3_experiments.typ")
#include("4_discussion.typ")
@ -40,7 +40,21 @@ When used directly by ART, the classes are usually stored in an application file
#image(
"figs/classloaders-crop.svg",
width: 80%,
alt: ""
alt: "
A box diagram. The diagram is split into two; the right section is labeled Runtime.
On the left, there are 9 boxes. 3 are gray, labeled ClassLoader, SecureClassLoader, and URLClassLoader, and the other 6 are white: BootClassLoader, BaseDexClassLoader, DexClassLoader, InMemoryDexClassLoader, PathClassLoader, and DelegateLastClassLoader.
Arrows go from SecureClassLoader, BaseDexClassLoader, and BootClassLoader to ClassLoader,
from URLClassLoader to SecureClassLoader,
from DexClassLoader, InMemoryDexClassLoader, and PathClassLoader to BaseDexClassLoader,
and from DelegateLastClassLoader to PathClassLoader.

On the runtime side, there are 5 boxes: bootClassLoader, appClassLoader (multi dex), systemClassLoader,
Specific delegator with two delegates, X.
Arrows labeled delegate go from appClassLoader, systemClassLoader, and Specific delegator to bootClassLoader, and from Specific delegator to X.
bootClassLoader, appClassLoader, and systemClassLoader are grouped in a dotted box labeled Android default behavior.
Dotted lines labeled instance go across the central demarcation from appClassLoader to PathClassLoader, from systemClassLoader to PathClassLoader, and from Specific delegator to DelegateLastClassLoader.
Another dotted line labeled instance singleton goes from bootClassLoader to BootClassLoader.
"
)
gray -- Java-based, white -- Android-based
],
@@ -115,7 +129,19 @@ We discuss in the next section how to obtain these classes from the emulator.
image(
"figs/architecture_SDK-crop.svg",
width: 80%,
alt: ""
alt: "
On the top right, a diagram of a web browser open at https://developer.android.com, with the webpage reading: API documentation, SDK classes, and method descriptions.
The web browser is labeled Documentation.
On the bottom right, a box with the Android Studio logo (a blue pair of compasses in front of a green robot) is labeled 'Development Environment'.
It contains two boxes, Developer classes and android.jar, and the text Dev SDK classes in bold.
An arrow labeled API access goes from Developer classes to android.jar.
On the left, a diagram of a smartphone with the Android logo (a green robot) contains two boxes: Platform classes and APK files.
The Platform classes box contains the text 'boot.art: framework.jar + 24 .jar = Android SDK classes + Hidden classes'.
The APK file box is split in two: the top part reads Developer classes + some extra classes, and the bottom part reads Multi DEX.
An arrow labeled API access goes from APK file to Platform classes.
Another arrow goes from Development Environment to APK file.
"
),
caption: [Location of SDK classes during development and at runtime]
) <fig:cl-archisdk>
@@ -174,8 +174,6 @@ Regrettably, the documentation of `.is_android_api()` explains that the method i
This means that although those methods are useful, the only indication of the use of an #Asdk or #hidec is the fact that the class is not in the APK file.
Because of that, like for Apktool and Jadx, Androguard has no way to warn the reverser that the shadow of an #Asdk or #hidec is not the class used when running the application.

#todo[alt text androguard_call_graph]

#figure({
set align(center)
stack(dir: ltr,[
@@ -183,7 +181,15 @@ Because of that, like for Apktool and Jadx, Androguard has no way to warn the re
image(
"figs/call_graph_expected.svg",
width: 45%,
alt: ""
alt: "
A box diagram.
Arrows go from MainActivity.onCreate() to Activity.onCreate() and Main.main(),
from Main.main() to Obfuscation.doSomething() to Main.bad(),
from another Obfuscation.doSomething() box to Main.good(),
from Main.bad() to Log.i() and from Main.good() to Log.i().
There are two Obfuscation.doSomething() boxes: the one pointed to by Main.main(), which points to Main.bad(), is white like the other boxes; the one with no incoming arrow, which points to Main.good(), is gray.
"
),
supplement: [Subfigure],
caption: [Expected Call Graph]
@@ -192,7 +198,14 @@ Because of that, like for Apktool and Jadx, Androguard has no way to warn the re
image(
"figs/call_graph_obf.svg",
width: 45%,
alt: ""
alt: "
A box diagram.
Arrows go from MainActivity.onCreate() to Activity.onCreate() and Main.main(),
from Main.main() to Obfuscation.doSomething() to Main.good(),
from another Obfuscation.doSomething() box to Main.bad(),
from Main.bad() to Log.i() and from Main.good() to Log.i().
There are two Obfuscation.doSomething() boxes: the one pointed to by Main.main(), which points to Main.good(), is gray; the one with no incoming arrow, which points to Main.bad(), is white like the other boxes.
"
),
supplement: [Subfigure],
caption: [Call Graph Computed by Androguard]
@@ -106,7 +106,16 @@ We investigate later in @sec:cl-malware the case of malicious applications.
image(
"figs/redef_sdk_relative_min_sdk.svg",
width: 100%,
alt: ""
alt: "
A bar graph.
The y-axis represents the number of classes, from 0 to over 40,000.
The x-axis represents the version number of the first SDK containing the class, up to version 34.
The bars can have two colors: red for classes introduced before the Min SDK of their APK, and green for classes introduced after the Min SDK of their APK.
In practice, for a given value of the x-axis, almost all bars have a single color: the bars before SDK 17 are red, and the ones after are green (except for SDK 24, which has a very small portion of red).
There are only 3 visible red bars: one for SDK versions up to 7 at around 30,000 classes, and two smaller ones of around 5,000 classes at SDK 8 and 16.
There are more green bars. SDK 21 and 30 are around 20,000 classes, 23 is at 30,000, 31 at 35,000, 26, 28, and 29 are at 40,000, and 24 is well over 40,000.
The remaining bars are between 0 and 5,000.
"
),
caption: [Redefined SDK classes, sorted by the first SDK they appeared in.]
)<fig:cl-classes_by_first_sdk>