This commit is contained in:
Jean-Marie 'Histausse' Mineau 2025-07-29 16:23:42 +02:00
parent 243b9df134
commit c060e88996
Signed by: histausse
GPG key ID: B66AEEDA9B645AD2
17 changed files with 264 additions and 96 deletions

View file

@ -12,27 +12,27 @@
We review in this section the past existing contributions related to static analysis tools reusability.
Several papers have reviewed Android analysis tools produced by researchers.
Li #etal@Li2017 published a systematic literature review for Android static analysis before May 2015.
Li #etal~@Li2017 published a systematic literature review for Android static analysis before May 2015.
They analyzed 92 publications and classified them by goal, method used to solve the problem and underlying technical solution for handling the bytecode when performing the static analysis.
In particular, they listed 27 approaches with an open-source implementation available.
Nevertheless, experiments to evaluate the reusability of the pointed out software were not performed.
We believe that the effort of reviewing the literature for making a comprehensive overview of available approaches should be pushed further: an existing published approach with a software that cannot be used for technical reasons endanger both the reproducibility and reusability of research.
As we saw in @sec:bg-datasets that the need for a ground truth to test analysis tools leads test datasets to often be handcrafted.
The few datasets composed of real-world application confirmed that some tools such as Amandroid@weiAmandroidPreciseGeneral2014 and Flowdroid@Arzt2014a are less efficient on real-world applications@bosuCollusiveDataLeak2017 @luoTaintBenchAutomaticRealworld2022.
The few datasets composed of real-world application confirmed that some tools such as Amandroid~@weiAmandroidPreciseGeneral2014 and Flowdroid~@Arzt2014a are less efficient on real-world applications~@bosuCollusiveDataLeak2017 @luoTaintBenchAutomaticRealworld2022.
Unfortunatly, those real-world applications datasets are rather small, and a larger number of applications would be more suitable for our goal, #ie evaluating the reusability of a variety of static analysis tools.
Pauck #etal@pauckAndroidTaintAnalysis2018 used DroidBench@@Arzt2014a, ICC-Bench@weiAmandroidPreciseGeneral2014 and DIALDroid-Bench@@bosuCollusiveDataLeak2017 to compare Amandroid@weiAmandroidPreciseGeneral2014, DIAL-Droid@bosuCollusiveDataLeak2017, DidFail@klieberAndroidTaintFlow2014, DroidSafe@DBLPconfndssGordonKPGNR15, FlowDroid@Arzt2014a and IccTA@liIccTADetectingInterComponent2015 -- all these tools will be also compared in this chapter.
Pauck #etal~@pauckAndroidTaintAnalysis2018 used DroidBench~@Arzt2014a, ICC-Bench~@weiAmandroidPreciseGeneral2014 and DIALDroid-Bench~@bosuCollusiveDataLeak2017 to compare Amandroid~@weiAmandroidPreciseGeneral2014, DIAL-Droid~@bosuCollusiveDataLeak2017, DidFail~@klieberAndroidTaintFlow2014, DroidSafe~@DBLPconfndssGordonKPGNR15, FlowDroid~@Arzt2014a and IccTA~@liIccTADetectingInterComponent2015 -- all these tools will be also compared in this chapter.
To perform their comparison, they introduced the AQL (Android App Analysis Query Language) format.
AQL can be used as a common language to describe the computed taint flow as well as the expected result for the datasets.
It is interesting to notice that all the tested tools timed out at least once on real-world applications, and that Amandroid@weiAmandroidPreciseGeneral2014, DidFail@klieberAndroidTaintFlow2014, DroidSafe@DBLPconfndssGordonKPGNR15, IccTA@liIccTADetectingInterComponent2015 and ApkCombiner@liApkCombinerCombiningMultiple2015 (a tool used to combine applications) all failed to run on applications built for Android API 26.
It is interesting to notice that all the tested tools timed out at least once on real-world applications, and that Amandroid~@weiAmandroidPreciseGeneral2014, DidFail~@klieberAndroidTaintFlow2014, DroidSafe~@DBLPconfndssGordonKPGNR15, IccTA~@liIccTADetectingInterComponent2015 and ApkCombiner~@liApkCombinerCombiningMultiple2015 (a tool used to combine applications) all failed to run on applications built for Android API 26.
These results suggest that a more thorough study of the link between application characteristics (#eg date, size) should be conducted.
Luo #etal@luoTaintBenchAutomaticRealworld2022 used the framework introduced by Pauck #etal to compare Amandroid@weiAmandroidPreciseGeneral2014 and Flowdroid@Arzt2014a on DroidBench and their own dataset TaintBench, composed of real-world android malware.
Luo #etal~@luoTaintBenchAutomaticRealworld2022 used the framework introduced by Pauck #etal to compare Amandroid~@weiAmandroidPreciseGeneral2014 and Flowdroid~@Arzt2014a on DroidBench and their own dataset TaintBench, composed of real-world android malware.
They found out that those tools have a low recall on real-world malware, and are thus over adapted to micro-datasets.
Unfortunately, because AQL is only focused on taint flows, we cannot use it to evaluate tools performing more generic analysis.
A first work about quantifying the reusability of static analysis tools was proposed by Reaves #etal@reaves_droid_2016.
Seven Android analysis tools (Amandroid@weiAmandroidPreciseGeneral2014, AppAudit@xiaEffectiveRealTimeAndroid2015, DroidSafe@DBLPconfndssGordonKPGNR15, Epicc@octeau2013effective, FlowDroid@Arzt2014a, MalloDroid@fahlWhyEveMallory2012 and TaintDroid@Enck2010) were selected to check if they were still readily usable.
A first work about quantifying the reusability of static analysis tools was proposed by Reaves #etal~@reaves_droid_2016.
Seven Android analysis tools (Amandroid~@weiAmandroidPreciseGeneral2014, AppAudit~@xiaEffectiveRealTimeAndroid2015, DroidSafe~@DBLPconfndssGordonKPGNR15, Epicc~@octeau2013effective, FlowDroid~@Arzt2014a, MalloDroid~@fahlWhyEveMallory2012 and TaintDroid~@Enck2010) were selected to check if they were still readily usable.
For each tool, both the usability and results of the tool were evaluated by asking auditors to install and use it on DroidBench and 16 real world applications.
The auditors reported that most of the tools require a significant amount of time to setup, often due to dependencies issues and operating system incompatibilities.
Reaves #etal propose to solve these issues by distributing a Virtual Machine with a functional build of the tool in addition to the source code.
@ -41,7 +41,7 @@ Reaves #etal also report that real world applications are more challenging to an
We will confirm and expand this result in this chapter with a larger dataset than only 16 real-world applications.
// Indeed, a more diverse dataset would assess the results and give more insight about the factors impacting the performances of the tools.
Finally, our approach is similar to the methodology employed by Mauthe #etal for decompilers@mauthe_large-scale_2021.
Finally, our approach is similar to the methodology employed by Mauthe #etal for decompilers~@mauthe_large-scale_2021.
To assess the robustness of android decompilers, Mauthe #etal used 4 decompilers on a dataset of 40 000 applications.
The error messages of the decompilers were parsed to list the methods that failed to decompile, and this information was used to estimate the main causes of failure.
It was found that the failure rate is correlated to the size of the method, and that a consequent amount of failure are from third parties library rather than the core code of the application.