#import "../lib.typ": etal, eg, ie
#import "X_var.typ": *
== Related Work <sec:rasta-soa>
// Research contributions often rely on existing datasets or provide new ones in order to evaluate the developed software.
// Raw datasets such as Drebin@Arp2014 contain little information about the provided applications.
// As a consequence, dataset suites have been developed to provide, in addition to the applications, meta-information about the expected results.
// For example, taint analysis datasets should provide the source and the expected sink of a taint.
// In some cases, the datasets are provided with additional software for automating part of the analysis.
// Thus,

In this section, we review the existing datasets provided by the community and the papers related to the reusability of static analysis tools.
=== Application Datasets
Computing whether an application contains a possible information flow is an example of a static analysis goal.
Some datasets have been built especially for evaluating tools that compute information flows inside Android applications.
One of the first well-known datasets is DroidBench, which was released with the tool FlowDroid@Arzt2014a.
Later, the dataset ICC-Bench was introduced with the tool Amandroid@weiAmandroidPreciseGeneral2014 to complement DroidBench by introducing applications that use Inter-Component data flows.
These datasets consist of carefully crafted applications containing flows that the tools should be able to detect.
These hand-crafted applications can also be used for testing purposes or to detect regressions when the software code evolves.
Contrary to real-world applications, the behavior of these hand-crafted applications is known in advance, thus providing the ground truth that the tools try to compute.
However, these datasets are not representative of real-world applications@Pendlebury2018 and the results obtained on them can be misleading.
//, especially for performance or reliability evaluation.
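As an illustration, the flows encoded in these benchmarks typically connect a sensitive source to a dangerous sink. The following minimal Java sketch, written in the style of a DroidBench test case (a hypothetical example, not an application taken from the dataset), leaks the device identifier via SMS:

```java
// Hypothetical DroidBench-style test case (illustration only).
// Expected ground truth: exactly one taint flow, from getDeviceId() to sendTextMessage().
public class LeakActivity extends android.app.Activity {
    @Override
    protected void onCreate(android.os.Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        android.telephony.TelephonyManager tm =
            (android.telephony.TelephonyManager) getSystemService(TELEPHONY_SERVICE);
        String imei = tm.getDeviceId(); // source: sensitive device identifier
        android.telephony.SmsManager.getDefault()
            .sendTextMessage("+123456789", null, imei, null, null); // sink: data leaves the device
    }
}
```

A tool analyzing this application is expected to report this single flow: a missed flow lowers recall, while any additional reported flow counts as a false positive.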
Unlike DroidBench and ICC-Bench, some approaches use real-world applications.
Bosu #etal@bosuCollusiveDataLeak2017 used DIALDroid to perform a threat analysis of Inter-Application communication and published DIALDroid-Bench, an associated dataset.
Similarly, Luo #etal released TaintBench@luoTaintBenchAutomaticRealworld2022, a real-world dataset, together with the associated recommendations for building such a dataset.
These datasets confirmed that some tools such as Amandroid@weiAmandroidPreciseGeneral2014 and FlowDroid@Arzt2014a are less effective on real-world applications.
These datasets are useful for carefully spotting missing taint flows, but they contain only a few dozen applications.
// A larger number of applications would be more suitable for our goal, #ie evaluating the reusability of a variety of static analysis tools.
Pauck #etal@pauckAndroidTaintAnalysis2018 used those three datasets to compare Amandroid@weiAmandroidPreciseGeneral2014, DIALDroid@bosuCollusiveDataLeak2017, DidFail@klieberAndroidTaintFlow2014, DroidSafe@DBLPconfndssGordonKPGNR15, FlowDroid@Arzt2014a and IccTA@liIccTADetectingInterComponent2015 -- all these tools will also be compared in this paper.
To perform their comparison, they introduced the AQL (Android App Analysis Query Language) format.
AQL can be used as a common language to describe the computed taint flows as well as the expected results for the datasets.
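For instance, a whole-application taint analysis is requested with a query of the following form (a sketch based on the AQL-System documentation; the exact syntax and the path are illustrative):

```
Flows IN App('/path/to/app.apk') ?
```

The answer is serialized as an XML document listing each detected flow as a pair of source and sink statements together with their enclosing method and class, so that it can be matched automatically against the expected answer stored with the benchmark.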
It is interesting to notice that all the tested tools timed out at least once on real-world applications, and that Amandroid@weiAmandroidPreciseGeneral2014, DidFail@klieberAndroidTaintFlow2014, DroidSafe@DBLPconfndssGordonKPGNR15, IccTA@liIccTADetectingInterComponent2015 and ApkCombiner@liApkCombinerCombiningMultiple2015 (a tool used to combine applications) all failed to run on applications built for Android API 26.
These results suggest that a more thorough study of the link between application characteristics (#eg date, size) and tool failures should be conducted.
Luo #etal@luoTaintBenchAutomaticRealworld2022 used the framework introduced by Pauck #etal to compare Amandroid@weiAmandroidPreciseGeneral2014 and FlowDroid@Arzt2014a on DroidBench and on their own dataset TaintBench, composed of real-world Android malware.
They found out that those tools have a low recall on real-world malware and are thus over-adapted to micro-datasets.
Unfortunately, because AQL is focused only on taint flows, we cannot use it to evaluate tools that perform more generic analyses.
=== Static Analysis Tools Reusability
Several papers have reviewed the Android analysis tools produced by researchers.
Li #etal@Li2017 published a systematic literature review of Android static analysis work prior to May 2015.
They analyzed 92 publications and classified them by goal, by the method used to solve the problem, and by the underlying technical solution for handling the bytecode when performing the static analysis.
In particular, they listed 27 approaches with an available open-source implementation.
Nevertheless, no experiments were performed to evaluate the reusability of the software they pointed out.
We believe that the effort of reviewing the literature to build a comprehensive overview of available approaches should be pushed further: a published approach whose software cannot be used for technical reasons endangers both the reproducibility and the reusability of research.
A first work quantifying the reusability of static analysis tools was proposed by Reaves #etal@reaves_droid_2016.
Seven Android analysis tools (Amandroid@weiAmandroidPreciseGeneral2014, AppAudit@xiaEffectiveRealTimeAndroid2015, DroidSafe@DBLPconfndssGordonKPGNR15, Epicc@octeau2013effective, FlowDroid@Arzt2014a, MalloDroid@fahlWhyEveMallory2012 and TaintDroid@Enck2010) were selected to check whether they were still readily usable.
For each tool, both the usability and the results were evaluated by asking auditors to install it and use it on DroidBench and on 16 real-world applications.
The auditors reported that most of the tools require a significant amount of time to set up, often due to dependency issues and operating system incompatibilities.
Reaves #etal proposed to solve these issues by distributing, in addition to the source code, a virtual machine with a functional build of the tool.
Regrettably, these virtual machines were not made available, preventing future researchers from taking advantage of the work done by the auditors.
Reaves #etal also reported that real-world applications are more challenging to analyze: the tools obtain poorer results and take more time and memory to run, sometimes to the point of not being able to complete the analysis.
We will confirm and expand this result in this paper with a dataset much larger than 16 real-world applications.
// Indeed, a more diverse dataset would consolidate the results and give more insight into the factors impacting the performance of the tools.
// NOT ENOUGH SPACE!
// Finally, our approach is similar to the methodology employed by Mauthe #etal for decompilers@mauthe_large-scale_2021.
// To assess the robustness of Android decompilers, Mauthe #etal ran 4 decompilers on a dataset of 40 000 applications.
// The error messages of the decompilers were parsed to list the methods that failed to decompile, and this information was used to estimate the main causes of failure.
// It was found that the failure rate is correlated with the size of the method, and that a considerable share of the failures comes from third-party libraries rather than from the core code of the application.
// They also concluded that malware is easier to decompile entirely, but has a higher failure rate, meaning that the malware that is hard to decompile is substantially harder to decompile than goodware.
/*
luoTaintBenchAutomaticRealworld2022 (TaintBench):
- micro-dataset apps are 'bad' (tools over-adapt, performance drops on real-world apps) but
  there was no ground truth for real-world APKs: provides real-world APKs with ground truth
- provides a dataset framework for taint analysis on top of ReproDroid
- /!\ compares current and previously evaluated versions of Amandroid and FlowDroid:
  -> up-to-date versions of both tools are less accurate than their predecessors <-
- timeout 20 min: Amandroid 11 apps, unsuccessful exits 9

pauckAndroidTaintAnalysis2018 (ReproDroid):
- introduces AQL (Android App Analysis Query Language): a standard language to describe the input
  and output of a taint analysis tool; it allows comparing two taint analysis tools
- introduces BREW (dataset refinement and execution wizard), a dataset framework
- reproducible comparison of Amandroid, DIALDroid, DidFail, DroidSafe, FlowDroid and IccTA
  on DroidBench, ICC-Bench and DIALDroid-Bench (30 large real-world apps) + 18 custom apps
- real-world app test: 30 min timeout, all tools timed out/failed(?) at least once
- support for newer Android versions: Amandroid, DidFail, DroidSafe, IccTA, ApkCombiner fail
  to run on APKs built for API 26

reaves_droid_2016 (*Droid):
- assessment of APK analysis tools and challenges
- tests 7 tools to see if they are usable by devs and auditors (conclusion: challenging to use,
  difficult to interpret output)
- Amandroid: only runs on small APKs
- AppAudit: failed on 11/16 real-world apps (due to native code in 4 of those cases)
- DroidSafe: fails every time due to a memory leak
- Epicc: no problem, average time < 20 min for real-world APKs
- FlowDroid: failed to analyze real-world APKs with default settings, and even with
  64 GB of RAM could only analyze 1/6 APKs of a real-world category (mobile money apps)
- MalloDroid: no problem
- TaintDroid: 7 crashes for 16 real-world APKs, probably due to native code
- **Found that those tools are frustrating to use, partly because of dependency issues and
  OS incompatibility.** Asks for a full working VM as artifact.

Arzt2014a (DroidBench, same paper as FlowDroid):
- hand-crafted Android apps with test cases for interesting static-analysis problems like
  field sensitivity, object sensitivity, access-path lengths, application life cycle,
  async callbacks, UI interaction

A Large-Scale Empirical Study of Android App Decompilation
Noah Mauthe, Ulf Kargen, Nahid Shahmehri

TaintBench@luoTaintBenchAutomaticRealworld2022
ReproDroid@pauckAndroidTaintAnalysis2018
*droid@reaves_droid_2016
DroidBench@Arzt2014a
*/