commit ef50ff2f49f680845fa4a7c502978665fc683e7f
Author: Jean-Marie Mineau
Date:   Sat Jun 21 12:52:35 2025 +0200

    here we go

diff --git a/0_preamble/acknowledgements.typ b/0_preamble/acknowledgements.typ
new file mode 100644
index 0000000..8cdc627
--- /dev/null
+++ b/0_preamble/acknowledgements.typ
@@ -0,0 +1,7 @@
+#import "@local/template-thesis-matisse:0.0.1": todo
+
+= Acknowledgements
+
+#todo[Acknowledge people]
+
+#lorem(400)
diff --git a/0_preamble/french_summary.typ b/0_preamble/french_summary.typ
new file mode 100644
index 0000000..d3965c8
--- /dev/null
+++ b/0_preamble/french_summary.typ
@@ -0,0 +1,9 @@
+#import "@local/template-thesis-matisse:0.0.1": todo
+
+= Résumé en Français
+
+#todo[
+ Write a "Substantial Summary" in French, at least 4 pages: https://ed-matisse.doctorat-bretagne.fr/fr/soutenance-de-these#p-151
+]
+
+#lorem(2000)
diff --git a/0_preamble/notations.typ b/0_preamble/notations.typ
new file mode 100644
index 0000000..5a95637
--- /dev/null
+++ b/0_preamble/notations.typ
@@ -0,0 +1,10 @@
+#let tldr = link()[TL;DR]
+
+#let notation_table = align(center, table(
+ columns: 2,
+ align: center+horizon,
+ table.header(
+ [Acronyms], [Meanings],
+ ),
+ tldr, [Too long; didn't read],
+))
diff --git a/3_rasta/0_intro.typ b/3_rasta/0_intro.typ
new file mode 100644
index 0000000..737a432
--- /dev/null
+++ b/3_rasta/0_intro.typ
@@ -0,0 +1,55 @@
+#import "@local/template-thesis-matisse:0.0.1": etal
+#import "X_var.typ": *
+
+== Introduction
+
+Android has been the most used mobile operating system since 2014, and since 2017 it has even surpassed Windows across all platforms combined#footnote[https://gs.statcounter.com/os-market-share#monthly-200901-202304].
+The public adoption of Android is confirmed by application developers, with 1.3 million apps available in the Google Play Store in 2014, and 3.5 million apps available in 2017#footnote[https://www.statista.com/statistics/266210].
+Its popularity makes Android a prime target for malware developers. // For example, various applications have been shown to steal personal information~\cite{shanSelfhidingBehaviorAndroid2018}.
+Consequently, Android has also been an important subject for security research.
+In the past fifteen years, the research community released many tools to detect or analyze malicious behaviors in applications. Two main approaches can be distinguished: static and dynamic analysis@Li2017.
+Dynamic analysis requires running the application in a controlled environment to observe runtime values and/or interactions with the operating system.
+For example, an Android emulator with a patched kernel can capture these interactions, but applying the required modifications is not a trivial task.
+// Such approach is limited by the required time to execute a limited part of the application with no guarantee on the obtained code coverage.
+// For malware, dynamic analysis is also limited by evading techniques that may prevent the execution of malicious parts of the code. // To explain better if we restore these sentences about malware + evading.
+As a consequence, a lot of effort has been put into static approaches, which are the focus of this paper.
+
+The usual goal of a static analysis is to compute data flows to detect potential information leaks@weiAmandroidPreciseGeneral2014 @titzeAppareciumRevealingData2015 @bosuCollusiveDataLeak2017 @klieberAndroidTaintFlow2014 @DBLPconfndssGordonKPGNR15 @octeauCompositeConstantPropagation2015 @liIccTADetectingInterComponent2015 by analyzing the bytecode of an Android application.
+The associated tools should support the Dalvik bytecode format, the multiplicity of entry points, the event-driven architecture of Android applications, the interleaving of native code and bytecode, possibly loaded dynamically, and the use of reflection, to name a few.
+All these obstacles threaten the research efforts.
+When using a more recent version of Android or a recent set of applications, the results previously obtained may become outdated and the developed tools may not work correctly anymore.
+
+
+In this paper/*#footnote[This work was supported by the ANR Research under the Plan France 2030 bearing the reference ANR-22-PECY-0007.]*/, we study the reusability of open source static analysis tools that appeared between 2011 and 2017, on a recent Android dataset.
+The scope of our study is *not* to quantify whether the output results are accurate for ensuring reproducibility, because the studied static analysis tools all have different goals.
+On the contrary, we assume that the provided tools compute the intended result but may crash or fail to compute a result due to the evolution of the internals of Android applications, raising unexpected bugs during an analysis.
+This paper intends to show that sharing the software artifacts of a paper may not be sufficient to ensure that the provided software is reusable.
+
+Thus, our contributions are the following.
+We carefully retrieved the static analysis tools for Android applications that were selected by Li #etal@Li2017 between 2011 and 2017.
+We contacted the authors, whenever possible, to select the best candidate versions and to confirm the correct usage of the tools.
+We rebuilt the tools in their original environment and we plan to share our Docker images with this paper.
+We evaluated the reusability of the tools by measuring the number of successful analyses of applications taken /*in the Drebin dataset@Arp2014 and */ in a custom dataset that contains more recent applications (#NBTOTALSTRING in total).
+The observation of the success or failure of these analyses enables us to answer the following research questions:
+
+/ RQ1: Which Android static analysis tools that are more than 5 years old are still available and can be reused without crashing with a reasonable effort?
+/ RQ2: How has the reusability of tools evolved over time, especially when analyzing applications whose release date is more than 5 years away from the publication of the tool?
+/ RQ3: Does the reusability of tools change when analyzing goodware compared to malware?
+
+/*
+As a summary, the contributions of this paper are the following:
+
+- We provide containers with a compiled version of all studied analysis tools, which ensures the reproducibility of our experiments and an easy way to analyze applications for other researchers. Additionally, recipes for rebuilding such containers are provided.
+- We provide a recent dataset of #NBTOTALSTRING applications balanced over the time interval 2010-2023.
+- We point out which static analysis tools of Li #etal SLR paper@Li2017 can safely be used and we show that #resultunusable of evaluated tools are unusable (considering that a tool that fails more than 50% of the time is unusable). In total, the success rate of the tools we could run is #resultratio on our dataset.
+- We discuss the effect of application features (date, size, SDK version, goodware/malware) on static analysis tools and the nature of the issues we found by studying statistics on the errors captured during our experiments.
+*/
+
+The paper is structured as follows.
+Section@sec:rasta-soa presents a summary of previous works dedicated to Android static analysis tools.
+Section@sec:rasta-methodology presents the methodology employed to build our evaluation process and Section@sec:rasta-xp gives the associated experimental results.
+// Section@sec:rasta-discussion investigates the reasons behind the observed failures of some of the tools.
+Section@sec:rasta-discussion discusses the limitations of this work and gives some takeaways for future contributions.
+Section@sec:rasta-conclusion concludes the paper.
+
+
diff --git a/3_rasta/1_related_work.typ b/3_rasta/1_related_work.typ
new file mode 100644
index 0000000..3ee82f6
--- /dev/null
+++ b/3_rasta/1_related_work.typ
@@ -0,0 +1,119 @@
+#import "@local/template-thesis-matisse:0.0.1": etal, eg, ie
+#import "X_var.typ": *
+
+== Related Work
+
+// Research contributions often rely on existing datasets or provide new ones in order to evaluate the developed software.
+// Raw datasets such as Drebin@Arp2014 contain little information about the provided applications.
+// As a consequence, dataset suites have been developed to provide, in addition to the applications, meta information about the expected results.
+// For example, taint analysis datasets should provide the source and expected sink of a taint.
+// In some cases, the datasets are provided with additional software for automating part of the analysis.
+// Thus,
+In this section, we review the existing datasets provided by the community and the papers related to the reusability of static analysis tools.
+
+=== Application Datasets
+
+Determining whether an application contains a possible information flow is an example of a static analysis goal.
+Some datasets have been built especially for evaluating tools that compute information flows inside Android applications.
+One of the first well-known datasets is DroidBench, which was released with the tool FlowDroid@Arzt2014a.
+Later, the dataset ICC-Bench was introduced with the tool Amandroid@weiAmandroidPreciseGeneral2014 to complement DroidBench by introducing applications using Inter-Component data flows.
+These datasets contain carefully crafted applications containing flows that the tools should be able to detect.
+These hand-crafted applications can also be used for testing purposes or to detect any regression when the software code evolves.
+Contrary to real-world applications, the behavior of these hand-crafted applications is known in advance, thus providing the ground truth that the tools try to compute.
+However, these datasets are not representative of real-world applications@Pendlebury2018 and the obtained results can be misleading.
+//, especially for performance or reliability evaluation.
+
+Contrary to DroidBench and ICC-Bench, some approaches use real-world applications.
+Bosu #etal@bosuCollusiveDataLeak2017 used DIALDroid to perform a threat analysis of Inter-Application communication and published DIALDroid-Bench, an associated dataset.
+Similarly, Luo #etal released TaintBench@luoTaintBenchAutomaticRealworld2022, a real-world dataset, and the associated recommendations to build such a dataset.
+These datasets confirmed that some tools such as Amandroid@weiAmandroidPreciseGeneral2014 and FlowDroid@Arzt2014a are less effective on real-world applications.
+These datasets are useful for carefully spotting missing taint flows, but contain only a few dozen applications.
+// A larger number of applications would be more suitable for our goal, #ie evaluating the reusability of a variety of static analysis tools.
+
+
+Pauck #etal@pauckAndroidTaintAnalysis2018 used DroidBench, ICC-Bench and DIALDroid-Bench to compare Amandroid@weiAmandroidPreciseGeneral2014, DIALDroid@bosuCollusiveDataLeak2017, DidFail@klieberAndroidTaintFlow2014, DroidSafe@DBLPconfndssGordonKPGNR15, FlowDroid@Arzt2014a and IccTA@liIccTADetectingInterComponent2015 -- all of which are also compared in this paper.
+To perform their comparison, they introduced the AQL (Android App Analysis Query Language) format.
+AQL can be used as a common language to describe the computed taint flow as well as the expected result for the datasets.
+It is interesting to notice that all the tested tools timed out at least once on real-world applications, and that Amandroid@weiAmandroidPreciseGeneral2014, DidFail@klieberAndroidTaintFlow2014, DroidSafe@DBLPconfndssGordonKPGNR15, IccTA@liIccTADetectingInterComponent2015 and ApkCombiner@liApkCombinerCombiningMultiple2015 (a tool used to combine applications) all failed to run on applications built for Android API 26.
+These results suggest that a more thorough study of the link between application characteristics (#eg date, size) and tool failures should be conducted.
+Luo #etal@luoTaintBenchAutomaticRealworld2022 used the framework introduced by Pauck #etal to compare Amandroid@weiAmandroidPreciseGeneral2014 and FlowDroid@Arzt2014a on DroidBench and their own dataset TaintBench, composed of real-world Android malware.
+They found that those tools have a low recall on real-world malware, and are thus over-adapted to micro-datasets.
+Unfortunately, because AQL is only focused on taint flows, we cannot use it to evaluate tools performing more generic analyses.
+
+=== Static Analysis Tools Reusability
+
+Several papers have reviewed Android analysis tools produced by researchers.
+Li #etal@Li2017 published a systematic literature review of Android static analysis covering work published before May 2015.
+They analyzed 92 publications and classified them by goal, method used to solve the problem and underlying technical solution for handling the bytecode when performing the static analysis.
+In particular, they listed 27 approaches with an open-source implementation available.
+Nevertheless, they did not perform experiments to evaluate the reusability of the listed software.
+We believe that the effort of reviewing the literature to build a comprehensive overview of available approaches should be pushed further: a published approach whose software cannot be used for technical reasons endangers both the reproducibility and the reusability of research.
+
+A first work on quantifying the reusability of static analysis tools was proposed by Reaves #etal@reaves_droid_2016.
+Seven Android analysis tools (Amandroid@weiAmandroidPreciseGeneral2014, AppAudit@xiaEffectiveRealTimeAndroid2015, DroidSafe@DBLPconfndssGordonKPGNR15, Epicc@octeau2013effective, FlowDroid@Arzt2014a, MalloDroid@fahlWhyEveMallory2012 and TaintDroid@Enck2010) were selected to check if they were still readily usable.
+For each tool, both the usability and results of the tool were evaluated by asking auditors to install and use it on DroidBench and 16 real-world applications.
+The auditors reported that most of the tools require a significant amount of time to set up, often due to dependency issues and operating system incompatibilities.
+Reaves #etal propose to solve these issues by distributing a Virtual Machine with a functional build of the tool in addition to the source code.
+Regrettably, these Virtual Machines were not made available, preventing future researchers from taking advantage of the work done by the auditors.
+Reaves #etal also report that real-world applications are more challenging to analyze, with tools producing poorer results, taking more time and memory to run, sometimes to the point of not being able to run the analysis.
+We confirm and extend this result in this paper with a dataset larger than 16 real-world applications.
+// Indeed, a more diverse dataset would assess the results and give more insight about the factors impacting the performances of the tools.
+
+// NO ROOM!
+// Finally, our approach is similar to the methodology employed by Mauthe #etal for decompilers@mauthe_large-scale_2021.
+// To assess the robustness of Android decompilers, Mauthe #etal used 4 decompilers on a dataset of 40 000 applications.
+// The error messages of the decompilers were parsed to list the methods that failed to decompile, and this information was used to estimate the main causes of failure.
+// It was found that the failure rate is correlated to the size of the method, and that a significant share of failures comes from third-party libraries rather than the core code of the application.
+// They also concluded that malware are easier to entirely decompile, but have a higher failure rate, meaning that the ones that are hard to decompile are substantially harder to decompile than goodware.
+
+
+/*
+luoTaintBenchAutomaticRealworld2022 (TaintBench):
+ - micro dataset app 'bad' (over adapted, perf drop with real world app) but
+ no ground truth for real world apk: provide real world apk with ground truth
+ - provide a dataset framework for taint analysis on top of reprodroid
+ - /!\ compare current and previously evaluated version of AmAndroid and Flowdroid:
+ -> Up to date version of both tools are less accurate than predecessor <-
+ - timeout 20min: AmAndroid 11 apps, unsuccessful exits 9
+
+pauckAndroidTaintAnalysis2018 (ReproDroid):
+ - Introduce AQL (Android app analysis query language): standard language to describe input
+ and output of a taint analysis tool, it allows to compare two taint analysis tools
+ - Introduce BREW (dataset refinement and execution wizard), a dataset framework
+ - Reproducible comparison of AmAndroid, DIAL-Droid, DidFail, DroidSafe, FlowDroid and IccTA
+ on Droid-Bench, ICC-Bench and DIALDroid-Bench(30 large real world app) + 18 custom apps
+ - real world app test: 30 min timeout, all tools timed out/failed(?) at least once
+ - support for newer Android version: AmAndroid, DidFail, DroidSafe, IccTA, ApkCombiner fail
+ to run on apk built for API 26
+
+reaves_droid_2016 (*Droid):
+ - assessment of apk analysis tools and challenges
+ - Test 7 tools to see if usable by dev and auditors (conclusion: challenging to use, difficult
+ to interpret output)
+ - AmAndroid: only run on small apk
+ - AppAudit: failed on 11/16 real world app (due to native code in 4 of those cases)
+ - DroidSafe: Fails every time due to memory leak
+ - Epicc: no pb, average time < 20min for real world apks
+ - FlowDroid: Failed to analyse real world apks with default settings, and even with
+ 64GB of ram could only analyse 1/6 apk of a real world category (mobile money app)
+ - MalloDroid: no pb
+ - TaintDroid: 7 crashes for 16 real world apks, probably due to native code
+ - **Found that those tools are frustrating to use, partly because of dependency issues and
+ OS incompatibility.** Ask for a full working VM as artifact.
+
+Arzt2014a (DroidBench, same paper as flowdroid)
+ - hand crafted Android apps with test cases for interesting static-analysis problems like
+ field sensitivity, object sensitivity, access-path lengths, application life cycle,
+ async callback, ui interaction
+
+
+ A Large-Scale Empirical Study of Android App Decompilation
+ Noah Mauthe, Ulf Kargen, Nahid Shahmehri
+
+
+
+TaintBench@luoTaintBenchAutomaticRealworld2022
+ReproDroid@pauckAndroidTaintAnalysis2018
+*droid@reaves_droid_2016
+DroidBench@Arzt2014a
+*/
diff --git a/3_rasta/2_methodology.typ b/3_rasta/2_methodology.typ
new file mode 100644
index 0000000..7da02c2
--- /dev/null
+++ b/3_rasta/2_methodology.typ
@@ -0,0 +1,274 @@
+#import "@local/template-thesis-matisse:0.0.1": todo, etal, eg
+#import "X_var.typ": *
+#import "X_lib.typ": *
+
+== Methodology
+
+=== Collecting Tools
+
+#figure({
+ show table: set text(size: 0.80em)
+ show "#etal": etal
+ let show_citekeys(keys) = [
+ #keys.split(",").map(
+ citekey => cite(label(citekey))).join([]
+ ) (#keys.split(",").map(
+ citekey => cite(label(citekey), form: "year")
+ ).join([]))
+ ]
+ table(
+ columns: 7,
+ inset: (x: 0% + 5pt, y: 0% + 2pt),
+ stroke: none,
+ align: center+horizon,
+ table.hline(),
+ table.header(
+ table.cell(colspan: 7, inset: 3pt)[],
+ table.cell(rowspan:2)[*Tool*],
+ table.vline(end: 3),
+ table.vline(start: 4),
+ table.cell(colspan:3)[*Availability*],
+ table.vline(end: 3),
+ table.vline(start: 4),
+ [*Repo*],
+ table.vline(end: 3),
+ table.vline(start: 4),
+ table.cell(rowspan:2)[*Decision*],
+ table.vline(end: 3),
+ table.vline(start: 4),
+ table.cell(rowspan:2)[*Comments*],
+
+ [Bin],
+ [Src],
+ [Doc],
+ [type],
+ ),
+ table.cell(colspan: 7, inset: 3pt)[],
+ table.hline(),
+ table.cell(colspan: 7, inset: 3pt)[],
+ ..rasta_tool_data
+ .map(entry => (
+ [#entry.tool #show_citekeys(entry.citekey)],
+ str2sym(entry.binary),
+ str2sym(entry.source),
+ str2sym(entry.documentation),
+ link(entry.url, entry.repo),
+ str2sym(entry.decision),
+ entry.why,
+ )).flatten(),
+ table.cell(colspan: 7, inset: 3pt)[],
+ table.hline(),
+ table.cell(colspan: 7, inset: 3pt)[],
+ table.hline(),
+ )
+ [
+ *binaries, sources*: #nr: not relevant, #ok: available, #bad: partially available, #ko: not provided\
+ *documentation*: #okk: excellent, MWE, #ok: few inconsistencies, #bad: bad quality, #ko: not available\
+ *decision*: #ok: considered; #bad: considered but not built; #ko: out of scope of the study
+ ]},
+ caption: [Considered tools@Li2017: availability and usage reliability],
+)
+
+We collected the static analysis tools from@Li2017, plus one additional tool encountered during our review of the state of the art (DidFail@klieberAndroidTaintFlow2014).
+They are listed in @tab:rasta-tools, with the original release date and associated paper.
+We intentionally limited the collected tools to the ones selected by Li #etal@Li2017 for several reasons.
+First, not using recent tools ensures a gap of at least 5 years between the publication of a tool and the most recent APK files, which allows us to measure the reusability of previous contributions over a reasonable period of time.
+Second, collecting new tools would require describing these tools in depth, similarly to what has been done by Li #etal@Li2017, which is not the primary goal of this paper.
+Additionally, selection criteria such as the publication venue or number of citations would be necessary to select a subset of tools, which would require an additional methodology.
+These possible contributions are left for future work.
+
+Some tools use hybrid analysis (both static and dynamic): A3E@DBLPconfoopslaAzimN13, A5@vidasA5AutomatedAnalysis2014, Android-app-analysis@geneiatakisPermissionVerificationApproach2015, StaDynA@zhauniarovichStaDynAAddressingProblem2015.
+They have been excluded from this paper.
+We manually searched for the tool repository when the website mentioned in the paper is no longer available (#eg when the repository has been migrated from Google Code to GitHub) and for each tool we searched for:
+
+- an optional binary version of the tool that would be usable as a fallback (if the sources cannot be compiled for any reason);
+- the source code of the tool;
+- the documentation for building and using the tool with a MWE (Minimum Working Example).
+
+In @tab:rasta-tools we rated the quality of these artifacts with a "#ok" when they are available but may have inconsistencies, a "#bad" when too many inconsistencies (inaccurate remarks about the sources, dead links or missing parts) have been found, a "#ko" when no documentation has been found, and a double "#ok#ok" for the documentation when it covers all our expectations (building process, usage, MWE).
+Results show that documentation is often missing or very poor (#eg Lotrack), which makes both the rebuild process and the first analysis of a MWE very complex.
+
+
+We finally excluded Choi #etal@CHOI2014620 as their tool works on the sources of Android applications, and Poeplau #etal@DBLPconfndssPoeplauFBKV14, which focuses on Android hardening.
+As a summary, in the end we have #nbtoolsselected tools to compare.
+Some specificities should be noted.
+The IC3 tool will be duplicated in our experiments because two versions are available: the original version of the authors and a fork used by other tools like IccTA.
+For Androguard, the default task consists of unpacking the bytecode, the resources, and the Manifest.
+Cross-references are also built between methods and classes.
+Because such a task is relatively simple to perform, we decided to duplicate this tool and ask Androguard to decompile an APK and create a control flow graph of the code using its decompiler: DAD.
+We refer to this variant of usage as androguard_dad.
+Because Thresher and Lotrack cannot be built, we excluded them from the experiments.
+
+Finally, starting with the #nbtools tools of @tab:rasta-tools, and counting the two variations of IC3 and Androguard, we have a total of #nbtoolsselectedvariations static analysis tools to evaluate, of which two cannot be built and will be considered as always failing.
+
+=== Source Code Selection and Building Process
+
+#figure({
+ show table: set text(size: 0.80em)
+ show "#etal": etal
+ let show_citekeys(keys) = [
+ #keys.split(",").map(
+ citekey => cite(label(citekey))).join([]
+ )
+ ]
+ table(
+ columns: 8,
+ inset: (x: 0% + 5pt, y: 0% + 2pt),
+ stroke: none,
+ align: center+horizon,
+ table.hline(),
+ table.header(
+ table.cell(colspan: 8, inset: 3pt)[],
+ table.cell(rowspan:2)[*Tool*],
+ table.vline(end: 3),
+ table.vline(start: 4),
+ table.cell(colspan:2)[*Origin*],
+ table.vline(end: 3),
+ table.vline(start: 4),
+ table.cell(colspan:2)[*Alive Forks*],
+ table.vline(end: 3),
+ table.vline(start: 4),
+ table.cell(rowspan:2)[*Last Commit \ Date*],
+ table.vline(end: 3),
+ table.vline(start: 4),
+ table.cell(rowspan:2)[*Authors \ Reached*],
+ table.vline(end: 3),
+ table.vline(start: 4),
+ [*Environment*],
+
+ [Stars],
+ [Alive],
+ [Nb],
+ [Usable],
+ [Language -- OS],
+ ),
+ table.cell(colspan: 8, inset: 3pt)[],
+ table.hline(),
+ table.cell(colspan: 8, inset: 3pt)[],
+ ..rasta_tool_data
+ .filter(entry => entry.exclude != "EXCLUDE")
+ .map(entry => (
+ [#entry.tool #show_citekeys(entry.citekey)],
+ entry.stars,
+ str2sym(entry.alive),
+ entry.nbaliveforks,
+ str2sym(entry.forkusable),
+ entry.selecteddate,
+ str2sym(entry.authorconfirmed),
+ [#entry.lang -- #entry.os]
+ )).flatten(),
+ table.cell(colspan: 8, inset: 3pt)[],
+ table.hline(),
+ table.cell(colspan: 8, inset: 3pt)[],
+ table.hline(),
+ )
+ [#ok: yes, #ko: no, UX.04: Ubuntu X.04]},
+ caption: [Selected tools, forks, selected commits and running environment],
+)
+
+In a second step, we looked for the best source repository among the possible forks of each tool.
+We reported some indicators about the explored forks and our decision about the selected one in @tab:rasta-sources.
+For each source code repository called "Origin", we reported in @tab:rasta-sources the number of GitHub stars attributed by users and we mentioned if the project is still alive (#ok in column Alive when a commit exists in the last two years).
+Then, we analyzed the fork tree of the project.
+We searched recursively if any forked repository contains a more recent commit than the last one of the branch mentioned in the documentation of the original repository.
+If such a commit is found (the number of such commits is reported in column Alive Forks Nb), we manually looked at the reasons behind this commit and considered if we should prefer this more up-to-date repository instead of the original one (column "Alive Forks Usable").
+As reported in @tab:rasta-sources, we excluded all forks because they always contain experimental code with no guarantee of stability, except for IC3, for which we selected the fork JordanSamhi/ic3.
+For example, a fork of Apparecium contains a port for Windows 7, which does not suggest an improvement of the stability of the tool.
+For IC3, the fork seems promising: it has been updated to be usable on a recent operating system (Ubuntu 22.04 instead of Ubuntu 12.04 for the original version) and is used as a dependency by IccTA.
+We decided to keep these two versions of the tool (IC3 and IC3_fork) to compare their results.
+
+Then, we allotted ourselves a maximum of four days per tool to read and follow the documentation, compile the tool and obtain the expected result when executing an analysis of a MWE.
+We sent an email to the authors of each tool to confirm that we used the most suitable version of the code, that the command line we used to analyze an application is the most suitable one and, in some cases, to request help to solve issues in the building process.
+We reported in @tab:rasta-sources the authors that answered our request and confirmed our decisions.
+
+From this building phase, several observations can be made.
+Using a recent operating system, it is almost impossible to rebuild, in a reasonable amount of time, a tool released years ago.
+Too many dependencies, even for Java based programs, trigger compilation or execution problems.
+Thus, if the documentation mentions a specific operating system, we use a Docker image of this OS.
+// For example, Dare is a dependency of several tools (Didfail, IC3) and depends on 32 bits libraries such as lib32stdc++ and ia32-libs.
+// Those libraries are only available on Ubuntu 12 or previous versions.
+//
+Most of the time, tools require additional external components to be fully functional.
+These can be resources such as the android.jar file for each version of the SDK, a database, additional libraries or tools.
+Depending on the quality of the documentation, setting up those components can take hours to days.
+This is why we automated in a Dockerfile the setup of the environment in which the tool is built and run#footnote[To guarantee reproducibility we published the results, datasets, Dockerfiles and containers: https://github.com/histausse/rasta, https://zenodo.org/records/10144014, https://zenodo.org/records/10980349 and on Docker Hub as `histausse/rasta-:icsr2024`].
+
+=== Runtime Conditions
+
+#figure(
+ image(
+ "figs/running.svg",
+ width: 80%,
+ alt: "A diagram representing the methodology. The word 'Tool' is linked to a box labeled 'Docker image' by an arrow labeled 'building'. The box 'Docker image' is linked to a box labeled 'Singularity image' by an arrow labeled 'conversion'. The box 'Singularity image' is linked to a box labeled 'Execution monitoring' by a dotted arrow labeled 'Manual tests' and to an image of a server labeled 'Singularity cluster' by an arrow labeled 'deployment'. An image of three Android logos labeled 'apks' is also linked to the 'Singularity cluster' by an arrow labeled 'running the tool analysis'. The 'Singularity cluster' image is linked to the 'Execution monitoring' box by an arrow labeled 'log capture'. The 'Execution monitoring' box is linked to the words 'Exit status' by an unlabeled arrow.",
+ ),
+ caption: [Methodology overview],
+)
+
+As shown in @fig:rasta-overview, before benchmarking the tools, we built and installed them in Docker containers to facilitate reuse by other researchers.
+We converted them into Singularity containers because we had access to such a cluster and because this technology is often used by the HPC community for ensuring the reproducibility of experiments.
+//The Docker container allows a user to interact more freely with the bundled tools.
+//Then, we converted this image to a Singularity image.
+We performed manual tests using these Singularity images to check:
+
+- the location where the tool writes on the disk. For the best performance, we expect the tools to write on a mount point backed by an SSD. Some tools write data at unexpected locations, which required small patches from us.
+- the amount of memory allocated to the tool. We checked that the tool could run a MWE with a #ramlimit limit of RAM.
+- the network connection opened by the tool, if any. We expect the tool not to perform any network operation such as the download of Android SDKs. Thus, we prepared the required files and cached them in the images during the building phase. In a few cases, we patched the tool to disable the download of resources.
+
+A campaign of tests consists in executing the #nbtoolsvariationsrun selected tools on all APKs of a dataset.
+The constraints applied on the clusters are:
+
+- No network connection is authorized in order to limit any execution of malicious software.
+- The allocated RAM for a task is #ramlimit.
+- The allocated maximum time is 1 hour.
+- The allocated object space / stack space is 64 GB / 16 GB if the tool is a Java based program.
+
+For the disk files, we use a mount point that is stored on an SSD, with no particular size limit.
+Note that, because the allocation of #ramlimit could be insufficient for some tools, we evaluated the results of the tools on 20% of our dataset (described later in Section@sec:rasta-dataset) with 128 GB of RAM and with #ramlimit of RAM and checked that the results were similar.
+With this confirmation, we continued our evaluations with #ramlimit of RAM only.
+
+
+=== Dataset
+
+/*
+DATASET
+
+first seen year: not in the official Androzoo databases: min of the date added to AndroZoo and the date of the VT analysis
+%
+year: 2010 and 2023
+
+7% malware
+%
+0 detection in VT: good
+5+ => malware
+0-5 detection: excluded
+%
+
+The size ranges are Androzoo deciles (minus the 1% extremes)
+for each year, for each size range, we randomly select 500 applications (with the right proportion of malware) = bucket.
+%
+Problem: This is not representative of the population: there is probably not 7% of malware in each Androzoo decile for each year
+Problem 2: to sample, we use the deciles of APK size, but for our plots we use the deciles of DEX file size.
+%
+500*10*14=70000
+%
+%
+*/
+
+// Two datasets are used in the experiments of this section.
+// The first one is *Drebin*@Arp2014, from which we extracted the malware part (5479 samples that we could retrieve) for comparison purpose only.
+// It is a well known and very old dataset that should not be used anymore because it contains temporal and spatial biases@Pendlebury2018.
+// We intend to compare the rate of success on this old dataset with a more recent one. +// The second one, +We built a dataset named *Rasta* to cover all dates from 2010 to 2023. +This dataset is a random extract of Androzoo@allixAndroZooCollectingMillions2016, for which we balanced applications between years and sizes. +For each year and inter-decile range of size in Androzoo, 500 applications have been extracted with an arbitrary proportion of 7% of malware. +This ratio has been chosen because it is the proportion of malware that we observed when performing a raw extract of Androzoo. +To check the maliciousness of an Android application, we rely on the VirusTotal detection indicators. +If more than 5 antiviruses have flagged the application as malicious, we consider it malware. +If no antivirus has reported the application as malicious, we consider it goodware. +Applications in between are dropped. + +To compute the release date of an application, we contacted the authors of Androzoo to obtain the earlier of two dates: the submission to Androzoo and the first upload to VirusTotal. +Such a computation is more reliable than using the DEX date, which is often obfuscated when packaging the application.
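
The labeling and sampling rules described in this section can be sketched as follows. This is a minimal illustration in Python, not the code used to build Rasta: the function names (`label_from_vt`, `sample_bucket`), the in-memory lists of application identifiers and the fixed seed are assumptions made for the example. Only the thresholds (0 detections, more than 5 detections), the bucket size (500) and the malware proportion (7%) come from the text.

```python
import random


def label_from_vt(detections, malware_threshold=5):
    """Label an APK from its number of VirusTotal detections.

    Returns "goodware" for 0 detections, "malware" for more than
    `malware_threshold` detections, and None for the in-between
    applications that are dropped from the dataset.
    """
    if detections == 0:
        return "goodware"
    if detections > malware_threshold:
        return "malware"
    return None


def sample_bucket(goodware, malware, size=500, malware_ratio=0.07, rng=None):
    """Draw one (year, size range) bucket with a fixed malware proportion.

    `goodware` and `malware` are lists of candidate APK identifiers for the
    bucket. random.sample raises ValueError if a list is too small.
    """
    rng = rng or random.Random()
    nb_malware = round(size * malware_ratio)  # 35 with the default values
    return rng.sample(malware, nb_malware) + rng.sample(goodware, size - nb_malware)


# Hypothetical candidates for a single bucket (identifiers are placeholders).
goodware_candidates = [f"good-{i}" for i in range(2000)]
malware_candidates = [f"mal-{i}" for i in range(100)]
bucket = sample_bucket(goodware_candidates, malware_candidates, rng=random.Random(0))
```

With 14 years and 10 size ranges, buckets of 500 applications give at most 500 × 10 × 14 = 70000 applications, assuming every bucket has enough candidates.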
+ +// \todo[Transition] // no more room :-( diff --git a/3_rasta/3_experiments.typ b/3_rasta/3_experiments.typ new file mode 100644 index 0000000..5e3ee2f --- /dev/null +++ b/3_rasta/3_experiments.typ @@ -0,0 +1,390 @@ +#import "@local/template-thesis-matisse:0.0.1": todo, highlight +#import "X_var.typ": * +#import "X_lib.typ": * + +== Experiments + + +=== *RQ1*: Re-Usability Evaluation + + +#todo[alt text for figure rasta-exit / rasta-exit-drebin] +#figure( + image("figs/exit-status-for-the-drebin-dataset.svg", width: 80%), + caption: [Exit status for the Drebin dataset], +) + +#figure( + image("figs/exit-status-for-the-rasta-dataset.svg", width: 80%), + caption: [Exit status for the Rasta dataset], +) + + +Figures@fig:rasta-exit-drebin and@fig:rasta-exit compare the Drebin and Rasta datasets. +They represent the success/failure rate (green/orange) of the tools. +We distinguished failures to compute a result from timeouts (blue) and crashes of our evaluation framework (in grey, probably due to out-of-memory kills of the container itself). +Because they may be caused by a bug in our own analysis stack, exit statuses represented in grey (Other) are considered unknown errors rather than failures of the tool. +#todo[We discuss further errors for which we have information in the logs in Section/*@sec:rasta-failure-analysis*/.] + +Results on the Drebin dataset show that 11 tools have a high success rate (greater than 85%). +The other tools have poor results. +The worst, excluding Lotrack and Thresher, is Anadroid with a success rate under 20%. + +On the Rasta dataset, we observe an overall increase in the number of failed statuses: #resultunusablenb tools (#resultunusable) have a finishing rate below 50%. +The tools that have bad results on Drebin unsurprisingly also have bad results on Rasta. +Three tools (androguard_dad, blueseal, saaf) that were performing well (higher than 85%) on Drebin surprisingly fall below the 50% success bar.
+Seven tools keep a high success rate: Adagio, Amandroid, Androguard, Apparecium, Gator, Mallodroid, Redexer. +Regarding IC3, the fork with a simpler build process and support for modern operating systems has a lower success rate than the original tool. + +Two tools should be discussed in particular. +//Androguard and Flowdroid have a large community of users, as shown by the numbers of GitHub stars in Table~\ref{tab:sources}. +Androguard has a high success rate, which is not surprising: it is used by a lot of tools, including for analyzing applications uploaded to the Androzoo repository. +//Because of that, it should be noted that our dataset is biased in favour of Androguard. // Already in discussion +Nevertheless, when using Androguard's decompiler (DAD) to decompile an APK, it fails more than 50% of the time. +This example shows that even a tool that is frequently used can still run into critical failures. +Concerning Flowdroid, our results show a very low timeout rate (#mypercent(37, NBTOTAL)), which was unexpected: in our exchanges, Flowdroid's authors were expecting a higher rate of timeouts and fewer crashes. + +In summary, the final ratio of successful analyses for the tools that we could run +// and applications of Rasta dataset +is #mypercent(54.9, 100). When including the two defective tools, this ratio drops to #mypercent(49.9, 100). + +#highlight()[ +*RQ1 answer:* +On a recent dataset we consider that #resultunusable of the tools + are unusable. For the tools that we could run, \resultratio of analyses finish successfully. // (those with less than 50\% of successful execution and including the two tools that we were unable to build).
+] + +/* +== RQ2: temporal evolution + +#todo[alt text for fig rasta-exit-evolution-java and rasta-exit-evolution-not-java] + +#figure(stack(dir: ltr, + [#figure( + image( + "figs/finishing-rate-by-year-of-java-based-tools.svg", + width: 48%, + alt: "" + ), + caption: [Java based tools], + supplement: [Subfigure], + ) ], + [#figure( + image( + "figs/finishing-rate-by-year-of-non-java-based-tools.svg", + width: 48%, + alt: "", + ), + caption: [Non Java based tools], + supplement: [Subfigure], + ) ] + ), caption: [Exit status evolution for the Rasta dataset] +) + +For investigating the effect of application dates on the tools, we computed the date of each APK based on the minimum date between the first upload in AndroZoo and the first analysis in VirusTotal. +Such a computation is more reliable than using the dex date that is often obfuscated when packaging the application. +Then, for the sake of clarity of our results, we separated the tools that have mainly Java source code from those that use other languages. +Among the ones that are Java based programs, most of them use the Soot framework which may correlate the obtained results. @fig:rasta-exit-evolution-java (resp. @fig:rasta-exit-evolution-not-java) compares the success rate of the tools between 2010 and 2023 for Java based tools (resp. non Java based tools). +For Java based tools, a clear decrease of finishing rate can be observed globally for all tools. +For non-Java based tools, 2 of them keep a high success rate (Androguard, Mallodroid). +The result is expected for Androguard, because the analysis is relatively simple and the tool is largely adopted, as previously mentioned. +Mallodroid being a relatively simple script leveraging Androgard, it benefit from Androguard resilience. +It should be noted that Saaf keeps a high success ratio until 2014 and then quickly decreases to less than 20% after 2014. 
This example shows that, even with identical source code and the same running platform, a tool can change behavior over time because of the evolution of the structure of the input files. + +An interesting comparison is the specific case of Ic3 and Ic3_fork. Until 2019, the success rate is very similar. After 2020, ic3_fork continues to decrease whereas Ic3 keeps a success rate of around 60%. + +/* +``` +sqlite> SELECT apk1.first_seen_year, (COUNT(*) * 100) / (SELECT 20 * COUNT(*) +(x1...> FROM apk AS apk2 WHERE apk2.first_seen_year = apk1.first_seen_year +(x1...> ) +...> FROM exec JOIN apk AS apk1 ON exec.sha256 = apk1.sha256 +...> WHERE exec.tool_status = 'FINISHED' OR exec.tool_status = 'UNKNOWN' +...> GROUP BY apk1.first_seen_year ORDER BY apk1.first_seen_year; +2010|78 +2011|78 +2012|76 +2013|70 +2014|66 +2015|61 +2016|57 +2017|54 +2018|49 +2019|47 +2020|45 +2021|42 +2022|40 +2023|39 +``` +*/ + +#highlight()[ +*RQ2 answer:* For the #nbtoolsselected tools that can be used partially, a global decrease of the success rate of the tools' analyses is observed over time. +Starting at a 78% success rate, tools reach 61% after five years and 45% after ten years. +] +*/ + + +=== RQ2: Size, SDK and Date Influence + +To measure the influence of the date, SDK version and size of applications, we fixed one parameter while varying another. +For the sake of clarity, we separated Java based tools from non Java based tools.
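
As an illustration of the "fix one parameter, vary another" protocol, the sketch below computes a finishing rate per tool and per discovery year while holding the bytecode size decile fixed. It is written in Python with the standard `sqlite3` module. The table and column names (`apk`, `exec`, `sha256`, `first_seen_year`, `dex_size_decile`, `tool_name`, `tool_status`) are taken from the SQL fragments kept in the comments of this chapter, but the exact schema is an assumption, and the tool name `toolx` and the four toy rows are purely hypothetical.

```python
import sqlite3


def finishing_rate_by_year(conn, decile):
    """Finishing rate (%) per tool and discovery year, for one size decile.

    The bytecode size decile is the fixed parameter and the discovery
    year is the varying one.
    """
    query = """
        SELECT exec.tool_name,
               apk.first_seen_year,
               100.0 * SUM(exec.tool_status = 'FINISHED') / COUNT(*) AS rate
        FROM exec JOIN apk ON exec.sha256 = apk.sha256
        WHERE apk.dex_size_decile = ?
        GROUP BY exec.tool_name, apk.first_seen_year
        ORDER BY exec.tool_name, apk.first_seen_year
    """
    return conn.execute(query, (decile,)).fetchall()


# Toy in-memory database (assumed schema, hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE apk (sha256 TEXT PRIMARY KEY,
                      first_seen_year INTEGER,
                      dex_size_decile INTEGER);
    CREATE TABLE exec (sha256 TEXT, tool_name TEXT, tool_status TEXT);
""")
conn.executemany(
    "INSERT INTO apk VALUES (?, ?, ?)",
    [("a", 2012, 6), ("b", 2012, 6), ("c", 2022, 6), ("d", 2022, 3)],
)
conn.executemany(
    "INSERT INTO exec VALUES (?, ?, ?)",
    [
        ("a", "toolx", "FINISHED"),
        ("b", "toolx", "FINISHED"),
        ("c", "toolx", "FAILED"),
        ("d", "toolx", "FINISHED"),  # other decile, ignored below
    ],
)
rates = finishing_rate_by_year(conn, 6)
```

Swapping the roles of the two columns (filtering on `first_seen_year` and grouping by `dex_size_decile`) would give the complementary view, with the year fixed and the size varying.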
+ +#todo[Alt text for fig rasta-decorelation-size] +#figure(stack(dir: ltr, + [#figure( + image( + "figs/decorelation/finishing-rate-of-java-based-tool-by-bytecode-size-of-apks-detected-in-2022.svg", + width: 48%, + alt: "" + ), + caption: [Java based tools], + supplement: [Subfigure], + ) ], + [#figure( + image( + "figs/decorelation/finishing-rate-of-non-java-based-tool-by-bytecode-size-of-apks-detected-in-2022.svg", + width: 48%, + alt: "", + ), + caption: [Non Java based tools], + supplement: [Subfigure], + ) ] + ), caption: [Finishing rate by bytecode size for APKs detected in 2022] +) + +_Fixed application year. (5000 APKs)_ +We selected the year 2022, which has a good number of representatives for each decile of size in our application dataset. +@fig:rasta-rate-evolution-java-2022 (resp. @fig:rasta-rate-evolution-non-java-2022) shows the finishing rate of the tools as a function of the size of the bytecode for Java based tools (resp. non Java based tools) analyzing applications of 2022. +We can observe that all Java based tools have a finishing rate decreasing over years. 50% of non Java based tools have the same behavior. + +#todo[Alt text for fig rasta-decorelation-size] +#figure(stack(dir: ltr, + [#figure( + image( + "figs/decorelation/finishing-rate-of-java-based-tool-by-discovery-year-of-apks-with-a-bytecode-size-between-4-08-mb-and-5-2-mb.svg", + width: 48%, + alt: "" + ), + caption: [Java based tools], + supplement: [Subfigure], + ) ], + [#figure( + image( + "figs/decorelation/finishing-rate-of-non-java-based-tool-by-discovery-year-of-apks-with-a-bytecode-size-between-4-08-mb-and-5-2-mb.svg", + width: 48%, + alt: "", + ), + caption: [Non Java based tools], + supplement: [Subfigure], + ) ] + ), caption: [Finishing rate by discovery year with a bytecode size $in$ [4.08, 5.2] MB] +) + +_Fixed application bytecode size. (6252 APKs)_ We selected the sixth decile (between 4.08 and 5.20 MB), which is well represented across a wide range of years.
+@fig:rasta-rate-evolution-java-decile-year (resp. @fig:rasta-rate-evolution-non-java-decile-year) represents the finishing rate depending on the year at a fixed bytecode size. +We observe that, among Java based tools, 9 out of 12 have a finishing rate that drops below 20%, which is not the case for non Java based tools. + +#todo[Alt text for fig rasta-decorelation-min-sdk] +#figure(stack(dir: ltr, + [#figure( + image( + "figs/decorelation/finishing-rate-of-java-based-tool-by-min-sdk-of-apks-with-a-bytecode-size-between-4-08-mb-and-5-2-mb.svg", + width: 48%, + alt: "" + ), + caption: [Java based tools], + supplement: [Subfigure], + ) ], + [#figure( + image( + "figs/decorelation/finishing-rate-of-non-java-based-tool-by-min-sdk-of-apks-with-a-bytecode-size-between-4-08-mb-and-5-2-mb.svg", + width: 48%, + alt: "", + ), + caption: [Non Java based tools], + supplement: [Subfigure], + ) ] + ), caption: [Finishing rate by min SDK with a bytecode size $in$ [4.08, 5.2] MB] +) + +We performed similar experiments by varying the min SDK and target SDK versions, still with a fixed bytecode size between 4.08 and 5.2 MB, as shown in @fig:rasta-rate-evolution-java-decile-min-sdk and @fig:rasta-rate-evolution-non-java-decile-min-sdk. +We found that, contrary to the target SDK, the min SDK version has an impact on the finishing rate of Java based tools: 8 tools out of 12 are below 50% after SDK 16. +This is not surprising, as the min SDK is highly correlated with the year. + +#highlight()[ +*RQ2 answer:* +The success rate varies based on the size of bytecode and SDK version. +The date is also correlated with the success rate for Java based tools only.
+] + + +=== RQ3: Malware vs Goodware + +/* +``` +sqlite> SELECT vt_detection == 0, COUNT(exec.sha256) FROM exec INNER JOIN apk ON exec.sha256 = apk.sha256 WHERE tool_status = 'FINISHED' AND dex_size_decile = 6 GROUP BY vt_detection == 0; +0|2971 % malware +1|60455 % goodware +sqlite> SELECT vt_detection == 0, COUNT(DISTINCT sha256) FROM apk WHERE dex_size_decile = 6 GROUP BY vt_detection == 0; +0|243 +1|6009 +``` +``` +>>> 61.13168724279835 +0.4969812257050699 +>>> 60455/6009/20 * 100 +50.30371110001665 +``` + + rate goodware rate malware avg size goodware (MB) avg size malware (MB) + decile 1: 85.42 82.02 0.13 0.11 + decile 2: 74.46 72.34 0.54 0.55 + decile 3: 63.38 65.67 1.37 1.25 + decile 4: 57.21 62.31 2.41 2.34 + decile 5: 53.36 59.27 3.56 3.55 + decile 6: 50.3 61.13 4.61 4.56 + decile 7: 46.76 56.54 5.87 5.91 + decile 8: 42.57 56.23 7.64 7.63 + decile 9: 39.09 57.94 11.39 11.26 + decile 10: 33.34 45.86 24.24 21.36 + total: 54.28 64.82 6.29 4.14 +*/ + + +/* +#todo[Alt text for rasta-exit-goodmal] +#figure( + image( + "figs/exit-status-for-the-rasta-dataset-goodware-malware.svg", + width: 80%, + alt: "", + ), + caption: [Exit status comparing goodware and malware for the Rasta dataset], +) +*/ + + +/* +[15:25] Jean-Marie Mineau + +moyenne de la taille total des dex: 6464228.10027989 + +[15:26] Jean-Marie Mineau + +(tout confondu) + +[15:26] Jean-Marie Mineau + +goodware: 6598464.94224066 + +malware: 4337376.97252155 + +``` +sqlite> SELECT AVG(apk_size) FROM apk; +16918107.6526989 +sqlite> SELECT AVG(apk_size) FROM apk WHERE vt_detection = 0; +16897989.4472311 +sqlite> SELECT AVG(apk_size) FROM apk WHERE vt_detection != 0; +17236860.8903556 +``` +*/ + + +/* +#figure({ + show table: set text(size: 0.80em) + table( + columns: 4, + inset: (x: 0% + 5pt, y: 0% + 2pt), + stroke: none, + align: center+horizon, + table.hline(), + table.header( + table.cell(colspan: 4, inset: 3pt)[], + table.cell(rowspan:2)[*Rasta part*], + table.vline(end: 3), + table.vline(start: 4), + 
table.cell(colspan:2)[*Average size*], + table.vline(end: 3), + table.vline(start: 4), + table.cell(rowspan:2)[*Average date*], + [*APK*], + [*DEX*], + ), + table.cell(colspan: 4, inset: 3pt)[], + table.hline(), + table.cell(colspan: 4, inset: 3pt)[], + + [*goodware*], num(16897989), num(6598464), [2017], + [*malware*], num(17236860), num(4337376), [2017], + [*total*], num(16918107), num(6464228), [2017], + + table.cell(colspan: 4, inset: 3pt)[], + table.hline(), + )}, + caption: [Average size and date of goodware/malware parts of the Rasta dataset], +) +*/ + + +#figure({ + show table: set text(size: 0.80em) + table( + columns: 7, + inset: (x: 0% + 5pt, y: 0% + 2pt), + stroke: none, + align: center+horizon, + table.hline(), + table.header( + table.cell(colspan: 7, inset: 3pt)[], + table.cell(rowspan: 2)[*Decile*], + table.vline(end: 3), + table.vline(start: 4), + table.cell(colspan:2)[*Average DEX size (MB)*], + table.vline(end: 3), + table.vline(start: 4), + table.cell(colspan:2)[* Finishing Rate: FR*], + table.vline(end: 3), + table.vline(start: 4), + [*Ratio Size*], + table.vline(end: 3), + table.vline(start: 4), + [*Ratio FR*], + [Good], [Mal], + [Good], [Mal], + [Good/Mal], [Good/Mal], + ), + table.cell(colspan: 7, inset: 3pt)[], + table.hline(), + table.cell(colspan: 7, inset: 3pt)[], + + num(1), num(0.13), num(0.11), num(0.85), num(0.82), num(1.17), num(1.04), + num(2), num(0.54), num(0.55), num(0.74), num(0.72), num(0.97), num(1.03), + num(3), num(1.37), num(1.25), num(0.63), num(0.66), num(1.09), num(0.97), + num(4), num(2.41), num(2.34), num(0.57), num(0.62), num(1.03), num(0.92), + num(5), num(3.56), num(3.55), num(0.53), num(0.59), num(1.00), num(0.90), + num(6), num(4.61), num(4.56), num(0.50), num(0.61), num(1.01), num(0.82), + num(7), num(5.87), num(5.91), num(0.47), num(0.57), num(0.99), num(0.83), + num(8), num(7.64), num(7.63), num(0.43), num(0.56), num(1.00), num(0.76), + num(9), num(11.39), num(11.26), num(0.39), num(0.58), num(1.01), num(0.67), 
+ num(10), num(24.24), num(21.36), num(0.33), num(0.46), num(1.13), num(0.73), + + table.cell(colspan: 7, inset: 3pt)[], + table.hline(), + )}, + caption: [DEX size and Finishing Rate (FR) per decile], +) + +We compared the finishing rate of malware and goodware applications for the evaluated tools. +Because the size of applications impacts this finishing rate, it is interesting to compare the success rate for each decile of bytecode size. +@tab:rasta-sizes-decile reports the bytecode size and the finishing rate of goodware and malware in each decile of size. +We also computed the ratio of the bytecode size and finishing rate for the two populations. +We observe that the ratio for the finishing rate decreases from 1.04 to 0.73, while the ratio of the bytecode size is around 1. +We conclude from this table that analyzing malware triggers fewer errors than analyzing goodware. + + +#highlight()[ +*RQ3 answer:* +Analyzing malware applications triggers fewer errors for static analysis tools than analyzing goodware for comparable bytecode size. +] diff --git a/3_rasta/4_discussion.typ b/3_rasta/4_discussion.typ new file mode 100644 index 0000000..e4dc058 --- /dev/null +++ b/3_rasta/4_discussion.typ @@ -0,0 +1,389 @@ +== Discussion + +\subsection{State-of-the-art comparison} + +Our findings are consistent with the numerical results of Pauck {\it et al.} that showed that \mypercent{106}{180} of DIALDroid-Bench~\cite{bosuCollusiveDataLeak2017} real-world applications are analyzed successfully with the 6 evaluated tools~\cite{pauckAndroidTaintAnalysis2018}. +Six years after the release of DIALDroid-Bench, we obtain a lower ratio of \mypercent{40.05}{100} for the same set of 6 tools but using the Rasta dataset of \NBTOTALSTRING applications. +We extended this result to a set of \nbtoolsvariationsrun\xspace tools and obtained a global success rate of \resultratio. We confirmed that most tools require a significant amount of work to get them running~\cite{reaves_droid_2016}.
+%Our investigations of crashes also confirmed that dependencies on older versions of Apktool are impacting the performance of Anadroid, Saaf and Wognsen {\it et al.} in addition to DroidSafe and IccTa, already identified by Pauck {\it et al.}. +% + +Investigating the reasons behind the tools' errors is a difficult task that we leave for future work. For now, our manual investigations show that the nature of errors varies from one analysis to another, with no easy fix available to the end user. + + +\subsection{Recommendations} + +Finally, we summarize some takeaways that developers should follow to make their software easier to reuse. + +To improve the reliability of their software, developers should use classical development best practices, for example continuous integration, testing and code review. To improve reusability, developers should + write documentation about the tool usage, provide a minimal working example and describe the expected results. Interactions with the running environment should be minimized, for example by using a Docker container, a virtual environment or even a virtual machine. Additionally, a small dataset +should be provided for a more extensive test campaign, and publishing the expected results on this dataset would make it possible to evaluate the reproducibility of experiments. + +Finally, an important remark concerns the libraries used by a tool. We have seen two types of libraries: + a)~internal libraries manipulating internal data of the tool; + b)~external libraries that are used to manipulate the input data (APKs, bytecode, resources). +Our manual investigations showed that external libraries are the ones leading to crashes because of variations in recent APKs (file format, unknown bytecode instructions, multi-DEX files). We believe that the developer should provide enough documentation to make a later upgrade of these external libraries possible.
+%: for example, old versions of apktool are the libraries raising the most errors. + +\subsection{Threats to validity} + + +Our application dataset is biased in favor of Androguard, because Androzoo already used Androguard internally when collecting applications and discarded any application that could not be processed with this tool. + +Despite our best efforts, it is possible that we made mistakes when building or using the tools. It is also possible that we wrongly classified a result as a failure. To mitigate this possible problem we contacted the authors of the tools to confirm that we used the right parameters and chose a valid failure criterion. %Before running the final experiment, we also ran the tools on a subset of our dataset and looked manually at the most common errors to ensure that they are not trivial errors that can be solved. + +The timeout value and the amount of memory are arbitrarily fixed. To mitigate their effect, a small extract of our dataset has been analyzed with more memory/time to measure any difference. + +Finally, the use of VirusTotal for determining whether an application is malware or not may be wrong. To limit this impact, we required more than 5 antiviruses (resp. none) to report an application as malicious before labeling it as malware (resp. goodware).
+ +% +%\section{Discussion} +%\label{sec:discussion} +% +%\newcommand{\mrc}[1]{\rotcell{\makebox[0pt][l]{#1}}} +% +%\settowidth\rotheadsize{androguarda} +% +%%\newcommand{\mynum}[1]{% +%% \ifthenelse{\equal{\first}{}}{\num[round-mode=places,round-precision=1]{#1}}{\textbf{\num[round-mode=places,round-precision=1]{#1}}} +%%} +%% +%% +%%\newcommand{\mynums}[1]{% +%% \ifthenelse{\equal{\first}{}}{\num[round-mode=places,round-precision=0]{#1}}{\textbf{\num[round-mode=places,round-precision=0]{#1}}} +%%} +%\newcommand{\mynum}[1]{\num[round-mode=places,round-precision=1]{#1}} +% +% +% +%\newcommand{\mynums}[1]{\num[round-mode=places,round-precision=0]{#1}} +% +% +% +%\newcommand{\mynumm}[1]{\num[round-mode=places,round-precision=1]{#1}} +% +% +% \begin{table*}[tb] +% \scriptsize +% \caption{Average number of errors, analysis time, memory per unitary analysis -- compared by exit status } +% \label{tab:avgerror} +% +% \begin{tabular}{r|r|r|r|r|r|r|r|r|r|r|r|r|r|r|r|r|r|r|r|r|r} +% \toprule +% Exit status& & \mrc{adagio} & \mrc{amandroid} & \mrc{anadroid} & \mrc{androguard} & \mrc{androguard\_dad} & \mrc{apparecium}& \mrc{blueseal} &\mrc{dialdroid}& \mrc{didfail}& \mrc{droidsafe}& \mrc{flowdroid}& \mrc{gator}& \mrc{ic3}& \mrc{ic3\_fork}& \mrc{iccta}& \mrc{mallodroid}& \mrc{perfchecker}& \mrc{redexer}& \mrc{saaf}& \mrc{wognsen\_et\_al} \\ +% \midrule +% & \multicolumn{21}{c}{\bf Average number of errors (and standard deviation)} \\ +%\cline{2-22} +% \csvreader[ +% late after line = \\, +% %separator=semicolon, +% head to column names, +% ]{average_number_of_error_by_exec.csv}{}{% +% \first & \type & \mynum{\adagio} & \mynum{\amandroid} & \mynum{\anadroid} & \mynum{\androguard} & \mynum{\androguarddad} & \mynum{\apparecium} & \mynum{\blueseal} & \mynum{\dialdroid} & \mynum{\didfail} & \mynum{\droidsafe} & \mynum{\flowdroid} & \mynum{\gator} & \mynum{\ic} & \mynum{\icfork} & \mynum{\iccta} & \mynum{\mallodroid} & \mynum{\perfchecker} & \mynum{\redexer} & \mynum{\saaf} & 
\mynum{\wognsenetal} +% }% +%\midrule +% & \multicolumn{21}{c}{\bf Average time (s)} \\ +%\cline{2-22} +% \csvreader[ +%late after line = \\, +%%separator=semicolon, +%head to column names, +%]{average_time-final.csv}{}{% +% \first & \type & \mynums{\adagio} & \mynums{\amandroid} & \mynums{\anadroid} & \mynums{\androguard} & \mynums{\androguarddad} & \mynums{\apparecium} & \mynums{\blueseal} & \mynums{\dialdroid} & \mynums{\didfail} & \mynums{\droidsafe} & \mynums{\flowdroid} & \mynums{\gator} & \mynums{\ic} & \mynums{\icfork} & \mynums{\iccta} & \mynums{\mallodroid} & \mynums{\perfchecker} & \mynums{\redexer} & \mynums{\saaf} & \mynums{\wognsenetal} +%}% +%\midrule +%& \multicolumn{21}{c}{\bf Average Memory (GB)} \\ +%\cline{2-22} +%\csvreader[ +%late after line = \\, +%%separator=semicolon, +%head to column names, +%]{average_mem-final.csv}{}{% +% \first & \type & \mynumm{\adagio} & \mynumm{\amandroid} & \mynumm{\anadroid} & \mynumm{\androguard} & \mynumm{\androguarddad} & \mynumm{\apparecium} & \mynumm{\blueseal} & \mynumm{\dialdroid} & \mynumm{\didfail} & \mynumm{\droidsafe} & \mynumm{\flowdroid} & \mynumm{\gator} & \mynumm{\ic} & \mynumm{\icfork} & \mynumm{\iccta} & \mynumm{\mallodroid} & \mynumm{\perfchecker} & \mynumm{\redexer} & \mynumm{\saaf} & \mynumm{\wognsenetal} +%}% +% \bottomrule +% \end{tabular} +%\end{table*} +% +% +%In this section, we investigate the reasons behind the high ratio of failures presented in Section~\ref{sec:xp}. Table~\ref{tab:avgerror} reports the average number of errors, the average time and memory consumption of the analysis of one APK file. We also compare our conclusions to the ones of the literature. 
+% +% +% +%\subsection{Failures Analysis} +%\label{sec:failure-analysis} +%%capture erreurs +%%fichiers +%%stdout, stderr +%%(only 4%) +%%android.jar en version 9 qui génère des erreurs +% +%During the running of our experiments we parse the standard output and error to capture: +% +%\begin{itemize} +% \item Java errors and stack traces +% \item Python errors and stack traces +% \item Ruby errors and stack traces +% \item Log4j messages with a ``ERROR'' or ``FATAL'' level +% \item XSB error messages +% \item Ocaml errors +%\end{itemize} +% +%% For example, Dialdroid reports in average \num{55.9} errors for one successful analysis. +%% On the contrary, some tools such as Blueseal report very few error at a time, making it easier to identify the cause of the failure. +% +%Because some tools send back a high number of errors in our logs (up to \num{46698} for one execution), we tried to determine the error that is linked to the failed status. +%Unfortunately, our manual investigations confirmed that the last error of a log output is not always the one that should be attributed to the global failure of the analysis. +%The error that seems to generate the failure can occur in the middle of the execution, be caught by the code and then other subsequent parts of the code may generate new errors as consequences of the first one. +%Similarly, the first error of a log is not always the cause of a failure. +%Sometimes errors successfully caught and handled are logged anyway. +%Thus, it is impossible to extract accurately the error responsible for a failed execution. +%Therefore, we investigated the nature of errors globally, without distinction between error messages in a log. 
+% +%\begin{figure*} +% \includegraphics[width=0.7\linewidth]{figs/repartition-of-error-types-among-tools.pdf} +% \caption{Heatmap of the ratio of errors reasons for all tools for the Rasta dataset} +% \label{fig:heatmap} +%\end{figure*} +% +%Figure~\ref{fig:heatmap} draws the most frequent error objects for each of the tools. +%A black square is an error type that represents more than 80\% of the errors raised by the considered tool. +%In between, gray squares show a ratio between 20\% and 80\% of the reported errors. +% +%First, the heatmap helps us to confirm that our experiments is running in adequate conditions. +%Regarding errors linked to memory, two errors should be investigated: \jv{OutOfMemoryError} and \jv{StackOverflowError}. +%The first one only appears for gator with a low ratio. Several tool have a low ratio of errors concerning the stack. +%These results confirm that the allocated heap and stack is sufficient for running the tools with the Rasta dataset. +%Regarding errors linked to the disk space, we observe few ratios for the exception \jv{IOException}, \jv{FileNotFoundError} and \jv{FileNotFoundException}. +%Manual inspections revealed that those errors are often a consequence of a failed apktool execution. +% +%Second, the black squares indicate frequent errors that need to be investigated separately. +%In the rest of this section, we manually analyzed, when possible, the code that generates this high ratio of errors and we give feedback about the possible causes and difficulties to write a bug fix. +% +% +% +%% Dialdroid: TODO +%% com.google.common.util.concurrent.ExecutionError -> memory error: java.lang.StackOverflowError, java.lang.OutOfMemoryError: Java heap space, java.lang.OutOfMemoryError: GC overhead limit exceeded +%% java.lang.RuntimeException: 'No call graph present in Scene. Maybe you want Whole Program mode (-w).', 'There were exceptions during IFDS analysis. Exiting.' 'Could not find method' +%% +%% +%% Didfail: DONE ? 
+%% java.lan.RuntimeException: "Could not find method" (1603), "not found: java.io.Serializable" (1362) ?, mostly originate from somewhere in soot +%% null pointer: mostly originate from somewhere in soot +%% File not found: error raised after a previous tool failed +%% +%% Gator: DONE +%% java.lang.RuntimeException: 'error: expected 1 element for annotation Deprecated. Got 1 instead.' (106 occ), misuse of `soot.dexpler.DexAnnotation.addAnnotation` ? as usual, buried under long list of call to soot, hard to pinpoint the cause. +%% java.lang.OutOfMemoryError: +%% java.io.IOException: No space left on device (169 occurences) +%% brut.androlib.AndrolibException: 198, various apktool, some ppb linked to java.io.IOException No space left on device +%% FileNotFoundError: ppb consequence of java.io.IOException: No such file or directory: '/tmp/gator-zxkd65ty/apktool.yml +%% +%% perfchecker: Done +%% java.lang.VerifyError: "Expecting a stackmap frame at branch target ", internet propose that it could be caused by Dexguard obfuscation +%% link error: probably problems with android.jar? +%% +%% redexer: +%% "File "src/ext/logging.ml", line 712, characters 12-17: Pattern matching failed": suspicious pattern matching but I don't know caml enough to debug. +%% +%% saaf: DONE +%% brut.androlib.AndrolibException: apktoool 1.5.2, "Could not decode arsc file" +%% de.rub.syssec.saaf.model.analysis.AnalysisException: encapsulate the apktool error +%% java.io.IOException: 'Expected: 0x001c0001, got: 0x00000000', still apktool +%% 38635 failures over the total of 38710 failures raise a 'brut.androlib.AndrolibException' apktool error. 
+%% +%% wognsen_et_al: +%% brut.androlib.AndrolibException: apktool 1.5.2, "Could not decode arsc file" +%% java.io.IOException: "Expected: 0x001c0001, got: 0x00000000|38598", apktool +%% java.lang.ArithmeticException: divide by zero, from apktool 'org.jf.dexlib.Code.Format.ArrayDataPseudoInstruction.getElementCount' +% +%% Amandroid: TODO +%% mainly java.lang.NullPointerException at org.argus.jawa.flow.pta.rfa.ReachingFactsAnalysis.process, line 68, don't speak scala well enought to understand what is null +% +% +%% Anadroid: DONE +%% subprocess.calledProcessError: subprocess.check_call([APK_TOOL, \"d\" , \"-f\", \"--no-src\", apk_fp, prj_d]) +%% java.io.IOException: somewhere in brut.androlib.res.decoder.ARSCDecoder.decode +%% brut.androlib.AndrolibException: raise by brut.androlib.res.decoder.ARSCDecoder.decode, somewhere in brut.apktool.Main.main +%% +%% main error msg for brut.androlib.AndrolibException is "Could not decode arsc file" +%% +%% Apktool v1.4.3, released December 8, 2011: two months after the parution of sdk 15 +%% min_sdk 9 to 13 ~50% of exec failled with "Could not decode arsc file", min_sdk 14 81%, 15 94%, >15 >=98%. 
+%% SELECT min_sdk, COUNT(*)*100/(SELECT COUNT(*) FROM apk AS apk2 WHERE apk2.min_sdk = apk.min_sdk) FROM error INNER JOIN apk ON error.sha256 = apk.sha256 WHERE tool_name = 'anadroid' AND msg='Could not decode arsc file' GROUP BY min_sdk ORDER BY min_sdk; +%% SELECT min_sdk, COUNT(*)*100/(SELECT COUNT(*) FROM apk AS apk2 WHERE apk2.min_sdk = apk.min_sdk) FROM exec INNER JOIN apk ON exec.sha256 = apk.sha256 WHERE tool_name = 'anadroid' AND tool_status = 'FAILED' GROUP BY min_sdk ORDER BY min_sdk; +%% SELECT AVG(cnt), MAX(cnt) FROM (SELECT COUNT(*) AS cnt FROM error WHERE tool_name = 'anadroid' AND msg='Could not decode arsc file' GROUP BY sha256); +% +%\paragraph{Androguard and Androguard\_dad} +% +%Surprisingly, while Androguard almost never fails to analyze an APK, the internal decompiler of Androguard (DAD) fails more than half of the time. +%The analysis of the logs shows that the issue comes from the way the decompiled methods are stored: each method is stored in a file named after the method name and signature, and this file name can quickly exceed the size limit (255 characters on most file systems). +%It should be noted that Androguard\_dad rarely fails on the Drebin dataset. +%This illustrates the importance of testing tools on real and up-to-date APKs: even poor handling of file names can affect an analysis.
+% +%% Androguard: Done +%% 35 errors total, no real pattern, stuff like unexpected ID, uncrowned instructions, etc +%% +%% Androguard Dad: DONE +%% All 33819 OSError are '[Errno 36] File name too long: ': the tool tries to create files with the name AND SIGNATURE of the disassembled methods, but the file name can be too long: +%% '/mnt/out/in/android/vyapar/paymentgateway/model/PaymentGatewayResponseModel$Data$AccountDetails/PaymentGatewayResponseModel$Data$AccountDetails copy$default (PaymentGatewayResponseModel$Data$AccountDetails String String String String String String String String String String String String String String List I Object)PaymentGatewayResponseModel$Data$AccountDetails.ag' +%% NullPointerException +%% dad: SError +% +%\paragraph{Mallodroid and Apparecium} +% +%Mallodroid and Apparecium stand out as the tools that raised the most errors in one run. +%They can raise more than \num{10000} errors per analysis. +%However, it happened only for a few dozen APKs, and conspicuously, the same APKs raised the same high number of errors for both tools. +%The recurring error is a {\tt KeyError} raised by Androguard when trying to find a string by its identifier. +%Although this error is logged, it seems to be handled successfully, and during a manual analysis of the execution, both tools seemingly perform their analysis without issue. +%This high number of occurrences may suggest that the output is not valid. +%Still, the tools claim to return a result, so, from our perspective, we consider those analyses as successful. +% +%For the other numerous errors, we could not identify the reason why those specific applications raise so many exceptions. +%However, we noticed that Mallodroid and Apparecium use outdated versions of Androguard (respectively the version 3.0 and 2.0), and neither Androguard v3.3.5 nor DAD with Androguard v3.3.5 raises those exceptions. 
+%This suggests the issue has been fixed by Androguard and that Mallodroid and Apparecium could benefit from a dependency upgrade. +% +%% Apparecium: DONE +%% The KeyError is raised from androguard when a non existing string is queried. It happens for only a few apks (~60), +%% but a lot of times. UnicodeEncodeError happened more frequently (2740 apks), also originating from androguard. +%% androguard version 2.0 +%% +%% mallodroid: Done +%% KeyError: from androguard `get_raw_string`, but does not lead to a crash, 33 crashes from androguard parsing xml. (androguard 3.0) +%% Instruction10x% +% +%\paragraph{Blueseal} +% +%Because Blueseal rarely logs more than one error when crashing, it is easy to identify the relevant error. The majority of crashes come from unsupported Android versions (due to the magic number of the DEX files not being supported by the version of baksmali used by Blueseal) and methods whose implementations are not found (like native methods). +%% Blueseal: Done +%% Majority of runtime errors: 'No method source set for method ' are raised from soot.SceneTransformer.transform() called by edu.buffalo.cse.blueseal.BSFlow.CgTransformer.getDynamicEntryPoints(). +%% No idea how to fix. Update soot? version unclear ('trunk'...), but copyright up to 2010 so 2010? +%% +%\paragraph{Droidsafe and SAAF} +% +%Our investigation of the most common errors raised by Droidsafe and SAAF showed that they are often preceded by an error from apktool. +%Indeed, \num{28654} runs of Droidsafe and \num{38635} runs of SAAF failed after raising at least one of {\tt brut.androlib.AndrolibException} or \\ {\tt brut.androlib.err.UndefinedResObject}, suggesting that those tools would benefit from an upgrade of apktool. 
+% +% +%% Droidsafe: +%% UnknownHostException: 'normal', due to network isolation(?), from sfl4j, no impact on the rest of the tool +%% droidsafe.utils.CannotFindMethodException: 'Cannot find or resolve ' (eg: android.view.ViewTreeObserver: void removeOnGlobalLayoutListener), +%% mostly related to android API. First guess: 'normal', as droidsafe models the android API and has not been updated since ~SDK 19, but the error is replaced by an +%% apktool error for min sdk > 19.: 2.0.0rc2 +%% eg: android.view.ViewTreeObserver.removeOnGlobalLayoutListener: exists in android.jar for sdk 18 and 18, but not in the droidsafe model +%% the error does not look fatal (it occurred in finished executions) but is more common in failed executions. (1 to 16 ratio) +%% TODO: conclusion? +%% +%% 28957 apk with an apktool error +%% +%% CannotFindMethodException +% +%\paragraph{Ic3 and Ic3\_fork} +% +%% ic3: DONE +%% jas.jasError: "Missing arguments for instruction ldc" or "Badly formatted number", old soot or bad dare? +%% 3778 / 10480 (~30) fails without error logged, probable that we don't capture dare failures. +%% +%% ic3_fork: DONE +%% java.lang.RuntimeException: "This operation requires resolving level SIGNATURES but is at resolving level DANGLING", and "Could not find method". Yet another case of error lost in a sea of soot +%% only 38 failures without error logged +%% +%% IccTa: Done +%% java.lang.RuntimeException: same number of "This operation requires resolving level SIGNATURES..." as ic3_fork, +%% lots of "No method source set for method ", half the time this occurs the exec failed (and ~30% of the time it finishes) +%% "Could not find method": fails every time, in edu.psu.cse.siis.ic3.SetupApplication.calculateSourcesSinksEntrypoints (and again, a lot of soot stack) +% +%We compared the number of errors between Ic3 and Ic3\_fork. 
+%Ic3\_fork reports fewer errors for all types of analysis, which suggests that the author of the fork removed the error output of the original code: the thrown errors are captured in a generic {\tt RuntimeException}, which removes their semantics and makes our investigation harder. +%Nevertheless, Ic3\_fork has more failures than Ic3: the number of errors reported by a tool is not correlated to the final success of its analysis. +% +% +%% jasError +% +%\paragraph{Flowdroid} +% +%Our exchanges with the authors of Flowdroid led us to expect more timeouts from overly long executions than failed runs. +%Surprisingly, we only got \mypercent[2]{37}{\NBTOTAL} of timeouts, and a high number of failures. +%We tried to detect recurring causes of failures, but the complexity of Flowdroid makes the investigation difficult. +%Most exceptions seem to be related to concurrency. %or display a generic messages. +%Other errors that came up regularly are {\tt java.nio.channels.ClosedChannelException}, which is raised when Flowdroid fails to read from the APK, although we did not find the reason for the failure, null pointer exceptions when trying to check if a null value is in a {\tt ConcurrentHashMap} (in {\tt LazySummaryProvider.getClassFlows()}) and {\tt StackOverflowError} from {\tt StronglyConnectedComponentsFast.recurse()}. +%We randomly selected 20 APKs that generated stack overflows in Flowdroid and retried the analysis with 500G of RAM allocated to the JVM. +%18 of those runs still failed with a stack overflow without using all the allocated memory, the other two failed after raising null pointer exceptions from {\tt getClassFlows}. +%This shows that the lack of memory is not the primary cause of those failures. +% +%% Flowdroid: TODO java.nio.channels.ClosedChannelException cause or consequence? +%% java.nio.channels.ClosedChannelException: mostly the zip file reader that refuses an access (after another crash? 
hard to check) +%% java.lang.StackOverflowError: +%% java.lang.RuntimeException: mostly "There were exceptions during IFDS analysis. Exiting." +%% java.lang.NullPointerException: soot.jimple.infoflow.collect.ConcurrentHashSet.contains, from soot.jimple.infoflow.methodSummary.data.provider.LazySummaryProvider.getClassFlows +%% com.google.common.util.concurrent.ExecutionError: "java.lang.StackOverflowError" +%% +% +%%No hidden timeout, what do we believe? avg(time) = 80s, 30s when finished, 137 when failed, max(time) = 3639s when failed, 3284 when finished, 72 \% of the failures took less than a minute, 93\% less than 10, 92\% of failed exception raised a NullPointerException. +% +%% Pauck: Flowdroid avg 2m on DIALDroid-Bench (real worlds apks) +% +% +%\medskip +% +%In conclusion, we observe that a lot of errors can be linked to bugs in dependencies. +%Our attempts to upgrade those dependencies led to new errors appearing: we conclude that this is a non-trivial task that requires familiarity with the internals of the tools. +% +%\subsection{State of the art comparison} +% +%% Luo {\it et al.} released TaintBench~\cite{luoTaintBenchAutomaticRealworld2022} a real-world benchmark and the associated recommendations to build such a benchmark. These benchmarks confirmed that some tools such as Amandroid and Flowdroid are less efficient on real-world applications. +%% Pauck {\it et al.}~\cite{pauckAndroidTaintAnalysis2018} +%% Reaves {\it et al.}~\cite{reaves_droid_2016} +% +%We finally compare our results to the conclusions and discussions of previous papers~\cite{luoTaintBenchAutomaticRealworld2022, pauckAndroidTaintAnalysis2018, reaves_droid_2016}. +% +%First, we confirm the hypothesis of Luo {\it et al.} that real-world applications lead to less efficient analyses than hand-crafted test applications or old datasets~\cite{luoTaintBenchAutomaticRealworld2022}. 
Even though Drebin is not hand-crafted, it is quite old, and we obtained much better results on it than on the Rasta dataset. +%Real-world applications differ markedly in size from hand-crafted applications, which impacts the success rate. +%We believe this is explained by the fact that the complexity of the code increases with its size. +% +%%30*6 +%%180 +%%21+20+27+2+18+18 +%%106 +%%106/180*100 +%%58.88 +% +%Second, our findings are consistent with the numerical results of Pauck {\it et al.}, who showed that \mypercent{106}{180} of the analyses of the 30 real-world applications of DIALDroid-Bench by the 6 evaluated tools succeed~\cite{pauckAndroidTaintAnalysis2018}. +%Six years after the release of DIALDroid-Bench, we obtain a lower ratio of \mypercent{40.05}{100} for the same set of 6 tools but using the Rasta dataset of \NBTOTALSTRING applications. +%We extended this result to our set of \nbtoolsvariationsrun\xspace tools and obtained a global success rate of \resultratio. +%Our investigation of crashes also confirmed that dependencies on older versions of Apktool impact the performance of Anadroid, Saaf and Wognsen {\it et al.} in addition to DroidSafe and IccTa, already identified by Pauck {\it et al.}. +% +%% Pauck: 235 micro bench, 30 real* +%% Confirm didfail failed for min_sdk >= 19, all successful runs (only 4%) indicated "Only phantom classes loaded, skipping analysis..." 
+% +%% SELECT tool_status, COUNT(*), AVG(dex_size) FROM exec INNER JOIN apk on exec.sha256 = apk.sha256 WHERE min_sdk >= 19 AND tool_name = 'didfail' GROUP BY tool_status; +%% FAILED|16651|13139071.2363221 +%% FINISHED|694|6617861.33717579 +%% TIMEOUT|98|6048999.2244898 +%% SELECT msg, COUNT(*) FROM (SELECT DISTINCT exec.sha256, msg FROM exec INNER JOIN apk on exec.sha256 = apk.sha256 INNER JOIN error ON exec.sha256 = error.sha256 AND exec.tool_name = error.tool_name WHERE min_sdk >= 19 AND exec.tool_name = 'didfail' AND exec.tool_status = 'FINISHED') GROUP BY msg; +%% |77 +%% Only phantom classes loaded, skipping analysis...|694 +%% +%% DroidSafe and IccTa failed for SDK > 19 because of old apktool +%% +%% We observed: (nb success < 2000 for min_sdk >= 20) +%% ['anadroid', 'blueseal', 'dialdroid', 'didfail', 'droidsafe', 'ic3_fork', 'iccta', 'perfchecker', 'saaf', 'wognsen_et_al'] +%% anadroid|0 +%% blueseal|521 +%% dialdroid|812 +%% didfail|343 +%% droidsafe|35 +%% ic3_fork|1393 +%% iccta|612 +%% perfchecker|1921 +%% saaf|1588 +%% wognsen_et_al|386 +% +%Third, we extended the work done by Reaves {\it et al.} on the usability of analysis tools to \nbtoolsselected\xspace different tools (4 tools are in common, we added 16 new tools and two variations). +%We confirmed that most tools require a significant amount of work to get them running. +%We encountered similar issues with library and operating system incompatibilities, and noticed that, with time, dependency issues may impact the build process. +%For instance, we encountered cases where the repositories hosting the dependencies were closed, or cases where maven failed to download dependencies because the OS version did not support SSL, now mandatory to access maven central. +%%, and even one case where we could not find anywhere the compiled version of sbt used to build a tool. 
+ + diff --git a/3_rasta/5_conclusion.typ b/3_rasta/5_conclusion.typ new file mode 100644 index 0000000..4c64ab0 --- /dev/null +++ b/3_rasta/5_conclusion.typ @@ -0,0 +1,13 @@ +== Conclusion + +This paper has assessed the results suggested by the literature~\cite{luoTaintBenchAutomaticRealworld2022, pauckAndroidTaintAnalysis2018, reaves_droid_2016} about the reliability of static analysis tools for Android applications. +With a dataset of \NBTOTALSTRING applications we established that \resultunusable of \nbtoolsselectedvariations\xspace tools are not reusable, when considering that a tool is unusable if it fails more than 50\% of the time. +In total, the analysis success rate of the tools that we could run for the entire dataset is \resultratio. +The characteristics that have the most influence on the success rate are the bytecode size and the min SDK version. Finally, we showed that malware APKs have a better finishing rate than goodware. + +In future work, we plan to investigate the errors reported by the tools more deeply in order to analyze the most common types of errors, in particular for Java-based tools. We also plan to extend this work with a selection of more recent tools performing static analysis. + +%Following Reaves {\it et al.} recommendations~\cite{reaves_droid_2016}, we publish the Docker and Singularity images we built to run our experiments alongside the Docker files. This will allow the research community to use the tools directly without the build and installation penalty. 
+ +%\todo{check what is said about ic3 and ic3fork} + diff --git a/3_rasta/X_lib.typ b/3_rasta/X_lib.typ new file mode 100644 index 0000000..d30e669 --- /dev/null +++ b/3_rasta/X_lib.typ @@ -0,0 +1,29 @@ + +#let mypercent(numerator, denominator, digits: 2) = { + [#calc.round((100 * numerator) / denominator, digits: digits) %] +} + +#let ok = text(fill: olive, sym.checkmark) +#let okk = text(fill: olive, tracking: -5pt, sym.checkmark+sym.checkmark) + +#let bad = text(fill: orange, sym.circle.stroked.small) +#let ko = text(fill: maroon, sym.crossmark) +#let nr = sym.dash.en + +#let str2sym(s) = { + if s == "ok" { + ok + } else if s == "okk" { + okk + } else if s == "bad" { + bad + } else if s == "ko" { + ko + } else if s == "nr" { + nr + } else { + s + } +} + +#let num(n) = [#n] diff --git a/3_rasta/X_var.typ b/3_rasta/X_var.typ new file mode 100644 index 0000000..82e5f2b --- /dev/null +++ b/3_rasta/X_var.typ @@ -0,0 +1,19 @@ +#import "X_lib.typ": mypercent + +#let NBTOTAL = 62525 +#let NBTOTALSTRING = NBTOTAL //\num{62525}\xspace} + +#let nbtools = 26 +#let nbtoolsselected = 20 +#let nbtoolsselectedvariations = 22 +#let nbtoolsvariationsrun = 20 +#let resultunusablenb = 12 //\xspace +#let resultunusable = mypercent(resultunusablenb, nbtoolsselectedvariations) // \xspace +#let resultratio = mypercent(54.9, 100) // \xspace +#let ramlimit = [64 GB] //\xspace + +#let rasta_tool_data = csv( + "data/data-final.csv", + delimiter: ";", + row-type: dictionary, +) diff --git a/3_rasta/data/data-final.csv b/3_rasta/data/data-final.csv new file mode 100644 index 0000000..9f653f1 --- /dev/null +++ b/3_rasta/data/data-final.csv @@ -0,0 +1,27 @@ +tool;citekey;binary;source;url;repo;documentation;decision;exclude;why;forkusable;authorconfirmed;lang;licences;os;origin;stars;alive;date;selected;selectedstars;selecteddate;nbaliveforks;remark;urlselected +A3E ;DBLPconfoopslaAzimN13;nr;ok;https://github.com/tanzirul/a3e;github;ok;ko;EXCLUDE;Hybrid tool 
(static/dynamic);ko;;;;;tanzirul/a3e;40;ko;2016-09-15;origin;40;2016-09-15;1;auto;https://github.com/tanzirul/a3e +A5 ;vidasA5AutomatedAnalysis2014;nr;ok;https://github.com/tvidas/a5;github;ko;ko;EXCLUDE;Hybrid tool (static/dynamic);ko;;;;;tvidas/a5;12;ko;2014-07-31;origin;12;2014-07-31;0;auto;https://github.com/tvidas/a5 +Adagio ;gasconStructuralDetectionAndroid2013;nr;ok;https://github.com/hgascon/adagio;github;ok;ok;;;ko;ok;Python;GPL 2.0;U20.04;hgascon/adagio;74;ok;2022-11-17;origin;74;2022-11-17;0;auto;https://github.com/hgascon/adagio +Amandroid ;weiAmandroidPreciseGeneral2014;ok;ok;https://github.com/arguslab/Argus-SAF;github;ok;ok;;;ko;ok;Scala;Apache 2.0;U22.04;arguslab/Argus-SAF;161;ko;2021-11-10;origin;161;2021-11-10;2;auto;https://github.com/arguslab/Argus-SAF +Anadroid ;liangSoundPreciseMalware2013;ko;ok;https://github.com/maggieddie/pushdownoo;github;ok;ok;;;ko;ko;Scala/Java/Python;CRAPL 2012;U22.04;maggieddie/pushdownoo;10;ko;2014-06-18;origin;10;2014-06-18;0;auto;https://github.com/maggieddie/pushdownoo +Androguard ;desnos:adnroguard:2011;nr;ok;https://github.com/androguard/androguard;github;okk;ok;;;ko;ko;Python;Apache 2.0;Python 3.11 slim;androguard/androguard;4430;ok;2023-02-01;origin;4430;2023-02-01;3;auto;https://github.com/androguard/androguard +Android-app-analysis;geneiatakisPermissionVerificationApproach2015;ko;ok;https://code.google.com/archive/p/android-app-analysis-tool/source/default/source;google;okk;ko;EXCLUDE;Hybrid tool (static/dynamic);;;;;;android-app-analysis-tool;40;ko;2014-06-25;origin;;2014-06-25;;Android-app;https://code.google.com/archive/p/android-app-analysis-tool/source/default/source +Apparecium ;titzeAppareciumRevealingData2015;ok;ok;https://github.com/askk/apparecium;github;ko;ok;;;ko;ko;Python;MIT;U22.04;askk/apparecium;0;ko;2014-11-07;origin;0;2014-11-07;1;auto;https://github.com/askk/apparecium +BlueSeal ;shenInformationFlowsPermission2014;ko;ok;https://github.com/ub-rms/blueseal;github;bad;ok;;;ko;ok;Java;No 
licence;U14.04;ub-rms/blueseal;0;ko;2018-07-04;origin;0;2018-07-04;0;auto;https://github.com/ub-rms/blueseal +Choi #etal ;CHOI2014620;ko;ok;https://github.com/kwanghoon/javaAnalysis;github;bad;ko;EXCLUDE;Works on source files only;ko;;;;;kwanghoon/JavaAnalysis;1;ok;2022-01-09;origin;1;2022-01-09;0;auto;https://github.com/kwanghoon/JavaAnalysis +DIALDroid ;bosuCollusiveDataLeak2017;ok;ok;https://github.com/dialdroid-android/DIALDroid;github;ok;ok;;;ko;ko;Java;GPL 3.0;U18.04;dialdroid-android/DIALDroid;16;ko;2018-04-17;origin;16;2018-04-17;1;auto;https://github.com/dialdroid-android/DIALDroid +DidFail ;klieberAndroidTaintFlow2014;ok;ok;https://bitbucket.org/wklieber/didfail/src/master/;bitbucket;bad;ok;;;;ok;Java/Python;3-Clause BSD;U12.04;lori\_flynn/didfail;4;ko;2015-06-17;origin;;2015-06-17;;DidFail;https://bitbucket.org/wklieber/didfail/src/master/ +DroidSafe ;DBLPconfndssGordonKPGNR15;ko;ok;https://github.com/MIT-PAC/droidsafe-src;github;ok;ok;;;ko;ok;Java/Python;GPL 2.0;U14.04;MIT-PAC/droidsafe-src;92;ko;2017-04-17;origin;92;2017-04-17;3;auto;https://github.com/MIT-PAC/droidsafe-src +Flowdroid ;Arzt2014a;ok;ok;https://github.com/secure-software-engineering/FlowDroid;github;okk;ok;;;ko;ok;Java;LGPL 2.1;U22.04;secure-software-engineering/FlowDroid;868;ok;2023-05-07;origin;868;2023-05-07;1;auto;https://github.com/secure-software-engineering/FlowDroid +Gator ;rountevStaticReferenceAnalysis2014,yangStaticControlFlowAnalysis2015;ko;ok;http://web.cse.ohio-state.edu/presto/software/gator/;edu;okk;ok;;;;ok;Java/Python;3-Clause BSD;U22.04;web;;;2019-09-09;origin;;2019-09-09;;Gator;http://web.cse.ohio-state.edu/presto/software/gator/ +IC3 ;octeauCompositeConstantPropagation2015;ok;ok;https://github.com/siis/ic3;github;bad;ok;;;ok;ko;Java;Apache 2.0;U12.04 / 22.04;siis/ic3;32;ko;2015-09-17;JordanSamhi/ic3;4;2022-12-06;3;auto;https://github.com/JordanSamhi/ic3 +IccTA 
;liIccTADetectingInterComponent2015;ok;ok;https://github.com/lilicoding/soot-infoflow-android-iccta;github;ok;ok;;;ko;ok;Java;LGPL 2.1;U22.04;lilicoding/soot-infoflow-android-iccta;83;ko;2016-02-21;origin;83;2016-02-21;0;auto;https://github.com/lilicoding/soot-infoflow-android-iccta +Lotrack ;lillackTrackingLoadtimeConfiguration2014;ko;ok;https://github.com/MaxLillack/Lotrack;github;ko;bad;;Authors ack. a partial doc.;ko;ok;Java;Apache 2.0;?;MaxLillack/Lotrack;5;ko;2017-05-11;origin;5;2017-05-11;2;auto;https://github.com/MaxLillack/Lotrack +MalloDroid ;fahlWhyEveMallory2012;nr;ok;https://github.com/sfahl/mallodroid;github;ok;ok;;;ko;ko;Python;LGPL 3.0;U16.04;sfahl/mallodroid;64;ko;2013-12-30;origin;64;2013-12-30;10;auto;https://github.com/sfahl/mallodroid +PerfChecker ;liuCharacterizingDetectingPerformance2014;ko;ko;http://castle.cse.ust.hk/perfchecker/tool_obtain.php;request;bad;ok;;Binary obtained from authors;;ok;Java;Proprietary;U14.04;authors;;ko;--;origin;;--;;Perfchecker;??? +Poeplau #etal;DBLPconfndssPoeplauFBKV14; ko ;bad;https://github.com/sebastianpoeplau/android-whitelists;github;ko;ko;EXCLUDE;Related to Android hardening;ko;;;;;sebastianpoeplau/android-whitelists;1;ko;2014-03-14;origin;1;2014-03-14;0;auto;https://github.com/sebastianpoeplau/android-whitelists +Redexer ;jeonDrAndroidMr2012;ko;ok;https://github.com/plum-umd/redexer;github;ok;ok;;;ko;ok;Ocaml/Ruby;3-Clause BSD;U22.04;plum-umd/redexer;153;ko;2021-05-20;origin;153;2021-05-20;0;auto;https://github.com/plum-umd/redexer +SAAF ;hoffmannSlicingDroidsProgram2013;ok;ok;https://github.com/SAAF-Developers/saaf;github;ok;ok;;;ko;ok;Java;GPL 3.0;U14.04;SAAF-Developers/saaf;35;ko;2015-09-01;origin;35;2015-09-01;5;auto;https://github.com/SAAF-Developers/saaf +StaDynA ;zhauniarovichStaDynAAddressingProblem2015; ko ;ok;https://github.com/zyrikby/StaDynA;request;ok;ko;EXCLUDE;Hybrid tool (static/dynamic);;;;;;authors;;;2020-02-14;origin;;2020-02-14;;Stadyna;https://github.com/zyrikby/StaDynA +Thresher 
;blackshearThresherPreciseRefutations2013;ko;ok;https://github.com/cuplv/thresher;github;ok;bad;;Not built with author’s help;ko;ok;Java;Apache 2.0;U14.04;cuplv/thresher;31;ko;2014-10-25;origin;31;2014-10-25;1;auto;https://github.com/cuplv/thresher
+Wognsen #etal;wognsenFormalisationAnalysisDalvik2014;nr;ok;https://bitbucket.org/erw/dalvik-bytecode-analysis-tool/src/master/;bitbucket;ko;ok;;;ko;ko;Python/Prolog;No licence;U22.04;erw/dalvik-bytecode-analysis-tool;;;2022-06-27;origin;;2022-06-27;;Wognsen;???
diff --git a/3_rasta/figs/decorelation/finishing-rate-of-java-based-tool-by-bytecode-size-of-apks-detected-in-2022.svg b/3_rasta/figs/decorelation/finishing-rate-of-java-based-tool-by-bytecode-size-of-apks-detected-in-2022.svg
new file mode 100644
index 0000000..4aa388e
--- /dev/null
+++ b/3_rasta/figs/decorelation/finishing-rate-of-java-based-tool-by-bytecode-size-of-apks-detected-in-2022.svg
@@ -0,0 +1,774 @@
+[SVG figure: finishing rate of Java-based tools by bytecode size of APKs detected in 2022]
diff --git a/3_rasta/figs/decorelation/finishing-rate-of-java-based-tool-by-discovery-year-of-apks-with-a-bytecode-size-between-4-08-mb-and-5-2-mb.svg b/3_rasta/figs/decorelation/finishing-rate-of-java-based-tool-by-discovery-year-of-apks-with-a-bytecode-size-between-4-08-mb-and-5-2-mb.svg
new file mode 100644
index 0000000..18625a9
--- /dev/null
+++ b/3_rasta/figs/decorelation/finishing-rate-of-java-based-tool-by-discovery-year-of-apks-with-a-bytecode-size-between-4-08-mb-and-5-2-mb.svg
@@ -0,0 +1,782 @@
+[SVG figure: finishing rate of Java-based tools by discovery year of APKs with a bytecode size between 4.08 MB and 5.2 MB]
diff --git a/3_rasta/figs/decorelation/finishing-rate-of-java-based-tool-by-min-sdk-of-apks-with-a-bytecode-size-between-4-08-mb-and-5-2-mb.svg b/3_rasta/figs/decorelation/finishing-rate-of-java-based-tool-by-min-sdk-of-apks-with-a-bytecode-size-between-4-08-mb-and-5-2-mb.svg
new file mode 100644
index 0000000..b1d09a1
--- /dev/null
+++ b/3_rasta/figs/decorelation/finishing-rate-of-java-based-tool-by-min-sdk-of-apks-with-a-bytecode-size-between-4-08-mb-and-5-2-mb.svg
@@ -0,0 +1,1087 @@
+[SVG figure: finishing rate of Java-based tools by min SDK of APKs with a bytecode size between 4.08 MB and 5.2 MB]
diff --git a/3_rasta/figs/decorelation/finishing-rate-of-non-java-based-tool-by-bytecode-size-of-apks-detected-in-2022.svg b/3_rasta/figs/decorelation/finishing-rate-of-non-java-based-tool-by-bytecode-size-of-apks-detected-in-2022.svg
new file mode 100644
index 0000000..89998d4
--- /dev/null
+++ b/3_rasta/figs/decorelation/finishing-rate-of-non-java-based-tool-by-bytecode-size-of-apks-detected-in-2022.svg
@@ -0,0 +1,616 @@
+[SVG figure: finishing rate of non-Java-based tools by bytecode size of APKs detected in 2022]
diff --git a/3_rasta/figs/decorelation/finishing-rate-of-non-java-based-tool-by-discovery-year-of-apks-with-a-bytecode-size-between-4-08-mb-and-5-2-mb.svg b/3_rasta/figs/decorelation/finishing-rate-of-non-java-based-tool-by-discovery-year-of-apks-with-a-bytecode-size-between-4-08-mb-and-5-2-mb.svg
new file mode 100644
index 0000000..4aef518
--- /dev/null
+++ b/3_rasta/figs/decorelation/finishing-rate-of-non-java-based-tool-by-discovery-year-of-apks-with-a-bytecode-size-between-4-08-mb-and-5-2-mb.svg
@@ -0,0 +1,574 @@
+[SVG figure: finishing rate of non-Java-based tools by discovery year of APKs with a bytecode size between 4.08 MB and 5.2 MB]
diff --git a/3_rasta/figs/decorelation/finishing-rate-of-non-java-based-tool-by-min-sdk-of-apks-with-a-bytecode-size-between-4-08-mb-and-5-2-mb.svg b/3_rasta/figs/decorelation/finishing-rate-of-non-java-based-tool-by-min-sdk-of-apks-with-a-bytecode-size-between-4-08-mb-and-5-2-mb.svg
new file mode 100644
index 0000000..b110510
--- /dev/null
+++ b/3_rasta/figs/decorelation/finishing-rate-of-non-java-based-tool-by-min-sdk-of-apks-with-a-bytecode-size-between-4-08-mb-and-5-2-mb.svg
@@ -0,0 +1,872 @@
+[SVG figure: finishing rate of non-Java-based tools by min SDK of APKs with a bytecode size between 4.08 MB and 5.2 MB]
diff --git a/3_rasta/figs/exit-status-for-the-drebin-dataset.svg b/3_rasta/figs/exit-status-for-the-drebin-dataset.svg
new file mode 100644
index 0000000..fb4ab19
--- /dev/null
+++ b/3_rasta/figs/exit-status-for-the-drebin-dataset.svg
@@ -0,0 +1,2111 @@
+[SVG figure: exit status for the Drebin dataset]
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/3_rasta/figs/exit-status-for-the-rasta-dataset-goodware-malware.svg b/3_rasta/figs/exit-status-for-the-rasta-dataset-goodware-malware.svg new file mode 100644 index 0000000..94828b8 --- /dev/null +++ b/3_rasta/figs/exit-status-for-the-rasta-dataset-goodware-malware.svg @@ -0,0 +1,3582 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/3_rasta/figs/exit-status-for-the-rasta-dataset.svg b/3_rasta/figs/exit-status-for-the-rasta-dataset.svg new file mode 100644 index 0000000..7684e8d --- /dev/null +++ b/3_rasta/figs/exit-status-for-the-rasta-dataset.svg @@ -0,0 +1,2140 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/3_rasta/figs/finishing-rate-by-year-of-java-based-tools.svg b/3_rasta/figs/finishing-rate-by-year-of-java-based-tools.svg new file mode 100644 index 0000000..c192b18 --- /dev/null +++ b/3_rasta/figs/finishing-rate-by-year-of-java-based-tools.svg @@ -0,0 +1,1005 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/3_rasta/figs/finishing-rate-by-year-of-non-java-based-tools.svg b/3_rasta/figs/finishing-rate-by-year-of-non-java-based-tools.svg new file mode 100644 index 0000000..094319c --- /dev/null +++ b/3_rasta/figs/finishing-rate-by-year-of-non-java-based-tools.svg @@ -0,0 +1,678 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/3_rasta/figs/running.svg b/3_rasta/figs/running.svg new file mode 100644 index 0000000..bb977fc --- /dev/null +++ b/3_rasta/figs/running.svg @@ -0,0 +1,225 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/3_rasta/rasta.typ b/3_rasta/rasta.typ new file mode 100644 index 0000000..41a6fe8 --- /dev/null +++ b/3_rasta/rasta.typ @@ -0,0 +1,14 @@ +#import 
"@local/template-thesis-matisse:0.0.1": todo + += RASTA + +#todo[typstify RASTA paper] + +#todo[Format numbers] + +//#include("0_intro.typ") +#include("1_related_work.typ") +#include("2_methodology.typ") +#include("3_experiments.typ") +//#include("4_discussion.typ") +//#include("5_conclusion.typ") diff --git a/abstract.typ b/abstract.typ new file mode 100644 index 0000000..b4286a9 --- /dev/null +++ b/abstract.typ @@ -0,0 +1,9 @@ +#import "@local/template-thesis-matisse:0.0.1": todo + +#let keywords-en = ("Android", "Malware Analysis", todo[More Keywords]) +#let keywords-fr = ("Android", "Analyse de Maliciels") + + +#let abstract-en = lorem(175) + +#let abstract-fr = lorem(175) diff --git a/bibliography.bib b/bibliography.bib new file mode 100644 index 0000000..6ae67b7 --- /dev/null +++ b/bibliography.bib @@ -0,0 +1,682 @@ +@inproceedings{weiAmandroidPreciseGeneral2014, + title = {Amandroid: {{A Precise}} and {{General Inter-component Data Flow Analysis Framework}} for {{Security Vetting}} of {{Android Apps}}}, + shorttitle = {Amandroid}, + booktitle = {{{ACM SIGSAC Conference}} on {{Computer}} and {{Communications Security}}}, + author = {Wei, Fengguo and Roy, Sankardas and Ou, Xinming and {Robby}}, + year = {2014}, + month = nov, + pages = {1329--1341}, + publisher = {{ACM}}, + address = {{Scottsdale Arizona USA}}, + doi = {10.1145/2660267.2660357}, + urldate = {2024-01-25}, + isbn = {978-1-4503-2957-6}, + langid = {english} +} +@inproceedings{xiaEffectiveRealTimeAndroid2015, + title = {Effective {{Real-Time Android Application Auditing}}}, + booktitle = {2015 {{IEEE Symposium}} on {{Security}} and {{Privacy}}}, + author = {Xia, Mingyuan and Gong, Lu and Lyu, Yuanhao and Qi, Zhengwei and Liu, Xue}, + year = {2015}, + month = may, + pages = {899--914}, + publisher = {{IEEE}}, + address = {{San Jose, CA}}, + doi = {10.1109/SP.2015.60}, + isbn = {978-1-4673-6949-7}, + file = {/home/jf/snap/zotero-snap/common/Zotero/storage/VTA4PNJJ/Xia et al. 
- 2015 - Effective Real-Time Android Application Auditing.pdf} +} + +@inproceedings{octeau2013effective, + title={Effective Inter-Component communication mapping in android: An essential step towards holistic security analysis}, + author={Octeau, Damien and McDaniel, Patrick and Jha, Somesh and Bartel, Alexandre and Bodden, Eric and Klein, Jacques and Le Traon, Yves}, + booktitle={22nd USENIX Security Symposium (USENIX Security 13)}, + pages={543--558}, + year={2013} +} + +@inproceedings{Enck2010, + title = {{{TaintDroid}}: An Information-Flow Tracking System for Realtime Privacy Monitoring on Smartphones}, + booktitle = {9th {{USENIX Symposium}} on {{Operating Systems Design}} and {{Implementation}}}, + author = {Enck, William and Gilbert, Peter and Chun, Byung-Gon and Cox, Landon P. and Jung, Jaeyeon and McDaniel, Patrick and Sheth, Anmol N.}, + year = {2010}, + month = oct, + pages = {393--407}, + publisher = {{USENIX Association}}, + address = {{Vancouver, BC, Canada}}, + isbn = {978-1-931971-79-9}, + keywords = {\ding{72},Dynamic analysis,Taint analysis}, + file = {/home/jf/snap/zotero-snap/common/Zotero/storage/J8R79TUL/Enck et al. - 2010 - TaintDroid an information-flow tracking system for realtime privacy monitoring on smartphones.pdf} +} +@inproceedings{liApkCombinerCombiningMultiple2015, + title = {{{ApkCombiner}}: {{Combining Multiple Android Apps}} to {{Support Inter-App Analysis}}}, + shorttitle = {{{ApkCombiner}}}, + booktitle = {{{ICT Systems Security}} and {{Privacy Protection}}}, + author = {Li, Li and Bartel, Alexandre and Bissyand{\'e}, Tegawend{\'e} F. 
and Klein, Jacques and Traon, Yves Le}, + editor = {Federrath, Hannes and Gollmann, Dieter}, + year = {2015}, + volume = {455}, + pages = {513--527}, + publisher = {{Springer International Publishing}}, + address = {{Cham}}, + doi = {10.1007/978-3-319-18467-8_34}, + isbn = {978-3-319-18466-1 978-3-319-18467-8}, + langid = {english}, + file = {/home/jf/snap/zotero-snap/common/Zotero/storage/DG5LXLJ8/Li et al. - 2015 - ApkCombiner Combining Multiple Android Apps to Su.pdf} +} +@inproceedings{allixAndroZooCollectingMillions2016, + title = {{{AndroZoo}}: {{Collecting Millions}} of {{Android Apps}} for the {{Research Community}}}, + shorttitle = {{{AndroZoo}}}, + booktitle = {13th {{Working Conference}} on {{Mining Software Repositories}} ({{MSR}})}, + author = {Allix, Kevin and Bissyand{\'e}, Tegawend{\'e} F. and Klein, Jacques and Traon, Yves Le}, + year = {2016}, + month = may, + pages = {468--471}, + abstract = {We present a growing collection of Android Applications col-lected from several sources, including the official GooglePlay app market. Our dataset, AndroZoo, currently contains more than three million apps, each of which has beenanalysed by tens of different AntiVirus products to knowwhich applications are detected as Malware. 
We provide thisdataset to contribute to ongoing research efforts, as well asto enable new potential research topics on Android Apps.By releasing our dataset to the research community, we alsoaim at encouraging our fellow researchers to engage in reproducible experiments.}, + keywords = {Android Applications,Androids,APK,Crawlers,Google,HTML,Humanoid robots,Protocols,Software,Software Repository}, + file = {/home/jf/snap/zotero-snap/common/Zotero/storage/5SNISVTP/7832927.html} +} +@inproceedings{Arp2014, + title = {Drebin: {{Effective}} and {{Explainable Detection}} of {{Android Malware}} in {{Your Pocket}}}, + booktitle = {Network and {{Distributed System Security Symposium}}}, + author = {Arp, Daniel and Spreitzenbarth, Michael and Gascon, Hugo and Rieck, Konrad and Siemens, Germany and Munich, Cert}, + year = {2014}, + month = feb, + publisher = {{The Internet Society}}, + address = {{San Diego, California, USA}}, + isbn = {1-891562-35-5}, + keywords = {\ding{72},Static analysis}, + file = {/home/jf/snap/zotero-snap/common/Zotero/storage/364XVWJK/Arp et al. - 2014 - Drebin Effective and Explainable Detection of And.pdf;/home/jf/snap/zotero-snap/common/Zotero/storage/ITE85DES/Arp et al. - 2014 - Drebin Effective and Explainable Detection of Android Malware in Your Pocket.pdf} +} +@article{Pendlebury2018, + title = {{{TESSERACT}}: {{Eliminating Experimental Bias}} in {{Malware Classification}} across {{Space}} and {{Time}}}, + author = {Pendlebury, Feargus and Pierazzi, Fabio and Jordaney, Roberto and Kinder, Johannes and Cavallaro, Lorenzo}, + year = {2018}, + eprint = {1807.07838}, + abstract = {Is Android malware classification a solved problem? Published F1 scores of up to 0.99 appear to leave very little room for improvement. 
In this paper, we argue that results are commonly inflated due to two pervasive sources of experimental bias: "spatial bias" caused by distributions of training and testing data that are not representative of a real-world deployment; and "temporal bias" caused by incorrect time splits of training and testing sets, leading to impossible configurations. We propose a set of space and time constraints for experiment design that eliminates both sources of bias. We introduce a new metric that summarizes the expected robustness of a classifier in a real-world setting, and we present an algorithm to tune its performance. Finally, we demonstrate how this allows us to evaluate mitigation strategies for time decay such as active learning. We have implemented our solutions in TESSERACT, an open source evaluation framework for comparing malware classifiers in a realistic setting. We used TESSERACT to evaluate three Android malware classifiers from the literature on a dataset of 129K applications spanning over three years. Our evaluation confirms that earlier published results are biased, while also revealing counter-intuitive performance and showing that appropriate tuning can lead to significant improvements.}, + archiveprefix = {arxiv}, + file = {/home/jf/snap/zotero-snap/common/Zotero/storage/QXT9GLTX/Pendlebury et al. 
- 2018 - TESSERACT Eliminating Experimental Bias in Malware Classification across Space and Time.pdf} +} +@inproceedings{shanSelfhidingBehaviorAndroid2018, + title = {Self-Hiding Behavior in {{Android}} Apps}, + booktitle = {40th {{International Conference}} on {{Software Engineering}}}, + author = {Shan, Zhiyong and Neamtiu, Iulian and Samuel, Raina}, + year = {2018}, + pages = {728--739}, + publisher = {{ACM Press}}, + address = {{New York, New York, USA}}, + doi = {10.1145/3180155.3180214}, + isbn = {978-1-4503-5638-1}, + keywords = {Android,malware,mobile security,static analysis}, + file = {/home/jf/snap/zotero-snap/common/Zotero/storage/FN53LJGG/Shan, Neamtiu, Samuel - 2018 - Self-hiding behavior in Android apps.pdf} +} +@article{DBLPjournalstifsMirandaGLTW22, + author = {Tom{\'{a}}s Concepci{\'{o}}n Miranda and + Pierre{-}Fran{\c{c}}ois Gimenez and + Jean{-}Fran{\c{c}}ois Lalande and + Val{\'{e}}rie Viet Triem Tong and + Pierre Wilke}, + title = {Debiasing Android Malware Datasets: How Can {I} Trust Your Results + If Your Dataset Is Biased?}, + journal = {{IEEE} Trans. Inf. Forensics Secur.}, + volume = {17}, + pages = {2182--2197}, + year = {2022}, + doi = {10.1109/TIFS.2022.3180184}, + timestamp = {Thu, 25 Aug 2022 08:35:58 +0200}, + biburl = {https://dblp.org/rec/journals/tifs/MirandaGLTW22.bib}, + bibsource = {dblp computer science bibliography, https://dblp.org} +} +@inproceedings{Allix, + title = {Are {{Your Training Datasets Yet Relevant}}?}, + booktitle = {Engineering {{Secure Software}} and {{Systems}}}, + author = {Allix, Kevin and Bissyand{\'e}, Tegawend{\'e} F. 
and Klein, Jacques and Le Traon, Yves}, + year = {2015}, + pages = {51--67}, + doi = {10.1007/978-3-319-15618-7_5}, + file = {/home/jf/snap/zotero-snap/common/Zotero/storage/RG6PLSKG/Allix - Unknown - Are Your Training Datasets Yet Relevant.pdf} +} + +@inproceedings{pendlebury2019tesseract, + title={TESSERACT: Eliminating experimental bias in malware classification across space and time}, + author={Pendlebury, Feargus and Pierazzi, Fabio and Jordaney, Roberto and Kinder, Johannes and Cavallaro, Lorenzo and others}, + booktitle={Proceedings of the 28th USENIX Security Symposium}, + pages={729--746}, + year={2019}, + organization={USENIX Association} +} + +@online{statcounter, + author = {statcounter}, + title = {Operating System Market Share Worldwide}, + year = 2023, + url = {https://gs.statcounter.com/os-market-share#monthly-200901-202304}, + urldate = {2023-04-30} +} + +@online{statista, + author = {statista}, + title = {Operating System Market Share Worldwide}, + year = 2023, + url = {https://www.statista.com/statistics/266210/number-of-available-applications-in-the-google-play-store/}, + urldate = {2023-04-30} +} + +@inproceedings{Arzt2014a, + title = {{{FlowDroid}}: {{Precise Context}}, {{Flow}}, {{Field}}, {{Object-sensitive}} and {{Lifecycle-aware Taint Analysis}} for {{Android Apps}}}, + booktitle = {{{ACM SIGPLAN Conference}} on {{Programming Language Design}} and {{Implementation}}}, + author = {Arzt, Steven and Rasthofer, Siegfried and Fritz, Christian and Bodden, Eric and Bartel, Alexandre and Klein, Jacques and Le Traon, Yves and Octeau, Damien and McDaniel, Patrick}, + date = {2014-06-05}, + volume = {49}, + number = {6}, + pages = {259--269}, + publisher = {{ACM Press}}, + location = {{Edinburgh, UK}}, + issn = {03621340}, + doi = {10.1145/2666356.2594299}, + keywords = {Static analysis}, + file = {/home/jf/snap/zotero-snap/common/Zotero/storage/XS8BH65X/Arzt et al. 
- 2014 - FlowDroid Precise Context, Flow, Field, Object-sensitive and Lifecycle-aware Taint Analysis for Android Apps.pdf} +} + +@article{blackshearThresherPreciseRefutations2013, + title = {Thresher: Precise Refutations for Heap Reachability}, + shorttitle = {Thresher}, + author = {Blackshear, Sam and Chang, Bor-Yuh Evan and Sridharan, Manu}, + date = {2013-06-23}, + journaltitle = {ACM SIGPLAN Notices}, + shortjournal = {SIGPLAN Not.}, + volume = {48}, + number = {6}, + pages = {275--286}, + issn = {0362-1340, 1558-1160}, + doi = {10.1145/2499370.2462186}, + urldate = {2023-02-11}, + abstract = {We present a precise, path-sensitive static analysis for reasoning about heap reachability, that is, whether an object can be reached from another variable or object via pointer dereferences. Precise reachability information is useful for a number of clients, including static detection of a class of Android memory leaks. For this client, we found the heap reachability information computed by a state-of-the-art points-to analysis was too imprecise, leading to numerous false-positive leak reports. Our analysis combines a symbolic execution capable of path-sensitivity and strong updates with abstract heap information computed by an initial flow-insensitive points-to analysis. This novel mixed representation allows us to achieve both precision and scalability by leveraging the pre-computed points-to facts to guide execution and prune infeasible paths. We have evaluated our techniques in the Thresher tool, which we used to find several developer-confirmed leaks in Android applications.}, + langid = {english}, + file = {/home/jf/snap/zotero-snap/common/Zotero/storage/QZ9T3NC6/Blackshear et al. 
- 2013 - Thresher precise refutations for heap reachabilit.pdf} +} + +@article{CHOI2014620, + title = {A Type and Effect System for Activation Flow of Components in {{Android}} Programs}, + author = {Choi, Kwanghoon and Chang, Byeong-Mo}, + date = {2014}, + journaltitle = {Information Processing Letters}, + volume = {114}, + number = {11}, + pages = {620--627}, + issn = {0020-0190}, + doi = {10.1016/j.ipl.2014.05.011}, + abstract = {This paper proposes a type and effect system for analyzing activation flow between components through intents in Android programs. The activation flow information is necessary for all Android analyses such as a secure information flow analysis for Android programs. We first design a formal semantics for a core of featherweight Android/Java, which can address interaction between components through intents. Based on the formal semantics, we design a type and effect system for analyzing activation flow between components and demonstrate the soundness of the system.}, + keywords = {Android,Control flow,Formal semantics,Java,Program analysis}, + file = {/home/jf/snap/zotero-snap/common/Zotero/storage/MF5DRVJP/Choi et Chang - 2014 - A type and effect system for activation flow of co.pdf} +} + +@inproceedings{DBLPconfndssGordonKPGNR15, + title = {Information Flow Analysis of Android Applications in {{DroidSafe}}}, + booktitle = {22nd Annual Network and Distributed System Security Symposium, {{NDSS}} 2015, San Diego, California, {{USA}}, February 8-11, 2015}, + author = {Gordon, Michael I. and Kim, Deokhwan and Perkins, Jeff H. and Gilham, Limei and Nguyen, Nguyen and Rinard, Martin C.}, + date = {2015}, + publisher = {{The Internet Society}}, + bibsource = {dblp computer science bibliography, https://dblp.org}, + biburl = {https://dblp.org/rec/conf/ndss/GordonKPGNR15.bib}, + timestamp = {Thu, 22 Dec 2022 16:51:59 +0100}, + file = {/home/jf/snap/zotero-snap/common/Zotero/storage/6JGWR4R5/Gordon et al. 
- 2015 - Information flow analysis of android applications .pdf} +} + +@inproceedings{DBLPconfndssPoeplauFBKV14, + title = {Execute This! {{Analyzing}} Unsafe and Malicious Dynamic Code Loading in Android Applications}, + booktitle = {21st Annual Network and Distributed System Security Symposium, {{NDSS}} 2014, San Diego, California, {{USA}}, February 23-26, 2014}, + author = {Poeplau, Sebastian and Fratantonio, Yanick and Bianchi, Antonio and Kruegel, Christopher and Vigna, Giovanni}, + date = {2014}, + publisher = {{The Internet Society}}, + bibsource = {dblp computer science bibliography, https://dblp.org}, + biburl = {https://dblp.org/rec/conf/ndss/PoeplauFBKV14.bib}, + timestamp = {Mon, 01 Feb 2021 08:42:18 +0100}, + file = {/home/jf/snap/zotero-snap/common/Zotero/storage/CQX3FINC/Poeplau et al. - 2014 - Execute this! Analyzing unsafe and malicious dynam.pdf} +} + +@inproceedings{DBLPconfoopslaAzimN13, + title = {Targeted and Depth-First Exploration for Systematic Testing of Android Apps}, + booktitle = {Proceedings of the 2013 {{ACM SIGPLAN}} International Conference on Object Oriented Programming Systems Languages \& Applications, {{OOPSLA}} 2013, Part of {{SPLASH}} 2013, Indianapolis, {{IN}}, {{USA}}, October 26-31, 2013}, + author = {Azim, Tanzirul and Neamtiu, Iulian}, + editor = {Hosking, Antony L. and Eugster, Patrick Th. 
and Lopes, Cristina V.}, + date = {2013}, + pages = {641--660}, + publisher = {{ACM}}, + doi = {10.1145/2509136.2509549}, + bibsource = {dblp computer science bibliography, https://dblp.org}, + biburl = {https://dblp.org/rec/conf/oopsla/AzimN13.bib}, + timestamp = {Thu, 24 Jun 2021 16:19:30 +0200}, + file = {/home/jf/snap/zotero-snap/common/Zotero/storage/MVEBFDE8/Azim et Neamtiu - 2013 - Targeted and depth-first exploration for systemati.pdf} +} + +@inproceedings{fahlWhyEveMallory2012, + title = {Why Eve and Mallory Love Android: An Analysis of Android {{SSL}} (in)Security}, + shorttitle = {Why Eve and Mallory Love Android}, + booktitle = {Proceedings of the 2012 {{ACM}} Conference on {{Computer}} and Communications Security}, + author = {Fahl, Sascha and Harbach, Marian and Muders, Thomas and Baumgärtner, Lars and Freisleben, Bernd and Smith, Matthew}, + date = {2012-10-16}, + pages = {50--61}, + publisher = {{ACM}}, + location = {{Raleigh North Carolina USA}}, + doi = {10.1145/2382196.2382205}, + urldate = {2023-02-11}, + eventtitle = {{{CCS}}'12: The {{ACM Conference}} on {{Computer}} and {{Communications Security}}}, + isbn = {978-1-4503-1651-4}, + langid = {english}, + file = {/home/jf/snap/zotero-snap/common/Zotero/storage/J3FSBFJ7/Fahl et al. 
- 2012 - Why eve and mallory love android an analysis of a.pdf} +} + +@inproceedings{gasconStructuralDetectionAndroid2013, + title = {Structural Detection of Android Malware Using Embedded Call Graphs}, + booktitle = {Proceedings of the 2013 {{ACM}} Workshop on {{Artificial}} Intelligence and Security}, + author = {Gascon, Hugo and Yamaguchi, Fabian and Arp, Daniel and Rieck, Konrad}, + date = {2013-11-04}, + pages = {45--54}, + publisher = {{ACM}}, + location = {{Berlin Germany}}, + doi = {10.1145/2517312.2517315}, + urldate = {2023-02-11}, + eventtitle = {{{CCS}}'13: 2013 {{ACM SIGSAC Conference}} on {{Computer}} and {{Communications Security}}}, + isbn = {978-1-4503-2488-5}, + langid = {english}, + file = {/home/jf/snap/zotero-snap/common/Zotero/storage/9LF4FR8Y/2517312.2517315.pdf;/home/jf/snap/zotero-snap/common/Zotero/storage/YYVYSARX/Gascon et al. - 2013 - Structural detection of android malware using embe.pdf} +} + +@article{geneiatakisPermissionVerificationApproach2015, + title = {A {{Permission}} Verification Approach for Android Mobile Applications}, + author = {Geneiatakis, Dimitris and Fovino, Igor Nai and Kounelis, Ioannis and Stirparo, Pasquale}, + date = {2015-03}, + journaltitle = {Computers \& Security}, + shortjournal = {Computers \& Security}, + volume = {49}, + pages = {192--205}, + issn = {01674048}, + doi = {10.1016/j.cose.2014.10.005}, + urldate = {2023-02-11}, + langid = {english}, + file = {/home/jf/snap/zotero-snap/common/Zotero/storage/ENIVR8EY/Geneiatakis et al. 
- 2015 - A Permission verification approach for android mob.pdf} +} + +@inproceedings{hoffmannSlicingDroidsProgram2013, + title = {Slicing Droids: Program Slicing for Smali Code}, + shorttitle = {Slicing Droids}, + booktitle = {Proceedings of the 28th {{Annual ACM Symposium}} on {{Applied Computing}}}, + author = {Hoffmann, Johannes and Ussath, Martin and Holz, Thorsten and Spreitzenbarth, Michael}, + date = {2013-03-18}, + series = {{{SAC}} '13}, + pages = {1844--1851}, + publisher = {{Association for Computing Machinery}}, + location = {{New York, NY, USA}}, + doi = {10.1145/2480362.2480706}, + urldate = {2022-10-26}, + abstract = {The popularity of mobile devices like smartphones and tablets has increased significantly in the last few years with many millions of sold devices. This growth also has its drawbacks: attackers have realized that smartphones are an attractive target and in the last months many different kinds of malicious software (short: malware) for such devices have emerged. This worrisome development has the potential to hamper the prospering ecosystem of mobile devices and the potential for damage is huge. Considering these aspects, it is evident that malicious apps need to be detected early on in order to prevent further distribution and infections. This implies that it is necessary to develop techniques capable of detecting malicious apps in an automated way. In this paper, we present SAAF, a Static Android Analysis Framework for Android apps. SAAF analyzes smali code, a disassembled version of the DEX format used by Android's Java VM implementation. Our goal is to create program slices in order to perform data-flow analyses to backtrack parameters used by a given method. This helps us to identify suspicious code regions in an automated way. Several other analysis techniques such as visualization of control flow graphs or identification of ad-related code are also implemented in SAAF. 
In this paper, we report on program slicing for Android and present results obtained by using this technique to analyze more than 136,000 benign and about 6,100 malicious apps.}, + isbn = {978-1-4503-1656-9}, + file = {/home/jf/snap/zotero-snap/common/Zotero/storage/XC3Z9ELA/Hoffmann et al. - 2013 - Slicing droids program slicing for smali code.pdf} +} + +@inproceedings{jeonDrAndroidMr2012, + title = {Dr. {{Android}} and {{Mr}}. {{Hide}}: Fine-Grained Permissions in Android Applications}, + shorttitle = {Dr. {{Android}} and {{Mr}}. {{Hide}}}, + booktitle = {Proceedings of the Second {{ACM}} Workshop on {{Security}} and Privacy in Smartphones and Mobile Devices}, + author = {Jeon, Jinseong and Micinski, Kristopher K. and Vaughan, Jeffrey A. and Fogel, Ari and Reddy, Nikhilesh and Foster, Jeffrey S. and Millstein, Todd}, + date = {2012-10-19}, + pages = {3--14}, + publisher = {{ACM}}, + location = {{Raleigh North Carolina USA}}, + doi = {10.1145/2381934.2381938}, + urldate = {2023-02-10}, + eventtitle = {{{CCS}}'12: The {{ACM Conference}} on {{Computer}} and {{Communications Security}}}, + isbn = {978-1-4503-1666-8}, + langid = {english}, + file = {/home/jf/snap/zotero-snap/common/Zotero/storage/99J6WNGV/Jeon et al. - 2012 - Dr. Android and Mr. 
Hide fine-grained permissions.pdf} +} + +@inproceedings{klieberAndroidTaintFlow2014, + title = {Android Taint Flow Analysis for App Sets}, + booktitle = {Proceedings of the 3rd {{ACM SIGPLAN International Workshop}} on the {{State}} of the {{Art}} in {{Java Program Analysis}}}, + author = {Klieber, William and Flynn, Lori and Bhosale, Amar and Jia, Limin and Bauer, Lujo}, + date = {2014-06-12}, + pages = {1--6}, + publisher = {{ACM}}, + location = {{Edinburgh United Kingdom}}, + doi = {10.1145/2614628.2614633}, + urldate = {2023-02-10}, + eventtitle = {{{PLDI}} '14: {{ACM SIGPLAN Conference}} on {{Programming Language Design}} and {{Implementation}}}, + isbn = {978-1-4503-2919-4}, + langid = {english}, + file = {/home/jf/snap/zotero-snap/common/Zotero/storage/8X6YV3IE/2614628.2614633.pdf;/home/jf/snap/zotero-snap/common/Zotero/storage/9DBAXR49/Klieber et al. - 2014 - Android taint flow analysis for app sets.pdf} +} + +@inproceedings{liangSoundPreciseMalware2013, + title = {Sound and Precise Malware Analysis for Android via Pushdown Reachability and Entry-Point Saturation}, + booktitle = {Proceedings of the {{Third ACM}} Workshop on {{Security}} and Privacy in Smartphones \& Mobile Devices}, + author = {Liang, Shuying and Keep, Andrew W. and Might, Matthew and Lyde, Steven and Gilray, Thomas and Aldous, Petey and Van Horn, David}, + date = {2013-11-08}, + series = {{{SPSM}} '13}, + pages = {21--32}, + publisher = {{Association for Computing Machinery}}, + location = {{New York, NY, USA}}, + doi = {10.1145/2516760.2516769}, + urldate = {2023-02-08}, + abstract = {Sound malware analysis of Android applications is challenging. First, object-oriented programs exhibit highly interprocedural, dynamically dispatched control structure. Second, the Android programming paradigm relies heavily on the asynchronous execution of multiple entry points. 
Existing analysis techniques focus more on the second challenge, while relying on traditional analytic techniques that suffer from inherent imprecision or unsoundness to solve the first. We present Anadroid, a static malware analysis framework for Android apps. Anadroid exploits two techniques to soundly raise precision: (1) it uses a pushdown system to precisely model dynamically dispatched interprocedural and exception-driven control-flow; (2) it uses Entry-Point Saturation (EPS) to soundly approximate all possible interleavings of asynchronous entry points in Android applications. (It also integrates static taint-flow analysis and least permissions analysis to expand the class of malicious behaviors which it can catch.) Anadroid provides rich user interface support for human analysts which must ultimately rule on the "maliciousness" of a behavior. To demonstrate the effectiveness of Anadroid's malware analysis, we had teams of analysts analyze a challenge suite of 52 Android applications released as part of the Automated Program Analysis for Cybersecurity (APAC) DARPA program. The first team analyzed the apps using a version of Anadroid that uses traditional (finite-state-machine-based) control-flow-analysis found in existing malware analysis tools; the second team analyzed the apps using a version of Anadroid that uses our enhanced pushdown-based control-flow-analysis. We measured machine analysis time, human analyst time, and their accuracy in flagging malicious applications. 
With pushdown analysis, we found statistically significant (p {$<$} 0.05) decreases in time: from 85 minutes per app to 35 minutes per app in human plus machine analysis time; and statistically significant (p {$<$} 0.05) increases in accuracy with the pushdown-driven analyzer: from 71\% correct identification to 95\% correct identification.}, + isbn = {978-1-4503-2491-5}, + keywords = {abstract interpretation,malware detection,pushdown systems,static analysis,taint analysis}, + file = {/home/jf/snap/zotero-snap/common/Zotero/storage/QKCQ4LWI/Liang et al. - 2013 - Sound and precise malware analysis for android via.pdf} +} + +@inproceedings{liIccTADetectingInterComponent2015, + title = {{{IccTA}}: {{Detecting Inter-Component Privacy Leaks}} in {{Android Apps}}}, + shorttitle = {{{IccTA}}}, + booktitle = {2015 {{IEEE}}/{{ACM}} 37th {{IEEE International Conference}} on {{Software Engineering}}}, + author = {Li, Li and Bartel, Alexandre and Bissyande, Tegawende F. and Klein, Jacques and Le Traon, Yves and Arzt, Steven and Rasthofer, Siegfried and Bodden, Eric and Octeau, Damien and McDaniel, Patrick}, + date = {2015-05}, + pages = {280--291}, + publisher = {{IEEE}}, + location = {{Florence, Italy}}, + doi = {10.1109/ICSE.2015.48}, + url = {http://ieeexplore.ieee.org/document/7194581/}, + urldate = {2023-02-11}, + eventtitle = {2015 {{IEEE}}/{{ACM}} 37th {{IEEE International Conference}} on {{Software Engineering}} ({{ICSE}})}, + isbn = {978-1-4799-1934-5}, + file = {/home/jf/snap/zotero-snap/common/Zotero/storage/8HDRKSA2/IccTA_Detecting_Inter-Component_Privacy_Leaks_in_Android_Apps.pdf;/home/jf/snap/zotero-snap/common/Zotero/storage/K749QIGK/Li et al. 
- 2015 - IccTA Detecting Inter-Component Privacy Leaks in .pdf} +} + +@inproceedings{lillackTrackingLoadtimeConfiguration2014, + title = {Tracking Load-Time Configuration Options}, + booktitle = {Proceedings of the 29th {{ACM}}/{{IEEE International Conference}} on {{Automated Software Engineering}}}, + author = {Lillack, Max and Kästner, Christian and Bodden, Eric}, + date = {2014-09-15}, + series = {{{ASE}} '14}, + pages = {445--456}, + publisher = {{Association for Computing Machinery}}, + location = {{New York, NY, USA}}, + doi = {10.1145/2642937.2643001}, + url = {https://doi.org/10.1145/2642937.2643001}, + urldate = {2023-02-08}, + abstract = {Highly-configurable software systems are pervasive, although configuration options and their interactions raise complexity of the program and increase maintenance effort. Especially load-time configuration options, such as parameters from command-line options or configuration files, are used with standard programming constructs such as variables and if statements intermixed with the program's implementation; manually tracking configuration options from the time they are loaded to the point where they may influence control-flow decisions is tedious and error prone. We design and implement Lotrack, an extended static taint analysis to automatically track configuration options. Lotrack derives a configuration map that explains for each code fragment under which configurations it may be executed. An evaluation on Android applications shows that Lotrack yields high accuracy with reasonable performance. We use Lotrack to empirically characterize how much of the implementation of Android apps depends on the platform's configuration options or interactions of these options.}, + isbn = {978-1-4503-3013-8}, + keywords = {configuration options,static analysis,variability mining}, + file = {/home/jf/snap/zotero-snap/common/Zotero/storage/3BNMD58Z/Lillack et al. 
- 2014 - Tracking load-time configuration options.pdf} +} + +@inproceedings{liuCharacterizingDetectingPerformance2014, + title = {Characterizing and Detecting Performance Bugs for Smartphone Applications}, + booktitle = {Proceedings of the 36th {{International Conference}} on {{Software Engineering}}}, + author = {Liu, Yepang and Xu, Chang and Cheung, Shing-Chi}, + date = {2014-05-31}, + pages = {1013--1024}, + publisher = {{ACM}}, + location = {{Hyderabad India}}, + doi = {10.1145/2568225.2568229}, + url = {https://dl.acm.org/doi/10.1145/2568225.2568229}, + urldate = {2023-02-11}, + eventtitle = {{{ICSE}} '14: 36th {{International Conference}} on {{Software Engineering}}}, + isbn = {978-1-4503-2756-5}, + langid = {english}, + file = {/home/jf/snap/zotero-snap/common/Zotero/storage/8JE5EF72/Liu et al. - 2014 - Characterizing and detecting performance bugs for .pdf} +} + +@inproceedings{octeauCompositeConstantPropagation2015, + title = {Composite {{Constant Propagation}}: {{Application}} to {{Android Inter-Component Communication Analysis}}}, + shorttitle = {Composite {{Constant Propagation}}}, + booktitle = {2015 {{IEEE}}/{{ACM}} 37th {{IEEE International Conference}} on {{Software Engineering}}}, + author = {Octeau, Damien and Luchaup, Daniel and Dering, Matthew and Jha, Somesh and McDaniel, Patrick}, + date = {2015-05}, + pages = {77--88}, + publisher = {{IEEE}}, + location = {{Florence, Italy}}, + doi = {10.1109/ICSE.2015.30}, + url = {http://ieeexplore.ieee.org/document/7194563/}, + urldate = {2023-02-11}, + eventtitle = {2015 {{IEEE}}/{{ACM}} 37th {{IEEE International Conference}} on {{Software Engineering}} ({{ICSE}})}, + isbn = {978-1-4799-1934-5}, + file = {/home/jf/snap/zotero-snap/common/Zotero/storage/INM9WAVU/Octeau et al. 
- 2015 - Composite Constant Propagation Application to And.pdf} +} + +@inproceedings{rountevStaticReferenceAnalysis2014, + title = {Static {{Reference Analysis}} for {{GUI Objects}} in {{Android Software}}}, + booktitle = {Proceedings of {{Annual IEEE}}/{{ACM International Symposium}} on {{Code Generation}} and {{Optimization}}}, + author = {Rountev, Atanas and Yan, Dacong}, + date = {2014-02-15}, + pages = {143--153}, + publisher = {{ACM}}, + location = {{Orlando FL USA}}, + doi = {10.1145/2544137.2544159}, + url = {https://dl.acm.org/doi/10.1145/2544137.2544159}, + urldate = {2023-02-11}, + eventtitle = {{{CGO}} '14: 12th {{Annual IEEE}}/{{ACM International Symposium}} on {{Code Generation}} and {{Optimization}}}, + isbn = {978-1-4503-2670-4}, + langid = {english}, + file = {/home/jf/snap/zotero-snap/common/Zotero/storage/QWSPKRZ4/Rountev et Yan - 2014 - Static Reference Analysis for GUI Objects in Andro.pdf} +} + +@inproceedings{shenInformationFlowsPermission2014, + title = {Information Flows as a Permission Mechanism}, + booktitle = {Proceedings of the 29th {{ACM}}/{{IEEE International Conference}} on {{Automated Software Engineering}}}, + author = {Shen, Feng and Vishnubhotla, Namita and Todarka, Chirag and Arora, Mohit and Dhandapani, Babu and Lehner, Eric John and Ko, Steven Y. and Ziarek, Lukasz}, + date = {2014-09-15}, + pages = {515--526}, + publisher = {{ACM}}, + location = {{Vasteras Sweden}}, + doi = {10.1145/2642937.2643018}, + url = {https://dl.acm.org/doi/10.1145/2642937.2643018}, + urldate = {2023-02-11}, + eventtitle = {{{ASE}} '14: {{ACM}}/{{IEEE International Conference}} on {{Automated Software Engineering}}}, + isbn = {978-1-4503-3013-8}, + langid = {english}, + file = {/home/jf/snap/zotero-snap/common/Zotero/storage/ZQSXYZNX/Shen et al. 
- 2014 - Information flows as a permission mechanism.pdf} +} + +@inproceedings{titzeAppareciumRevealingData2015, + title = {Apparecium: {{Revealing Data Flows}} in {{Android Applications}}}, + shorttitle = {Apparecium}, + booktitle = {2015 {{IEEE}} 29th {{International Conference}} on {{Advanced Information Networking}} and {{Applications}}}, + author = {Titze, Dennis and Schutte, Julian}, + date = {2015-03}, + pages = {579--586}, + publisher = {{IEEE}}, + location = {{Gwangju, South Korea}}, + doi = {10.1109/AINA.2015.239}, + url = {http://ieeexplore.ieee.org/document/7098024/}, + urldate = {2023-02-11}, + eventtitle = {2015 {{IEEE}} 29th {{International Conference}} on {{Advanced Information Networking}} and {{Applications}} ({{AINA}})}, + isbn = {978-1-4799-7905-9}, + file = {/home/jf/snap/zotero-snap/common/Zotero/storage/T6I4SND6/Titze et Schutte - 2015 - Apparecium Revealing Data Flows in Android Applic.pdf} +} + +@inproceedings{vidasA5AutomatedAnalysis2014, + title = {A5: {{Automated Analysis}} of {{Adversarial Android Applications}}}, + shorttitle = {A5}, + booktitle = {Proceedings of the 4th {{ACM Workshop}} on {{Security}} and {{Privacy}} in {{Smartphones}} \& {{Mobile Devices}}}, + author = {Vidas, Timothy and Tan, Jiaqi and Nahata, Jay and Tan, Chaur Lih and Christin, Nicolas and Tague, Patrick}, + date = {2014-11-07}, + pages = {39--50}, + publisher = {{ACM}}, + location = {{Scottsdale Arizona USA}}, + doi = {10.1145/2666620.2666630}, + url = {https://dl.acm.org/doi/10.1145/2666620.2666630}, + urldate = {2023-02-11}, + eventtitle = {{{CCS}}'14: 2014 {{ACM SIGSAC Conference}} on {{Computer}} and {{Communications Security}}}, + isbn = {978-1-4503-3155-5}, + langid = {english}, + file = {/home/jf/snap/zotero-snap/common/Zotero/storage/CPKK7RNR/2666620.2666630.pdf;/home/jf/snap/zotero-snap/common/Zotero/storage/LJCIRR3J/Vidas et al.
- 2014 - A5 Automated Analysis of Adversarial Android Appl.pdf} +} + +@article{weiAmandroidPreciseGeneral2018, + title = {Amandroid: {{A Precise}} and {{General Inter-component Data Flow Analysis Framework}} for {{Security Vetting}} of {{Android Apps}}}, + shorttitle = {Amandroid}, + author = {Wei, Fengguo and Roy, Sankardas and Ou, Xinming and {Robby}}, + date = {2018-08-31}, + journaltitle = {ACM Transactions on Privacy and Security}, + shortjournal = {ACM Trans. Priv. Secur.}, + volume = {21}, + number = {3}, + pages = {1--32}, + issn = {2471-2566, 2471-2574}, + doi = {10.1145/3183575}, + url = {https://dl.acm.org/doi/10.1145/3183575}, + urldate = {2023-02-11}, + abstract = {We present a new approach to static analysis for security vetting of Android apps and a general framework called Amandroid. Amandroid determines points-to information for all objects in an Android app component in a flow and context-sensitive (user-configurable) way and performs data flow and data dependence analysis for the component. Amandroid also tracks inter-component communication activities. It can stitch the component-level information into the app-level information to perform intra-app or inter-app analysis. In this article, (a) we show that the aforementioned type of comprehensive app analysis is completely feasible in terms of computing resources with modern hardware, (b) we demonstrate that one can easily leverage the results from this general analysis to build various types of specialized security analyses—in many cases the amount of additional coding needed is around 100 lines of code, and (c) the result of those specialized analyses leveraging Amandroid is at least on par and often exceeds prior works designed for the specific problems, which we demonstrate by comparing Amandroid’s results with those of prior works whenever we can obtain the executable of those tools. 
Since Amandroid’s analysis directly handles inter-component control and data flows, it can be used to address security problems that result from interactions among multiple components from either the same or different apps. Amandroid’s analysis is sound in that it can provide assurance of the absence of the specified security problems in an app with well-specified and reasonable assumptions on Android runtime system and its library.}, + langid = {english}, + file = {/home/jf/snap/zotero-snap/common/Zotero/storage/5IDHRP5H/Wei et al. - 2018 - Amandroid A Precise and General Inter-component D.pdf} +} + +@article{wognsenFormalisationAnalysisDalvik2014, + title = {Formalisation and Analysis of {{Dalvik}} Bytecode}, + author = {Wognsen, Erik Ramsgaard and Karlsen, Henrik Søndberg and Olesen, Mads Chr. and Hansen, René Rydhof}, + date = {2014-10}, + journaltitle = {Science of Computer Programming}, + shortjournal = {Science of Computer Programming}, + volume = {92}, + pages = {25--55}, + issn = {01676423}, + doi = {10.1016/j.scico.2013.11.037}, + url = {https://linkinghub.elsevier.com/retrieve/pii/S0167642313003304}, + urldate = {2023-02-11}, + langid = {english}, + file = {/home/jf/snap/zotero-snap/common/Zotero/storage/69DQRABJ/Wognsen et al. 
- 2014 - Formalisation and analysis of Dalvik bytecode.pdf;/home/jf/snap/zotero-snap/common/Zotero/storage/X9LQ5YCI/1-s2.0-S0167642313003304-main.pdf} +} + +@inproceedings{yangStaticControlFlowAnalysis2015, + title = {Static {{Control-Flow Analysis}} of {{User-Driven Callbacks}} in {{Android Applications}}}, + booktitle = {2015 {{IEEE}}/{{ACM}} 37th {{IEEE International Conference}} on {{Software Engineering}}}, + author = {Yang, Shengqian and Yan, Dacong and Wu, Haowei and Wang, Yan and Rountev, Atanas}, + date = {2015-05}, + pages = {89--99}, + publisher = {{IEEE}}, + location = {{Florence, Italy}}, + doi = {10.1109/ICSE.2015.31}, + url = {http://ieeexplore.ieee.org/document/7194564/}, + urldate = {2023-02-11}, + eventtitle = {2015 {{IEEE}}/{{ACM}} 37th {{IEEE International Conference}} on {{Software Engineering}} ({{ICSE}})}, + isbn = {978-1-4799-1934-5}, + file = {/home/jf/snap/zotero-snap/common/Zotero/storage/LH7HE28Q/Yang et al. - 2015 - Static Control-Flow Analysis of User-Driven Callba.pdf} +} + +@inproceedings{zhauniarovichStaDynAAddressingProblem2015, + title = {{{StaDynA}}: {{Addressing}} the {{Problem}} of {{Dynamic Code Updates}} in the {{Security Analysis}} of {{Android Applications}}}, + shorttitle = {{{StaDynA}}}, + booktitle = {Proceedings of the 5th {{ACM Conference}} on {{Data}} and {{Application Security}} and {{Privacy}}}, + author = {Zhauniarovich, Yury and Ahmad, Maqsood and Gadyatskaya, Olga and Crispo, Bruno and Massacci, Fabio}, + date = {2015-03-02}, + pages = {37--48}, + publisher = {{ACM}}, + location = {{San Antonio Texas USA}}, + doi = {10.1145/2699026.2699105}, + url = {https://dl.acm.org/doi/10.1145/2699026.2699105}, + urldate = {2023-02-11}, + eventtitle = {{{CODASPY}}'15: {{Fifth ACM Conference}} on {{Data}} and {{Application Security}} and {{Privacy}}}, + isbn = {978-1-4503-3191-3}, + langid = {english}, + file = {/home/jf/snap/zotero-snap/common/Zotero/storage/Z9BCFAJY/Zhauniarovich et al. 
- 2015 - StaDynA Addressing the Problem of Dynamic Code Up.pdf} +} +@article{Li2017, + title = {Static Analysis of Android Apps: {{A}} Systematic Literature Review}, + author = {Li, Li and Bissyandé, Tegawendé F. and Papadakis, Mike and Rasthofer, Siegfried and Bartel, Alexandre and Octeau, Damien and Klein, Jacques and Le Traon, Yves}, + date = {2017}, + journaltitle = {Information and Software Technology}, + volume = {88}, + pages = {67--95}, + issn = {09505849}, + doi = {10.1016/j.infsof.2017.04.001}, + abstract = {Context Static analysis exploits techniques that parse program source code or bytecode, often traversing program paths to check some program properties. Static analysis approaches have been proposed for different tasks, including for assessing the security of Android apps, detecting app clones, automating test cases generation, or for uncovering non-functional issues related to performance or energy. The literature thus has proposed a large body of works, each of which attempts to tackle one or more of the several challenges that program analyzers face when dealing with Android apps. Objective We aim to provide a clear view of the state-of-the-art works that statically analyze Android apps, from which we highlight the trends of static analysis approaches, pinpoint where the focus has been put, and enumerate the key aspects where future researches are still needed. Method We have performed a systematic literature review (SLR) which involves studying 124 research papers published in software engineering, programming languages and security venues in the last 5 years (January 2011–December 2015). This review is performed mainly in five dimensions: problems targeted by the approach, fundamental techniques used by authors, static analysis sensitivities considered, android characteristics taken into account and the scale of evaluation performed. 
Results Our in-depth examination has led to several key findings: 1) Static analysis is largely performed to uncover security and privacy issues; 2) The Soot framework and the Jimple intermediate representation are the most adopted basic support tool and format, respectively; 3) Taint analysis remains the most applied technique in research approaches; 4) Most approaches support several analysis sensitivities, but very few approaches consider path-sensitivity; 5) There is no single work that has been proposed to tackle all challenges of static analysis that are related to Android programming; and 6) Only a small portion of state-of-the-art works have made their artifacts publicly available. Conclusion The research community is still facing a number of challenges for building approaches that are aware altogether of implicit-Flows, dynamic code loading features, reflective calls, native code and multi-threading, in order to implement sound and highly precise static analyzers.}, + file = {/home/jf/snap/zotero-snap/common/Zotero/storage/3JL36E6L/1-s2.0-S0950584917302987-main.pdf;/home/jf/snap/zotero-snap/common/Zotero/storage/4M2MB6RS/Li et al. 
- 2017 - Static analysis of android apps A systematic lite.pdf;/home/jf/snap/zotero-snap/common/Zotero/storage/U77CUK9D/S0950584917302987.html} +} +@article{luoTaintBenchAutomaticRealworld2022, + title = {{{TaintBench}}: {{Automatic}} Real-World Malware Benchmarking of {{Android}} Taint Analyses}, + shorttitle = {{{TaintBench}}}, + author = {Luo, Linghui and Pauck, Felix and Piskachev, Goran and Benz, Manuel and Pashchenko, Ivan and Mory, Martin and Bodden, Eric and Hermann, Ben and Massacci, Fabio}, + date = {2022-01}, + journaltitle = {Empirical Software Engineering}, + shortjournal = {Empir Software Eng}, + volume = {27}, + number = {1}, + pages = {16}, + issn = {1382-3256, 1573-7616}, + doi = {10.1007/s10664-021-10013-5}, + url = {https://link.springer.com/10.1007/s10664-021-10013-5}, + urldate = {2023-02-13}, + abstract = {Due to the lack of established real-world benchmark suites for static taint analyses of Android applications, evaluations of these analyses are often restricted and hard to compare. Even in evaluations that do use real-world apps, details about the ground truth in those apps are rarely documented, which makes it difficult to compare and reproduce the results. To push Android taint analysis research forward, this paper thus recommends criteria for constructing real-world benchmark suites for this specific domain, and presents TaintBench, the first real-world malware benchmark suite with documented taint flows. TaintBench benchmark apps include taint flows with complex structures, and addresses static challenges that are commonly agreed on by the community. Together with the TaintBench suite, we introduce the TaintBench framework, whose goal is to simplify real-world benchmarking of Android taint analyses. First, a usability test shows that the framework improves experts’ performance and perceived usability when documenting and inspecting taint flows.
Second, experiments using TaintBench reveal new insights for the taint analysis tools Amandroid and FlowDroid: (i) They are less effective on real-world malware apps than on synthetic benchmark apps. (ii) Predefined lists of sources and sinks heavily impact the tools’ accuracy. (iii) Surprisingly, up-to-date versions of both tools are less accurate than their predecessors.}, + langid = {english}, + file = {/home/jf/snap/zotero-snap/common/Zotero/storage/8UTN2I89/Luo et al. - 2022 - TaintBench Automatic real-world malware benchmark.pdf} +} + +@inproceedings{pauckAndroidTaintAnalysis2018, + title = {Do {{Android}} Taint Analysis Tools Keep Their Promises?}, + booktitle = {Proceedings of the 2018 26th {{ACM Joint Meeting}} on {{European Software Engineering Conference}} and {{Symposium}} on the {{Foundations}} of {{Software Engineering}}}, + author = {Pauck, Felix and Bodden, Eric and Wehrheim, Heike}, + date = {2018-10-26}, + pages = {331--341}, + publisher = {{ACM}}, + location = {{Lake Buena Vista FL USA}}, + doi = {10.1145/3236024.3236029}, + url = {https://dl.acm.org/doi/10.1145/3236024.3236029}, + urldate = {2023-02-13}, + eventtitle = {{{ESEC}}/{{FSE}} '18: 26th {{ACM Joint European Software Engineering Conference}} and {{Symposium}} on the {{Foundations}} of {{Software Engineering}}}, + isbn = {978-1-4503-5573-5}, + langid = {english}, + file = {/home/jf/snap/zotero-snap/common/Zotero/storage/DSMG5QEE/3236024.3236029.pdf;/home/jf/snap/zotero-snap/common/Zotero/storage/JVQWJV6Z/Pauck et al. 
- 2018 - Do Android taint analysis tools keep their promise.pdf} +} +@inproceedings{bosuCollusiveDataLeak2017, + title = {Collusive {{Data Leak}} and {{More}}: {{Large-scale Threat Analysis}} of {{Inter-app Communications}}}, + shorttitle = {Collusive {{Data Leak}} and {{More}}}, + booktitle = {Proceedings of the 2017 {{ACM}} on {{Asia Conference}} on {{Computer}} and {{Communications Security}}}, + author = {Bosu, Amiangshu and Liu, Fang and Yao, Danfeng (Daphne) and Wang, Gang}, + date = {2017-04-02}, + pages = {71--85}, + publisher = {{ACM}}, + location = {{Abu Dhabi United Arab Emirates}}, + doi = {10.1145/3052973.3053004}, + url = {https://dl.acm.org/doi/10.1145/3052973.3053004}, + urldate = {2023-02-13}, + eventtitle = {{{ASIA CCS}} '17: {{ACM Asia Conference}} on {{Computer}} and {{Communications Security}}}, + isbn = {978-1-4503-4944-4}, + langid = {english}, + file = {/home/jf/snap/zotero-snap/common/Zotero/storage/KGRWZUY8/Bosu et al. - 2017 - Collusive Data Leak and More Large-scale Threat A.pdf} +} + + +@article{desnos:adnroguard:2011, + title={Android: From Reversing to Decompilation}, + author={Desnos, Anthony and Gueguen, Geoffroy}, + journal={Black Hat Abu Dhabi}, + year={2011}, + url={https://media.blackhat.com/bh-ad-11/Desnos/bh-ad-11-DesnosGueguen-Andriod-Reversing_to_Decompilation_WP.pdf}, + +} + +@article{reaves_droid_2016, + title = {*droid: {Assessment} and {Evaluation} of {Android} {Application} {Analysis} {Tools}}, + volume = {49}, + issn = {0360-0300}, + shorttitle = {*droid}, + url = {https://doi.org/10.1145/2996358}, + doi = {10.1145/2996358}, + abstract = {The security research community has invested significant effort in improving the security of Android applications over the past half decade. This effort has addressed a wide range of problems and resulted in the creation of many tools for application analysis. 
In this article, we perform the first systematization of Android security research that analyzes applications, characterizing the work published in more than 17 top venues since 2010. We categorize each paper by the types of problems they solve, highlight areas that have received the most attention, and note whether tools were ever publicly released for each effort. Of the released tools, we then evaluate a representative sample to determine how well application developers can apply the results of our community’s efforts to improve their products. We find not only that significant work remains to be done in terms of research coverage but also that the tools suffer from significant issues ranging from lack of maintenance to the inability to produce functional output for applications with known vulnerabilities. We close by offering suggestions on how the community can more successfully move forward.}, + number = {3}, + urldate = {2023-01-10}, + journal = {ACM Computing Surveys}, + author = {Reaves, Bradley and Bowers, Jasmine and Gorski III, Sigmund Albert and Anise, Olabode and Bobhate, Rahul and Cho, Raymond and Das, Hiranava and Hussain, Sharique and Karachiwala, Hamza and Scaife, Nolen and Wright, Byron and Butler, Kevin and Enck, William and Traynor, Patrick}, + month = oct, + year = {2016}, + keywords = {Android, application security, program analysis}, + pages = {55:1--55:30}, + file = {Full Text PDF:/home/histausse/Zotero/storage/8JZFY54J/Reaves et al. - 2016 - droid Assessment and Evaluation of Android Appli.pdf:application/pdf}, +} + + +@inproceedings{mauthe_large-scale_2021, + title = {A {Large}-{Scale} {Empirical} {Study} of {Android} {App} {Decompilation}}, + doi = {10.1109/SANER50967.2021.00044}, + abstract = {Decompilers are indispensable tools in Android malware analysis and app security auditing. Numerous academic works also employ an Android decompiler as the first step in a program analysis pipeline. 
In such settings, decompilation is frequently regarded as a "solved" problem, in that it is simply expected that source code can be accurately recovered from an app. While a large proportion of methods in an app can typically be decompiled successfully, it is common that at least some methods fail to decompile. In order to better understand the practical applicability of techniques in which decompilation is used as part of an automated analysis, it is important to know the actual expected failure rate of Android decompilation. To this end, we have performed what is, to the best of our knowledge, the first large-scale study of Android decompilation failure rates. We have used three sets of apps, consisting of, respectively, 3,018 open-source apps, 13,601 apps from a recent crawl of Google Play, and a collection of 24,553 malware samples. In addition to the state-of-the-art Dalvik bytecode decompiler jadx, we used three popular Java decompilers. While jadx achieves an impressively low failure rate of only 0.02\% failed methods per app on average, we found that it manages to recover source code for all methods in only 21\% of the Google Play apps. We have also sought to better understand the degree to which in-the-wild obfuscation techniques can prevent decompilation. Our empirical evaluation, complemented with an in-depth manual analysis of a number of apps, indicates that code obfuscation is quite rarely encountered, even in malicious apps. Moreover, decompilation failures mostly appear to be caused by technical limitations in decompilers, rather than by deliberate attempts to thwart source-code recovery by obfuscation. 
This is an encouraging finding, as it indicates that near-perfect Android decompilation is, at least in theory, achievable, with implementation-level improvements to decompilation tools.}, + booktitle = {2021 {IEEE} {International} {Conference} on {Software} {Analysis}, {Evolution} and {Reengineering} ({SANER})}, + author = {Mauthe, Noah and Kargén, Ulf and Shahmehri, Nahid}, + month = mar, + year = {2021}, + note = {ISSN: 1534-5351}, + keywords = {Android, Java, Malware, malware, reverse engineering, mobile apps, obfuscation, Tools, Conferences, decompilation, Manuals, Pipelines, Process control}, + pages = {400--410}, + file = {IEEE Xplore Abstract Record:/home/histausse/Zotero/storage/RWT9CKBF/9425937.html:text/html;Mauthe et al. - 2021 - A Large-Scale Empirical Study of Android App Decom.pdf:/home/histausse/Zotero/storage/I8KKRIJV/Mauthe et al. - 2021 - A Large-Scale Empirical Study of Android App Decom.pdf:application/pdf}, +} + diff --git a/jury.typ b/jury.typ new file mode 100644 index 0000000..eefafd4 --- /dev/null +++ b/jury.typ @@ -0,0 +1,19 @@ +#let jury-content = [ + #text(size: 1.3em)[Composition du jury :] + + #{ + set text(size: .92em) + table( + columns: 4, + column-gutter: 2em, + stroke: 0pt, + inset: (x: 0pt, y: .5em), + "Présidente :", "Alice", "", "", + "Rapporteurs :", "Bob", "", "", + "", "Eve", "", "", + "Examinatrice :", "Mallory", "", "", + "Dir. 
de thèse :", "Jean-François Lalande", "Professeur des Universités", "CentraleSupélec", + "", "Valérie Viet Triem Tong", "Professeure", "CentraleSupélec", + ) + } +] diff --git a/main.pdf b/main.pdf new file mode 100644 index 0000000..46f6807 Binary files /dev/null and b/main.pdf differ diff --git a/main.typ b/main.typ new file mode 100644 index 0000000..f29917e --- /dev/null +++ b/main.typ @@ -0,0 +1,134 @@ +#import "@local/template-thesis-matisse:0.0.1": * + +#import "jury.typ": jury-content +#import "abstract.typ": keywords-en, keywords-fr, abstract-en, abstract-fr +#import "0_preamble/notations.typ": * + +#show: matisse-thesis.with( + title-fr: todo[Find a title], + title-en: todo[Find a title], + author: "Jean-Marie MINEAU", + affiliation: "IRISA", + defense-place: "Rennes", + defense-date: todo[Date], + jury-content: [#jury-content \ #todo[Compose a Jury]], + university: "CS", + keywords-en: keywords-en, + keywords-fr: keywords-fr, + abstract-en: abstract-en, + abstract-fr: abstract-fr, + draft: true, +) + +// Preamble +#{ + set heading(numbering: none, outlined: false) + set page(numbering: "i") + counter(page).update(0) + + include("0_preamble/acknowledgements.typ") + + // https://ed-matisse.doctorat-bretagne.fr/fr/soutenance-de-these#p-151 + // > Le manuscrit est normalement rédigé en français (Loi relative à l'emploi de la langue française, 1994). + // > Toutefois, il est accepté de bâtir le manuscrit sur la base d'un résumé substantiel en français + // > (au moins 4 pages), le reste du manuscrit étant considéré comme des annexes et étant alors rédigé en + // > langue étrangère. + // > + // > Dans le cas d'une thèse qui ne serait pas rédigée en français, il est conseillé de bien distinguer le + // > résumé substantiel des chapitres de la thèse pour éviter d'essuyer un refus de la part de + // > l'administration de l'établissement d'inscription (par exemple en l'intitulant résumé en français et + // > en ne lui affectant aucun numéro de chapitre). 
+ // + include("0_preamble/french_summary.typ") + + outline(title: "Table of Contents", indent: auto) + show outline.entry: it => { + v(5mm, weak: true) + it + } + + outline(title: "Index of Figures", target: figure.where(supplement: [Figure])) + outline(title: "Index of Tables", target: figure.where(supplement: [Table])) + outline(title: "Index of Listings", target: figure.where(supplement: [Listing])) + + [= List of Acronyms and Notations] + + notation_table + +} + +#counter(page).update(1) + += Introduction + +#todo[Write an introduction] + +#lorem(200) + +#figure( + circle(radius: 50pt), + caption: [A circle], +) + +#lorem(200) + += Background + +#todo[Present your field background] + +#lorem(200) + +#figure( + table( + columns: (20pt, 20pt, 20pt), + align: center+horizon, + table.header( + table.cell(colspan:3)[Play] + ), + emoji.crossmark, [], emoji.circle.stroked, + [], emoji.circle.stroked, [], + emoji.crossmark, [], emoji.crossmark, + ), + caption: [A tic tac toe game], +) + +== Something + +#lorem(200) + +== Something Else + +#lorem(200) + += Related Work + +#todo[Do the State of the Art] + +#lorem(200) + +#figure([ + ```python + for _ in range(10): + print("Hello Void") + ``` +], caption: [Some code], +) + + +#include("3_rasta/rasta.typ") + += Contribution 2 + +#lorem(500) + += Contribution n + +#lorem(500) + += Conclusion + +#todo[Conclude] + +#lorem(500) + +#bibliography("bibliography.bib")