#import "../lib.typ": etal, eg, MWE, HPC, SDK, SDKs, APKs, DEX
#import "../lib.typ": todo, jfl-note
#import "X_var.typ": *
#import "X_lib.typ": *
== Methodology <sec:rasta-methodology>
#todo[small intro: summary of the approach + diagram?]
#jfl-note[Add diagram: Li etal -> [tool selection] -> drop/ - selected -> [select source version] -> [packaging] -> docker / -> singularity -> [exp]]
=== Collecting Tools
#figure({
)
[
*binaries, sources*: #nr: not relevant, #ok: available, #bad: partially available, #ko: not provided\
*documentation*: #okk: excellent, #MWE, #ok: few inconsistencies, #bad: bad quality, #ko: not available\
*decision*: #ok: considered; #bad: considered but not built; #ko: out of scope of the study
]},
caption: [Considered tools~@Li2017: availability and usage reliability],
) <tab:rasta-tools>
We collected the static analysis tools from~@Li2017, plus one additional tool encountered during our review of the state of the art (DidFail~@klieberAndroidTaintFlow2014).
They are listed in @tab:rasta-tools, with the original release date and associated publication.
We intentionally limited the collected tools to the ones selected by Li #etal~@Li2017 for several reasons.
First, not using recent tools guarantees a gap of at least five years between the publication of a tool and the most recent APK files, which enables us to measure the reusability of past contributions over a reasonable period of time.
Second, collecting new tools would require inspecting these tools in depth, similarly to what was done by Li #etal~@Li2017, which is not the primary goal of this chapter.
Additionally, selecting a subset of more recent tools would require criteria such as the publication venue or the number of citations, and thus an additional methodology.
These possible contributions are left for future work.
Some tools use hybrid analysis (both static and dynamic): A3E~@DBLPconfoopslaAzimN13, A5~@vidasA5AutomatedAnalysis2014, Android-app-analysis~@geneiatakisPermissionVerificationApproach2015, StaDynA~@zhauniarovichStaDynAAddressingProblem2015.
They have been excluded from this study.
We manually searched for the tool repository when the website mentioned in the paper was no longer available (#eg when the repository had been migrated from Google Code to GitHub), and for each tool we searched for:
- an optional binary version of the tool that would be usable as a fallback (if the sources cannot be compiled for any reason).
- the source code of the tool.
- the documentation for building and using the tool with a #MWE.
In @tab:rasta-tools we rated the quality of these artifacts with "#ok" when the artifact is available but may have inconsistencies, "#bad" when too many inconsistencies (inaccurate remarks about the sources, dead links or missing parts) were found, "#ko" when no documentation was found, and a double "#okk" for documentation that covers all our expectations (building process, usage, #MWE).
Results show that the documentation is often missing or very poor (#eg Lotrack), which makes the rebuild process and the first analysis of a #MWE very complex.
We finally excluded Choi #etal~@CHOI2014620, as their tool works on the sources of Android applications, and Poeplau #etal~@DBLPconfndssPoeplauFBKV14, as their work focuses on Android hardening.
For example, a fork of Aparecium contains a port for Windows 7, which does not suit our needs.
For IC3, the fork seems promising: it has been updated to be usable on a recent operating system (Ubuntu 22.04 instead of Ubuntu 12.04 for the original version) and is used as a dependency by IccTa.
We decided to keep these two versions of the tool (IC3 and IC3_fork) to compare their results.
Then, we self-allocated a maximum of four days for each tool to successfully read and follow the documentation, compile the tool and obtain the expected result when executing an analysis of a #MWE.
We sent an email to the authors of each tool to confirm that we used the most suitable version of the code and the most appropriate command line to analyze an application, and, in some cases, to request help in solving issues in the building process.
We report in @tab:rasta-sources which authors answered our request and confirmed our decisions.
Thus, if the documentation mentions a specific operating system, we use a Docker image of that operating system.
// Those libraries are only available on Ubuntu 12 or previous versions.
//
Most of the time, tools require additional external components to be fully functional.
These components can be resources such as the `android.jar` file for each version of the #SDK, a database, additional libraries, or other tools.
Depending on the quality of the documentation, setting up these components can take hours to days.
This is why we automated in a Dockerfile the setup of the environment in which the tool is built and run#footnote[
#set list(indent: 1em) // avoid having the bullet align with the footnote numbering
To guarantee reproducibility we published the results, datasets, Dockerfiles and containers:
- https://github.com/histausse/rasta .
- https://zenodo.org/records/10144014 .
- https://zenodo.org/records/10980349 .
- on Docker Hub as `histausse/rasta-<toolname>:icsr2024`.
]
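As an illustration, such builds can be driven by a short script. The following sketch is hypothetical (the `tools/` directory layout and the helper are not part of the published artifacts); it only reuses the image naming scheme mentioned in the footnote.

```python
# Hypothetical build helper: one Docker image per tool, tagged with the
# naming scheme published on Docker Hub. Not the published build scripts.
import subprocess
from pathlib import Path

def build_tool_image(tool_dir: Path) -> str:
    """Build the image described by <tool_dir>/Dockerfile and return its tag."""
    tag = f"histausse/rasta-{tool_dir.name}:icsr2024"
    subprocess.run(["docker", "build", "-t", tag, str(tool_dir)], check=True)
    return tag

# Assumed layout: one sub-directory per tool, each containing a Dockerfile.
for tool_dir in sorted(Path("tools").iterdir()):
    if (tool_dir / "Dockerfile").exists():
        build_tool_image(tool_dir)
```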
=== Runtime Conditions
#figure(
image(
"figs/running.svg",
width: 100%,
alt: "A diagram representing the methodology. The word 'Tool' is linked to a box labeled 'Docker image' by an arrow labeled 'building'. The box 'Docker image' is linked to a box labeled 'Singularity image' by an arrow labeled 'conversion'. The box 'Singularity image' is linked to a box labeled 'Execution monitoring' by a dotted arrow labeled 'Manuel tests' and to an image of a server labeled 'Singularity cluster' by an arrow labeled deployment. An image of three android logo labeled 'apks' is also linked to the 'Singularity cluster' by an arrow labeled 'running the tool analysis'. The 'Singularity cluster' image is linked to the 'Execution monitoring' box by an arrow labeled 'log capture'. The 'Execution monitoring' box linked to the words 'Exit status' by an unlabeled arrow.",
),
caption: [Methodology overview],
) <fig:rasta-overview>
As shown in @fig:rasta-overview, before benchmarking the tools, we built and installed them in Docker containers to facilitate reuse by other researchers.
We converted them into Singularity containers because we had access to a Singularity cluster and because this technology is often used by the #HPC community to ensure the reproducibility of experiments.
//The Docker container allows a user to interact more freely with the bundled tools.
//Then, we converted this image to a Singularity image.
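A minimal sketch of this conversion step, assuming the image is available in the local Docker daemon and using Singularity's `docker-daemon` source (the tool name is only an example):

```python
# Hypothetical conversion step: export a locally built Docker image to a
# Singularity image file (SIF) through the docker-daemon source.
import subprocess

def docker_to_singularity(tag: str, sif_path: str) -> None:
    subprocess.run(
        ["singularity", "build", sif_path, f"docker-daemon://{tag}"],
        check=True,
    )

# "flowdroid" is used here only as an example tool name.
docker_to_singularity("histausse/rasta-flowdroid:icsr2024", "rasta-flowdroid.sif")
```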
We performed manual tests using these Singularity images to check:
- the location where the tool writes on the disk. For best performance, we expect the tools to write to a mount point backed by an SSD. Some tools wrote data at unexpected locations, which required small patches on our side.
- the amount of memory allocated to the tool. We checked that the tool could run a #MWE with a RAM limit of #ramlimit.
- the network connections opened by the tool, if any. We expect the tool not to perform any network operation, such as downloading Android #SDKs. Thus, we prepared the required files and cached them in the images during the building phase. In a few cases, we patched the tool to disable the download of resources. A sketch of these checks is given after this list.
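The sketch below illustrates how these checks could be scripted. It is a hypothetical approximation using Docker's `--memory` and `--network` flags: the actual tests were performed manually on the Singularity images, the entry point is invented, and the RAM value is a placeholder for #ramlimit.

```python
# Hypothetical pre-flight checks, approximated with Docker flags; the
# actual tests were performed manually on the Singularity images.
import subprocess

TAG = "histausse/rasta-flowdroid:icsr2024"    # example tool image
RAM_LIMIT = "64g"                             # placeholder for the ramlimit value
MWE_CMD = ["analyze", "/mwe/app.apk"]         # hypothetical tool entry point

# Memory check: the MWE analysis must succeed under the RAM limit.
subprocess.run(["docker", "run", "--rm", "--memory", RAM_LIMIT, TAG, *MWE_CMD],
               check=True)

# Network check: the analysis must still succeed with networking disabled,
# showing that all resources (e.g. android.jar) are cached in the image.
subprocess.run(["docker", "run", "--rm", "--network", "none", TAG, *MWE_CMD],
               check=True)
```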
A campaign of tests consists of executing the #nbtoolsvariationsrun selected tools on all #APKs of a dataset.
The constraints applied to the cluster are:
- No network connection is authorized in order to limit any execution of malicious software.
If no antivirus has reported the application as malicious, we consider it as goodware.
Applications in between are dropped.
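The following sketch summarizes this filtering rule; the malware detection threshold shown is a placeholder, not the value used in the study:

```python
# Hypothetical sketch of the goodware/malware filtering rule, based on the
# number of VirusTotal engines flagging an APK. Threshold is a placeholder.
from typing import Optional

MALWARE_THRESHOLD = 5  # placeholder, not the study's actual threshold

def classify(vt_detections: int) -> Optional[str]:
    if vt_detections == 0:
        return "goodware"
    if vt_detections >= MALWARE_THRESHOLD:
        return "malware"
    return None  # applications in between are dropped

detections = {"app_a.apk": 0, "app_b.apk": 12, "app_c.apk": 2}
kept = {apk: label for apk, n in detections.items()
        if (label := classify(n)) is not None}
# kept == {"app_a.apk": "goodware", "app_b.apk": "malware"}
```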
To compute the release date of an application, we contacted the authors of Androzoo to obtain the minimum between the date of submission to Androzoo and the date of the first upload to VirusTotal.
Such a computation is more reliable than using the #DEX date, which is often obfuscated when packaging the application.
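Assuming both timestamps are known, this computation reduces to a simple minimum (function and field names are hypothetical):

```python
# Release date heuristic: the earliest of the Androzoo submission date and
# the first upload to VirusTotal (names are hypothetical).
from datetime import date

def release_date(androzoo_submission: date, vt_first_seen: date) -> date:
    return min(androzoo_submission, vt_first_seen)

release_date(date(2019, 3, 1), date(2018, 11, 20))  # -> date(2018, 11, 20)
```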
// #todo[Transition] // no more space left :-(