Reusability of Android Static Tools and Analysis. This repository contains the source code for reproducing the experiments of the paper "Evaluating the Re-Usability of Android Static Analysis Tools" published in the conference ICSR 2024.

RASTA

Rasta stands for Reproducibility of Android Static Tools and Analysis.

This repository contains the source code for reproducing the experiments of the paper "Evaluating the Re-Usability of Android Static Analysis Tools", published at the ICSR 2024 conference.

The provided source code makes it possible to rebuild the Docker and Singularity images of several static analysis tools from the literature, but pre-built images can also be retrieved directly: the Singularity images from the Zenodo archive and the Docker images from Docker Hub (see below).

The Docker images provide an interactive container for analyzing a single APK file. The Singularity images are meant for running batch analyses of a dataset of applications on a Singularity cluster. Additionally, the source code contains scripts that extract the status of each APK analysis (failed/finished) and some characteristics (time, memory), and push these values into a database for further statistics.

The input data and pre-computed output data are hosted outside this repository (see Data).

To reuse a specific analysis tool through our Docker images, without installing it, have a look at the end of this README.

Data

Some data are needed to reproduce the experiment (at least the AndroZoo indexes we used to sample our dataset). These data are too large to be stored in a git repository, so they need to be downloaded from Zenodo to the root of this repository:

curl -L "https://zenodo.org/records/10137905/files/rasta_data_v1.0.tgz?download=1" | tar -xz

Dependencies

To run the Rasta experiment, some tools are required:

  • Docker (e.g. version 24.0.6),
  • Singularity (e.g. version 3.11.1),
  • a modern version of Python (e.g. Python 3.10 or 3.11),
  • gzip,
  • sqlite3.

One way to install these tools is to use Nixpkgs (nix-shell -p docker singularity python310 python310Packages.numpy python310Packages.matplotlib sqlite). Another way is to follow the installation instructions of each tool (https://docs.sylabs.io/guides/3.11/user-guide/, https://docs.docker.com/).

Warning

(One year later, 2025):

Since Ubuntu 23.10, AppArmor prevents the creation of unprivileged user namespaces by default. This means that Singularity won't work without a specific AppArmor profile (which is not installed by nix-shell).

Fortunately, Ubuntu now has a package for Singularity: singularity-container. Using your distribution's package should be the preferred method for installing the tools.

There are also some Python dependencies that need to be installed in a virtual environment:

python3 -m venv venv
source venv/bin/activate
pip install rasta_data_manipulation/
pip install -r rasta_exp/requirements.txt

From now on, all commands are run from inside this venv.

Re-generating datasets

The datasets we used (Drebin, and Rasta split into 10 balanced sets) are located in data/dataset:

  • Drebin: drebin
  • Rasta: set0, set1, ..., set9

It is possible to reproduce the generation of these datasets using latest.csv.gz and year_and_sdk.csv.gz, which come from AndroZoo. Use the following command to regenerate the Rasta dataset:

rasta-gen-dataset data/androzoo/latest.csv.gz data/androzoo/year_and_sdk.csv.gz -o data/dataset
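Before regenerating the datasets, it can help to check that the AndroZoo index was downloaded correctly. The following minimal Python sketch streams latest.csv.gz without decompressing it to disk and counts entries per year of their dex_date. It assumes that the file has a header row with a dex_date column formatted as YYYY-MM-DD HH:MM:SS. It is only an inspection helper (the function name is hypothetical), not what rasta-gen-dataset does internally.

```python
import csv
import gzip
from collections import Counter


def count_by_dex_year(index_path: str) -> Counter:
    """Count AndroZoo index entries per year of their dex_date column.

    Assumption: gzip-compressed CSV with a header row containing a
    'dex_date' column formatted like 'YYYY-MM-DD HH:MM:SS'.
    """
    counts: Counter = Counter()
    # newline="" is what the csv module expects for text streams.
    with gzip.open(index_path, "rt", newline="") as stream:
        for row in csv.DictReader(stream):
            dex_date = row.get("dex_date") or ""
            # Keep only the 4-digit year; skip empty or malformed dates.
            year = dex_date[:4]
            if year.isdigit():
                counts[year] += 1
    return counts
```

For example, sorted(count_by_dex_year("data/androzoo/latest.csv.gz").items()) lists the number of indexed APKs per year.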

Container Images

The containers are stored in data/imgs. They can be regenerated with:

cd rasta_exp
./build_docker_images.sh ../data/imgs
cd ..

The images can also be directly downloaded from the Zenodo archive using:

cd rasta_exp
./download_sif_images.sh ../data/imgs
cd ..

The container and binary of Perfchecker are not provided, as the Perfchecker binary is only available on demand.

Running experiments

The results of the experiments are stored in data/results/archives/. They can be extracted with:

mkdir -p data/results/reports/rasta
mkdir -p data/results/reports/drebin
for archive in data/results/archives/status_set*.tgz; do tar -xzf "${archive}" --directory data/results/reports/rasta; done
tar -xzf data/results/archives/status_drebin.tgz --directory data/results/reports/drebin

They can also be regenerated by re-running all our experiments, which takes several weeks or months.

To run the experiment using a Singularity image hosted on your own computer, you must simplify the settings.ini file located in the rasta_exp directory, which is intended for a Singularity cluster. The following 3 lines are sufficient to configure the experiment to run on your local computer:

[AndroZoo]
apikey = <KEY>
base_url = https://androzoo.uni.lu

Do not forget to replace <KEY> with your AndroZoo API key.
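This README does not describe how the experiment scripts parse this file. As a hedged illustration that treats it as a standard INI file, Python's configparser can check that the key was actually filled in before a run that lasts several weeks. The function name load_androzoo_settings is hypothetical. Interpolation is disabled so that a '%' character in an API key is read literally.

```python
import configparser


def load_androzoo_settings(path: str) -> tuple[str, str]:
    """Return (apikey, base_url) from the [AndroZoo] section of settings.ini.

    Raises ValueError if the key is missing or still the <KEY> placeholder.
    """
    # interpolation=None: keep '%' characters in the key as-is.
    config = configparser.ConfigParser(interpolation=None)
    # read() silently ignores missing files, so check what was read.
    if not config.read(path):
        raise FileNotFoundError(path)
    section = config["AndroZoo"]
    apikey = section.get("apikey", "").strip()
    base_url = section.get("base_url", "").strip()
    if not apikey or apikey == "<KEY>":
        raise ValueError("replace <KEY> with your AndroZoo API key")
    return apikey, base_url.rstrip("/")
```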

Then, you can run the experiment for all tools on the Rasta and Drebin datasets with:

./rasta_exp/run_exp_local.sh ./data/imgs ./data/dataset/drebin ./data/results/reports/drebin/status_drebin
for i in {0..9}; do
    ./rasta_exp/run_exp_local.sh ./data/imgs "./data/dataset/set${i}" "./data/results/reports/rasta/status_set${i}"
done;

This takes a lot of time, probably several months. To shorten it, you can adapt this last script to reduce:

  • the number of static analysis tools to evaluate
  • the dataset size
  • other parameters in the source code such as the timeout
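One way to reduce the dataset size is to run the experiment on a random subset of a set. The following Python sketch writes such a subset to a new file. HYPOTHETICAL: it assumes that a dataset file such as data/dataset/set0 lists one APK sha256 per line. Check the actual format before relying on it. The function name sample_dataset is also illustrative.

```python
import random


def sample_dataset(src: str, dst: str, n: int, seed: int = 0) -> list[str]:
    """Write a random subset of n entries of src to dst and return them.

    HYPOTHETICAL: assumes src lists one APK sha256 per line.
    """
    with open(src) as stream:
        entries = [line.strip() for line in stream if line.strip()]
    # A fixed seed keeps the reduced dataset reproducible across runs.
    subset = random.Random(seed).sample(entries, min(n, len(entries)))
    with open(dst, "w") as stream:
        stream.writelines(entry + "\n" for entry in subset)
    return subset
```

If the format matches, the resulting file can be passed to run_exp_local.sh in place of ./data/dataset/set0.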

Pushing results into a database

The generated reports are JSON files that can be parsed once the previous experiments have finished. The provided parsing script pushes some of this information into databases for further analysis. We provide pre-computed dumps of the databases corresponding to this stage. They can be loaded with:

zcat data/results/drebin.sql.gz | sqlite3 data/results/drebin.db
zcat data/results/rasta.sql.gz | sqlite3 data/results/rasta.db
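Where zcat or the sqlite3 command-line shell is not available, the same import can be done with Python's standard library. This is a minimal sketch: it reads the whole decompressed dump into memory and assumes UTF-8 text. As with the shell command, it expects the target database to be empty, because the dump's CREATE TABLE statements fail if the tables already exist. The function name is hypothetical.

```python
import gzip
import sqlite3


def load_sql_dump(dump_path: str, db_path: str) -> sqlite3.Connection:
    """Python equivalent of `zcat dump.sql.gz | sqlite3 db`."""
    with gzip.open(dump_path, "rt", encoding="utf-8") as stream:
        script = stream.read()
    conn = sqlite3.connect(db_path)
    # executescript runs every statement of the dump, including any
    # BEGIN TRANSACTION / COMMIT pair it contains.
    conn.executescript(script)
    return conn
```

For example, load_sql_dump("data/results/rasta.sql.gz", "data/results/rasta.db").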

To re-generate the database from the JSON reports of the previous experiments:

./rasta_data_manipulation/make_db.sh ./data

Generating the database requires an AndroZoo API key and a lot of time, because we download the APKs to get their total dex size (the value indicated in latest.csv only takes into account the size of classes.dex, not the sum of the sizes of all dex files when there are several).
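To illustrate the difference, a multidex APK stores its bytecode in classes.dex, classes2.dex, classes3.dex and so on, at the root of the archive. The following Python sketch sums the uncompressed sizes of all these entries. It is an assumption about what "total dex size" means and may differ in details from what make_db.sh actually computes. For example, dex files hidden in assets/ are deliberately ignored here.

```python
import re
import zipfile

# Matches classes.dex, classes2.dex, classes3.dex, ... at the APK root.
DEX_NAME = re.compile(r"classes\d*\.dex")


def total_dex_size(apk_path: str) -> int:
    """Sum the uncompressed sizes of all root-level classes*.dex entries."""
    with zipfile.ZipFile(apk_path) as apk:
        return sum(
            info.file_size
            for info in apk.infolist()
            if DEX_NAME.fullmatch(info.filename)
        )
```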

Database Usage

Most of the results presented in the paper can be regenerated from the database using the following script:

./rasta_data_manipulation/extract_result.sh ./data

There are 4 tables in the database, apk, tool, exec and error, which we describe below.

Apk table

The data related to the apks of the dataset are in the apk table that has the following columns:

  • sha256: The hash of the apk
  • first_seen_year: The first year the apk has been seen
  • apk_size: The total size of the apk
  • vt_detection: The number of detections by Virus Total
  • min_sdk: The min SDK indicated by the apk
  • max_sdk: The max SDK indicated by the apk
  • target_sdk: The target SDK indicated by the apk
  • apk_size_decile: The APK size decile the apk belongs to
  • dex_date: The date indicated in the dex file
  • pkg_name: The package name of the apk
  • vt_scan_date: The year when the apk was provided to Virus Total
  • dex_size: The total size of the dex files
  • added: The year the apk was added to AndroZoo
  • markets: The markets where the apk was collected
  • dex_size_decile: The dex size decile the apk belongs to
  • dex_size_decile_by_year: The dex size decile of the apk among the apks with the same first_seen_year

Tool table

The data related to the tools used by the experiment are in the tool table. Its columns are:

  • tool_name: The name of the tool
  • use_python: If the tool uses python
  • use_java: If the tool uses java
  • use_scala: If the tool uses scala
  • use_ocaml: If the tool uses ocaml
  • use_ruby: If the tool uses ruby
  • use_prolog: If the tool uses prolog
  • use_soot: If the tool uses soot
  • use_androguard: If the tool uses androguard
  • use_apktool: If the tool uses apktool

Exec table

The data related to the execution of an analysis are in the exec table. Columns are:

  • sha256: The hash of the tested apk
  • tool_name: The name of the tested tool
  • tool_status: The status of the analysis: FAILED, FINISHED, TIMEOUT, OTHER
  • time: The duration of the analysis
  • exit_status: The exit status code returned by the execution
  • timeout: Whether the execution timed out
  • max_rss_mem: The memory used by the analysis
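For instance, the share of analyses that reached FINISHED for each tool can be computed from the exec table alone. The following sketch uses Python's built-in sqlite3 module and relies only on the tool_name and tool_status columns listed above. The function and constant names are hypothetical.

```python
import sqlite3

FINISH_RATE_SQL = """
SELECT tool_name,
       COUNT(*) AS nb_exec,
       AVG(CASE WHEN tool_status = 'FINISHED' THEN 1.0 ELSE 0.0 END) AS finish_rate
FROM exec
GROUP BY tool_name
ORDER BY finish_rate DESC, tool_name
"""


def finish_rates(conn: sqlite3.Connection) -> list[tuple[str, int, float]]:
    """Return (tool_name, number of executions, share of FINISHED runs)."""
    return conn.execute(FINISH_RATE_SQL).fetchall()
```

For example, finish_rates(sqlite3.connect("data/results/rasta.db")) lists the tools from the most to the least successful.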

There are other values collected by the time command during the analysis:

  • avg_rss_mem
  • page_size
  • kernel_cpu_time
  • user_cpu_time
  • nb_major_page_fault
  • nb_minor_page_fault
  • nb_fs_input
  • nb_fs_output
  • nb_socket_msg_received
  • nb_socket_msg_sent
  • nb_signal_delivered

Error table

The errors collected during the analyses are stored in the error table. Depending on the error_type, not all columns are used.

  • tool_name: The name of the tool that raised the error
  • sha256: The hash of the apk analyzed when the error was raised
  • error_type: The type of error (Log4j, Java, Python, Xsb, Ocaml, Log4jSimpleMsg, Ruby)
  • error: The name of the error
  • msg: The message of the error
  • cause: Rough estimation of the cause of the error
  • first_line: The line number of the first line of the error in the log
  • last_line: The line number of the last line of the error in the log
  • logfile_name: The file in which the error was collected (usually stdout and stderr)
  • file: The file of the ruby script that raised the error
  • line: The line number of the instruction that raised the error
  • function: The function that raised the error
  • level: The level of the log (e.g. FATAL, CRITICAL)
  • origin: The origin of the error (java class referred by log4j)
  • raised_info: 'Raised at' information (for Ocaml errors)
  • called_info: 'Called from' information (for Ocaml errors)
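As an example of using this table, the following Python sketch lists the most frequent errors raised by a given tool. It counts distinct APKs rather than rows, because the same error can be collected several times for one analysis (for instance, once in stdout and once in stderr). Only the tool_name, sha256, error_type and error columns are used. The function name is hypothetical.

```python
import sqlite3

TOP_ERRORS_SQL = """
SELECT error_type, error, COUNT(DISTINCT sha256) AS nb_apks
FROM error
WHERE tool_name = ?
GROUP BY error_type, error
ORDER BY nb_apks DESC, error_type, error
LIMIT ?
"""


def top_errors(
    conn: sqlite3.Connection, tool_name: str, limit: int = 5
) -> list[tuple[str, str, int]]:
    """Return (error_type, error, number of APKs) for the most common errors."""
    return conn.execute(TOP_ERRORS_SQL, (tool_name, limit)).fetchall()
```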

Database usage

The data can be explored using SQL queries. tool_name and sha256 are the usual foreign keys used for joins. For example, this SQL query gives the average time taken by failed analyses of tools using soot, together with the average bytecode size of the analyzed applications, grouped by deciles of this size over the whole dataset:

$ sqlite3 data/results/rasta.db
sqlite> SELECT AVG(dex_size), AVG(time) 
FROM exec 
    INNER JOIN apk ON exec.sha256=apk.sha256 
    INNER JOIN tool ON exec.tool_name=tool.tool_name
WHERE tool.use_soot = TRUE AND exec.tool_status = 'FAILED' 
GROUP BY dex_size_decile
ORDER BY AVG(dex_size);
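The same kind of query can be run from Python with the built-in sqlite3 module. In this sketch, the join on tool compares exec.tool_name with tool.tool_name. Comparing exec.tool_name with itself would pair every execution with every soot-based tool, which would silently include tools that do not use soot. The status is passed as a parameter, so the same function also works for FINISHED or TIMEOUT runs. The function name is hypothetical.

```python
import sqlite3

SOOT_TIME_BY_DECILE = """
SELECT dex_size_decile, AVG(apk.dex_size), AVG(exec.time)
FROM exec
    INNER JOIN apk ON exec.sha256 = apk.sha256
    INNER JOIN tool ON exec.tool_name = tool.tool_name
WHERE tool.use_soot = TRUE AND exec.tool_status = ?
GROUP BY dex_size_decile
ORDER BY AVG(apk.dex_size)
"""


def avg_time_by_dex_decile(
    conn: sqlite3.Connection, status: str = "FAILED"
) -> list[tuple[int, float, float]]:
    """Return (decile, average dex size, average time) for soot-based tools."""
    return conn.execute(SOOT_TIME_BY_DECILE, (status,)).fetchall()
```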

Reusing a Specific Tool

If you don't want to use the Docker Hub images, you can build them with:

cd rasta_exp
./build_docker_images.sh ../data/imgs
cd ..

The resulting images are named histausse/rasta-<tool-name>:icsr2024, and their associated environment variables are in rasta_exp/envs/<tool-name>_docker.env. The build_docker_images.sh script can be edited to build only one tool.

After building a tool, a container can be entered interactively by doing:

docker run --rm --env-file=rasta_exp/envs/mallodroid_docker.env -v /tmp/mnt:/mnt -it histausse/rasta-mallodroid:icsr2024 bash

Here, /tmp/mnt is mounted to /mnt in the container. Put the apk in /tmp/mnt to analyze it.

To run the analysis of the APK, run /run.sh <apk>, where <apk> is the name of the apk in /mnt, without the /mnt prefix. The artifacts of the analysis are stored in /mnt, including the stdout, the stderr and the result of the time command.

root@e3c39c14e382:/# ls /mnt
E29CCE76464767F97DAE039DBA0A0AAE798DF1763AD02C6B4A45DE81762C23DA.apk
root@e3c39c14e382:/# /run.sh E29CCE76464767F97DAE039DBA0A0AAE798DF1763AD02C6B4A45DE81762C23DA.apk
root@e3c39c14e382:/# ls /mnt/
E29CCE76464767F97DAE039DBA0A0AAE798DF1763AD02C6B4A45DE81762C23DA.apk  report  stderr  stdout

The report directory contains the result of the time command. The stdout and stderr files contain the execution trace of the tool on the APK. If the tool generates extra files, you should find them in this directory.

The run.sh script can be customized to modify the run parameters used for this tool. The script that is copied into the Docker image is located at rasta_exp/docker/<tool name>/home_build/run.sh.
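To analyze APKs with one tool without entering the container interactively, the session above can be scripted. The following Python sketch makes two assumptions. First, it assumes the image accepts /run.sh as its command in place of bash. Second, it assumes the script is launched from the repository root, so that the relative --env-file path resolves. The function names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def docker_run_command(tool: str, apk: Path, mnt: Path) -> list[str]:
    """Build a non-interactive variant of the docker run command above.

    The image name and env-file path follow the naming used in this README.
    """
    return [
        "docker", "run", "--rm",
        f"--env-file=rasta_exp/envs/{tool}_docker.env",
        # Docker bind mounts given with -v need an absolute host path.
        "-v", f"{mnt.resolve()}:/mnt",
        f"histausse/rasta-{tool}:icsr2024",
        # Assumption: the image accepts /run.sh as its command, as it accepts bash.
        "/run.sh", apk.name,
    ]


def analyze(tool: str, apk: Path, mnt: Path, dry_run: bool = False) -> Optional[int]:
    """Copy the APK into mnt and run the tool on it; return the exit status."""
    mnt.mkdir(parents=True, exist_ok=True)
    target = mnt / apk.name
    if apk.resolve() != target.resolve():
        shutil.copy(apk, target)
    cmd = docker_run_command(tool, apk, mnt)
    if dry_run:
        return None  # Only prepare the mount directory, do not start Docker.
    return subprocess.run(cmd).returncode
```

The stdout, stderr and report artifacts are written directly to /mnt. Using one mount directory per APK, for example analyze("mallodroid", Path("app.apk"), Path("/tmp/rasta/app")), keeps successive runs from overwriting each other's results.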

Dockerhub images

The docker images are available on dockerhub under the names:

  • histausse/rasta-adagio:icsr2024
  • histausse/rasta-amandroid:icsr2024
  • histausse/rasta-anadroid:icsr2024
  • histausse/rasta-androguard-dad:icsr2024
  • histausse/rasta-androguard:icsr2024
  • histausse/rasta-apparecium:icsr2024
  • histausse/rasta-blueseal:icsr2024
  • histausse/rasta-dialdroid:icsr2024
  • histausse/rasta-didfail:icsr2024
  • histausse/rasta-droidsafe:icsr2024
  • histausse/rasta-flowdroid:icsr2024
  • histausse/rasta-gator:icsr2024
  • histausse/rasta-ic3-fork:icsr2024
  • histausse/rasta-ic3:icsr2024
  • histausse/rasta-iccta:icsr2024
  • histausse/rasta-mallodroid:icsr2024
  • histausse/rasta-redexer:icsr2024
  • histausse/rasta-saaf:icsr2024
  • histausse/rasta-wognsen:icsr2024

LICENSE

This repository is licensed under the GPLv3. Please note that this license does not apply to the tested tools.

Remember, this program is provided "as is" without warranty of any kind.