first commit
commit cd1e91bb99
287 changed files with 86425 additions and 0 deletions
.gitignore (vendored, Normal file, 1 line)
@@ -0,0 +1 @@
data
README.md (Normal file, 243 lines)
@@ -0,0 +1,243 @@
# RASTA

Reproducibility of the Rasta experiment.

## Data

Some data are needed to reproduce the experiment (at the very least, the AndroZoo indexes we used to sample our dataset). These data are too large to be stored in a Git repository, so they need to be downloaded from Zenodo to the root of this repository:

```
curl https://zenodo.org/records/10137905/files/rasta_data_v1.0.tgz?download=1 | tar -xz
```
## Dependencies

To run the Rasta experiment, some tools are required:

- Docker (e.g. version 24.0.6)
- Singularity (e.g. version 3.11.1)
- a modern version of Python (e.g. Python 3.10 or 3.11)
- gzip
- sqlite3

One way to install those tools is to use Nixpkgs (`nix-shell -p docker singularity python310 python310Packages.numpy python310Packages.matplotlib sqlite3`), another is to follow the installation instructions of the individual tools (<https://docs.sylabs.io/guides/3.11/user-guide/>, <https://docs.docker.com/>).

There are also some Python dependencies that need to be installed in a virtual env:

```
python3 -m venv venv
source venv/bin/activate
pip install rasta_data_manipulation/
pip install -r rasta_exp/requirements.txt
```

From now on, all commands are run from inside the venv.
## Dataset

The datasets we used (Drebin and Rasta, split into 10 balanced sets) are in `data/dataset`.

To reproduce the generation of the dataset, `latest.csv.gz` and `year_and_sdk.csv.gz` are required: `rasta-gen-dataset data/androzoo/latest.csv.gz data/androzoo/year_and_sdk.csv.gz -o data/dataset` (this will not generate the Drebin dataset).
## Container Images

The container images are stored in `data/imgs`. They can be regenerated with:

```
cd rasta_exp
./build_docker_images.sh ../data/imgs
cd ..
```

(The container images will be published with the final release.)

The container and binary of Perfchecker are not provided, as this tool is only available on demand.
## Experiment

The results of the experiment are stored in `data/results/archives/`. They can be extracted with:

```
mkdir -p data/results/reports/rasta
mkdir -p data/results/reports/drebin
for archive in $(ls data/results/archives/status_set*.tgz); do tar -xzf ${archive} --directory data/results/reports/rasta; done
tar -xzf data/results/archives/status_drebin.tgz --directory data/results/reports/drebin
```

They can also be regenerated by running the experiment.
To run the experiment locally, first set up the `settings.ini` file in `rasta_exp`. Replacing it with the following is enough (don't forget to replace `<KEY>` with your AndroZoo API key):

```
[AndroZoo]
apikey = <KEY>
base_url = https://androzoo.uni.lu
```
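The experiment scripts presumably read this file with Python's standard `configparser` module; a minimal sketch of what that parsing looks like (the section and key names come from the file above, the download URL in the comment is AndroZoo's documented API, everything else is illustrative):

```python
import configparser

# Parse an AndroZoo settings.ini like the one above.
config = configparser.ConfigParser()
config.read_string("""
[AndroZoo]
apikey = <KEY>
base_url = https://androzoo.uni.lu
""")

apikey = config["AndroZoo"]["apikey"]
base_url = config["AndroZoo"]["base_url"]
# An APK can then be fetched from
# f"{base_url}/api/download?apikey={apikey}&sha256=..."
print(base_url)
```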
Then, you can run the experiment with:

```
./rasta_exp/run_exp_local.sh ./data/imgs ./data/dataset/drebin ./data/results/reports/drebin/status_drebin
for i in {0..9}; do
    ./rasta_exp/run_exp_local.sh ./data/imgs "./data/dataset/set${i}" "./data/results/reports/rasta/status_set${i}"
done;
```

(This takes a long time.)
## Database

The reports are parsed into databases to make analyzing them easier. The databases can be extracted from their dumps or generated from the reports and the dataset.

To extract the dumps:

```
zcat data/results/drebin.sql.gz | sqlite3 data/results/drebin.db
zcat data/results/rasta.sql.gz | sqlite3 data/results/rasta.db
```
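Equivalently, if the `zcat` or `sqlite3` binaries are not at hand, the dumps can be replayed from Python with the standard `gzip` and `sqlite3` modules (a sketch; `load_dump` is a hypothetical helper, not part of the project):

```python
import gzip
import sqlite3

def load_dump(dump_path: str, db_path: str) -> None:
    """Decompress a gzipped SQL dump and replay it into an SQLite database."""
    with gzip.open(dump_path, "rt") as fp:
        sql = fp.read()
    with sqlite3.connect(db_path) as con:
        con.executescript(sql)

# load_dump("data/results/rasta.sql.gz", "data/results/rasta.db")
```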
To generate the databases:

```
./rasta_data_manipulation/make_db.sh ./data
```

Generating the databases requires an AndroZoo API key and a lot of time, because we download the APKs to get their total dex size (the value indicated in latest.csv only takes into account the size of `classes.dex`, not the sum of the sizes of all dex files when there is more than one).
## Database Usage

Most of the results used in the paper can be extracted with:

```
./rasta_data_manipulation/extract_result.sh ./data
```

There are 4 tables in the database: `apk`, `tool`, `exec` and `error`.
### Apk table

The data related to the APKs of the dataset are in the `apk` table.

The entries of the `apk` table have the following columns:

- `sha256`: The hash of the apk
- `first_seen_year`: The first year the apk was seen
- `apk_size`: The total size of the apk
- `vt_detection`: The number of detections by VirusTotal
- `min_sdk`: The min SDK indicated by the apk
- `max_sdk`: The max SDK indicated by the apk
- `target_sdk`: The target SDK indicated by the apk
- `apk_size_decile`: The decile of apk size the apk belongs to
- `dex_date`: The date indicated in the dex file
- `pkg_name`: The package name of the apk
- `vt_scan_date`: The year the apk was submitted to VirusTotal
- `dex_size`: The total size of the dex files
- `added`: The year the apk was added to AndroZoo
- `markets`: Where the apk was collected
- `dex_size_decile`: The decile of dex size the apk belongs to
- `dex_size_decile_by_year`: The decile of dex size for the first_seen_year of the apk
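The decile columns are precomputed buckets over the size distribution. As an illustration, one way to compute such a decile in pure Python (a sketch, not the project's actual implementation; `decile_of` is a hypothetical helper):

```python
import statistics

def decile_of(value: float, population: list[float]) -> int:
    """Return the decile (1-10) that `value` falls into within `population`."""
    # quantiles(n=10) returns the 9 cut points separating the deciles.
    cuts = statistics.quantiles(population, n=10)
    for i, cut in enumerate(cuts, start=1):
        if value <= cut:
            return i
    return 10

sizes = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
print(decile_of(150, sizes))  # 150 lies in the second decile -> 2
```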
### Tool table

The data related to the tools used by the experiment are in the `tool` table.

Its columns are:

- `tool_name`: The name of the tool
- `use_python`: Whether the tool uses Python
- `use_java`: Whether the tool uses Java
- `use_scala`: Whether the tool uses Scala
- `use_ocaml`: Whether the tool uses OCaml
- `use_ruby`: Whether the tool uses Ruby
- `use_prolog`: Whether the tool uses Prolog
- `use_soot`: Whether the tool uses Soot
- `use_androguard`: Whether the tool uses Androguard
- `use_apktool`: Whether the tool uses Apktool
### Exec table

The data related to the execution of an analysis are in the `exec` table.

- `sha256`: The hash of the tested apk
- `tool_name`: The name of the tested tool
- `tool_status`: The status of the analysis: FAILED, FINISHED, TIMEOUT, OTHER
- `time`: The duration of the analysis
- `exit_status`: The exit status code returned by the execution
- `timeout`: Whether the execution timed out
- `max_rss_mem`: The memory used by the analysis

There are other values collected by `time` during the analysis:

- `avg_rss_mem`
- `page_size`
- `kernel_cpu_time`
- `user_cpu_time`
- `nb_major_page_fault`
- `nb_minor_page_fault`
- `nb_fs_input`
- `nb_fs_output`
- `nb_socket_msg_received`
- `nb_socket_msg_sent`
- `nb_signal_delivered`
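These names correspond to values GNU `time` can report with a custom format string (`%U`, `%S`, `%M`, ...). A sketch of collecting such a report into a dict; the `key=value` report format and the helper are illustrative assumptions, not the project's actual format:

```python
# Parse a key=value report such as one produced by
# /usr/bin/time -o report -f "user_cpu_time=%U\nkernel_cpu_time=%S\nmax_rss_mem=%M"
def parse_time_report(text: str) -> dict[str, float]:
    values = {}
    for line in text.splitlines():
        if "=" in line:
            key, _, raw = line.partition("=")
            values[key.strip()] = float(raw)
    return values

report = """\
user_cpu_time=12.41
kernel_cpu_time=0.83
max_rss_mem=1048576
nb_major_page_fault=3
"""
print(parse_time_report(report)["max_rss_mem"])  # 1048576.0
```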
### Error table

The errors collected during the analysis are stored in the `error` table.

Not all columns are used; it depends on the `error_type`.

- `tool_name`: The name of the tool that raised the error
- `sha256`: The hash of the apk analyzed when the error was raised
- `error_type`: The type of error (Log4j, Java, Python, Xsb, Ocaml, Log4jSimpleMsg, Ruby)
- `error`: The name of the error
- `msg`: The message of the error
- `cause`: Rough estimation of the cause of the error
- `first_line`: The line number of the first line of the error in the log
- `last_line`: The line number of the last line of the error in the log
- `logfile_name`: The file in which the error was collected (usually stdout or stderr)
- `file`: The file of the ruby script that raised the error
- `line`: The line number of the instruction that raised the error
- `function`: The function that raised the error
- `level`: The level of the log (e.g. FATAL, CRITICAL)
- `origin`: The origin of the error (Java class referred to by log4j)
- `raised_info`: 'Raised at' information (for Ocaml errors)
- `called_info`: 'Called from' information (for Ocaml errors)
### Usage

The data can be explored using SQL queries. `tool_name` and `sha256` are the usual foreign keys used for joins.

#### Example

This SQL query gives the average time taken by failed analyses of tools using Soot, together with the average bytecode size of the applications analysed, grouped by deciles of this size over the whole dataset:

```
$ sqlite3 data/results/rasta.db
sqlite> SELECT AVG(dex_size), AVG(time)
        FROM exec
        INNER JOIN apk ON exec.sha256 = apk.sha256
        INNER JOIN tool ON exec.tool_name = tool.tool_name
        WHERE tool.use_soot = TRUE AND exec.tool_status = 'FAILED'
        GROUP BY dex_size_decile
        ORDER BY AVG(dex_size);
```
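For a self-contained illustration, the same join can be exercised from Python on a toy in-memory database (the schema mirrors the tables described above; the tool names and numbers are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE apk (sha256 TEXT, dex_size INTEGER, dex_size_decile INTEGER);
CREATE TABLE tool (tool_name TEXT, use_soot BOOLEAN);
CREATE TABLE exec (sha256 TEXT, tool_name TEXT, tool_status TEXT, time REAL);
INSERT INTO apk VALUES ('a', 1000, 1), ('b', 9000, 10);
INSERT INTO tool VALUES ('flowdroid', TRUE), ('androguard', FALSE);
INSERT INTO exec VALUES
  ('a', 'flowdroid', 'FAILED', 10.0),
  ('b', 'flowdroid', 'FAILED', 50.0),
  ('b', 'androguard', 'FAILED', 5.0);
""")
rows = con.execute("""
SELECT AVG(dex_size), AVG(time)
FROM exec
INNER JOIN apk ON exec.sha256 = apk.sha256
INNER JOIN tool ON exec.tool_name = tool.tool_name
WHERE tool.use_soot = TRUE AND exec.tool_status = 'FAILED'
GROUP BY dex_size_decile
ORDER BY AVG(dex_size);
""").fetchall()
print(rows)  # only the two flowdroid rows survive the use_soot filter
```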
## Reusing a Specific Tool

The containers are not on Docker Hub yet, so they need to be built using `build_docker_images.sh`. The images are named `rasta-<tool-name>`, and the associated environment variables are in `rasta_exp/envs/<tool-name>_docker.env`.

To enter a container, run:

```
docker run --rm --env-file=rasta_exp/envs/mallodroid_docker.env -v /tmp/mnt:/mnt -it rasta-mallodroid bash
```

Here, `/tmp/mnt` is mounted to `/mnt` in the container. Put the APK to analyze in it.

To run the analysis, run `/run.sh <apk>` where `<apk>` is the name of the apk in `/mnt`, without the `/mnt` prefix. The artifacts of the analysis are stored in `/mnt`, including the `stdout`, `stderr` and the result of the `time` command.

```
root@e3c39c14e382:/# ls /mnt
E29CCE76464767F97DAE039DBA0A0AAE798DF1763AD02C6B4A45DE81762C23DA.apk
root@e3c39c14e382:/# /run.sh E29CCE76464767F97DAE039DBA0A0AAE798DF1763AD02C6B4A45DE81762C23DA.apk
root@e3c39c14e382:/# ls /mnt/
E29CCE76464767F97DAE039DBA0A0AAE798DF1763AD02C6B4A45DE81762C23DA.apk  report  stderr  stdout
```
rasta_data_manipulation/.gitattributes (vendored, Normal file, 1 line)
@@ -0,0 +1 @@
data.db filter=lfs diff=lfs merge=lfs -text
rasta_data_manipulation/.gitignore (vendored, Normal file, 40 lines)
@@ -0,0 +1,40 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# virtualenv
.venv/
venv/

# mypy
.mypy_cache/

*.db
year_and_sdk.csv.gz
latest_with-added-date.csv.gz
figs_drebin/
figs_rasta/
figs
rasta_data_manipulation/README.md (Normal file, 35 lines)
@@ -0,0 +1,35 @@
# Rasta Triturage

Data crunching ("triturage de donnée") for the Rasta project

## Usage

This project is managed by poetry (trying new things :-) ). To use it without poetry, you can install it as a Python package in a venv:

```
git clone git@gitlab.inria.fr:jmineau/rasta_triturage.git
cd rasta_triturage
python -m venv venv
source venv/bin/activate
pip install -e .
```

The reports and information about the apks are in the prepopulated database `data.db` (TODO: add script to populate the DB)

To generate all the figures in the folder `figures`:

```
rasta-triturage -d data.db -f figures
```

To display all the figures:

```
rasta-triturage -d data.db --display
```

The option `-t` allows specifying the tools to compare.

## Author

- annon
rasta_data_manipulation/TODO.md (Normal file, 3 lines)
@@ -0,0 +1,3 @@
- Scatter-plot regression ("régression en nuage de points"): mem by ceiling log feels bad
- IC3: Venn diagram
- time / mem for specific category
rasta_data_manipulation/extract_result.sh (Executable file, 25 lines)
@@ -0,0 +1,25 @@
#!/usr/bin/env bash

DATA_DIR=$1
if [[ -z "${DATA_DIR}" ]]; then
    echo 'MISSING DATA_DIR parameter'
    echo 'usage: ./extract_result.sh DATA_DIR'
    exit 1
fi
DATA_DIR="$(readlink -f "$DATA_DIR")"

DB="${DATA_DIR}/results/rasta.db"
DB_DREBIN="${DATA_DIR}/results/drebin.db"
FOLDER="figs"

rasta-status -d "${DB}" -f "${FOLDER}" --title "Exit status for the Rasta dataset"
rasta-status -d "${DB_DREBIN}" -f "${FOLDER}" --title "Exit status for the Drebin dataset"
rasta-success-year -d "${DB}" -f "${FOLDER}/by_year"

rasta-common-errors -d "${DB}" -f "${FOLDER}/common_err" -s FAILED
rasta-avg-nb-errors -d "${DB}" -f "${FOLDER}/common_err"
rasta-error-repartition -d "${DB}" -f "${FOLDER}"
rasta-avg-ressource -d "${DB}" -f "${FOLDER}"

rasta-decorelate-factor -d "${DB}" -f "${FOLDER}/decorelation" --decile 8
rasta-decorelate-factor -d "${DB}" -f "${FOLDER}/decorelation" --decile 6
rasta_data_manipulation/find_apks_by_tool_error.sh (Executable file, 25 lines)
@@ -0,0 +1,25 @@
#!/usr/bin/env bash

PWD=$(pwd)
TOOL=${1}
ERROR=${2}
DATABASE=${3:-'rasta.db'}
REPORT_FOLDER=${4:-"$PWD/../data/reports/rasta"}

USAGE=$(cat <<- EOM
usage: ${0} <tool> <error> [<database> [<report folder>]]
EOM
)

if [[ -z "$TOOL" ]] || [[ -z "$ERROR" ]] || [[ -z "$DATABASE" ]] || [[ -z "$REPORT_FOLDER" ]] ; then
    echo "${USAGE}"
    exit 1
fi

TMP_FILE=$(mktemp)
sqlite3 ${DATABASE} "SELECT DISTINCT error.sha256 || '_-_' || error.tool_name FROM error INNER JOIN exec ON error.tool_name = exec.tool_name AND error.sha256 = exec.sha256 WHERE exec.tool_status = 'FAILED' AND error.tool_name = '$TOOL' and error = '$ERROR';" > ${TMP_FILE}

find ${REPORT_FOLDER} | grep -F -f ${TMP_FILE}
rm ${TMP_FILE}
rasta_data_manipulation/make_db.sh (Executable file, 35 lines)
@@ -0,0 +1,35 @@
#!/usr/bin/env bash

DATA_DIR=$1
if [[ -z "${DATA_DIR}" ]]; then
    echo 'MISSING DATA_DIR parameter'
    echo 'usage: ./make_db.sh DATA_DIR'
    exit 1
fi
DATA_DIR="$(readlink -f "$DATA_DIR")"

all_rasta_apk=$(mktemp)
cat ${DATA_DIR}/dataset/set* > ${all_rasta_apk}
rasta-populate-db-apk -a ${all_rasta_apk} \
    -d "${DATA_DIR}/results/rasta.db" \
    --year-and-sdk "${DATA_DIR}/androzoo/year_and_sdk.csv.gz" \
    --latest-with-added-date "${DATA_DIR}/androzoo/latest_with-added-date.csv.gz" \
    --fix-dex-file
rasta-populate-db-tool -d "${DATA_DIR}/results/rasta.db"
report_folders="status_set0 status_set1 status_set2 status_set3 status_set4 status_set5 status_set6 status_set7 status_set8 status_set9"
for folder in ${report_folders}; do
    rasta-populate-db-report -d "${DATA_DIR}/results/rasta.db" -r "${DATA_DIR}/results/reports/rasta/${folder}"
done
rasta-populate-db-report -d "${DATA_DIR}/results/rasta.db" --estimate-cause

rasta-populate-db-apk -a "${DATA_DIR}/dataset/drebin" \
    -d "${DATA_DIR}/results/drebin.db" \
    --year-and-sdk "${DATA_DIR}/androzoo/year_and_sdk.csv.gz" \
    --latest-with-added-date "${DATA_DIR}/androzoo/latest_with-added-date.csv.gz" \
    --fix-dex-file
rasta-populate-db-tool -d "${DATA_DIR}/results/drebin.db"
rasta-populate-db-report -d "${DATA_DIR}/results/drebin.db" -r "${DATA_DIR}/results/reports/drebin/status_drebin"
rasta-populate-db-report -d "${DATA_DIR}/results/drebin.db" --estimate-cause

rm ${all_rasta_apk}
rasta_data_manipulation/means_size.sql (Normal file, 6 lines)
@@ -0,0 +1,6 @@
SELECT AVG(dex_size) FROM apk;
SELECT AVG(dex_size) FROM apk WHERE vt_detection = 0;
SELECT AVG(dex_size) FROM apk WHERE vt_detection != 0;
SELECT AVG(apk_size) FROM apk;
SELECT AVG(apk_size) FROM apk WHERE vt_detection = 0;
SELECT AVG(apk_size) FROM apk WHERE vt_detection != 0;
rasta_data_manipulation/means_success_by_year.sql (Normal file, 6 lines)
@@ -0,0 +1,6 @@
SELECT apk1.first_seen_year, (COUNT(*) * 100) / (SELECT 20 * COUNT(*)
    FROM apk AS apk2 WHERE apk2.first_seen_year = apk1.first_seen_year
)
FROM exec JOIN apk AS apk1 ON exec.sha256 = apk1.sha256
WHERE exec.tool_status = 'FINISHED' OR exec.tool_status = 'UNKNOWN'
GROUP BY apk1.first_seen_year ORDER BY apk1.first_seen_year;
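The `20 * COUNT(*)` denominator turns the per-year count of successful executions into a percentage, the factor 20 presumably being the number of tools in the experiment. A self-contained check of the arithmetic on made-up data (schema and values are illustrative only):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE apk (sha256 TEXT, first_seen_year INTEGER);
CREATE TABLE exec (sha256 TEXT, tool_name TEXT, tool_status TEXT);
""")
# 2 apks first seen in 2020; with 20 tools that makes 40 executions.
con.execute("INSERT INTO apk VALUES ('a', 2020), ('b', 2020)")
# Pretend 10 of those 40 executions finished.
con.executemany(
    "INSERT INTO exec VALUES (?, ?, 'FINISHED')",
    [("a", f"tool{i}") for i in range(10)],
)
rows = con.execute("""
SELECT apk1.first_seen_year, (COUNT(*) * 100) / (SELECT 20 * COUNT(*)
    FROM apk AS apk2 WHERE apk2.first_seen_year = apk1.first_seen_year)
FROM exec JOIN apk AS apk1 ON exec.sha256 = apk1.sha256
WHERE exec.tool_status = 'FINISHED' OR exec.tool_status = 'UNKNOWN'
GROUP BY apk1.first_seen_year ORDER BY apk1.first_seen_year;
""").fetchall()
print(rows)  # 10 of 40 executions finished -> 25%
```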
rasta_data_manipulation/mypy.ini (Normal file, 2 lines)
@@ -0,0 +1,2 @@
[mypy]
python_executable = .venv/bin/python
rasta_data_manipulation/poetry.lock (generated, Normal file, 1419 lines)
File diff suppressed because it is too large
rasta_data_manipulation/pyproject.toml (Normal file, 64 lines)
@@ -0,0 +1,64 @@
[tool.poetry]
name = "rasta_triturage"
version = "0.2.0"
description = "'Triturage de donnée' for the Rasta Project"
authors = ["anon"]
readme = "README.md"
#homepage = ""
#repository = ""
license = "Proprietary"

[tool.poetry.urls]
#"Bug Tracker" = ""

[tool.poetry.dependencies]
python = "^3.10"
matplotlib = "^3.7.1"
pyqt5 = "^5.15.9"
numpy = "^1.24.3"

seaborn = "^0.12.2"
python-slugify = "^8.0.1"
androguard = "^3.3.5"
requests = "^2.31.0"
matplotlib-venn = "^0.11.9"
python-dateutil = "^2.8.2"

[tool.poetry.scripts]
rasta-triturage = "rasta_triturage.cli:main"
rasta-status = "rasta_triturage.cli:show_status_by_tool"
rasta-collect-apk-info = "rasta_triturage.cli:get_apk_info"
rasta-success-target-sdk = "rasta_triturage.cli:show_success_rate_by_target_sdk"
rasta-success-min-sdk = "rasta_triturage.cli:show_success_rate_by_min_sdk"
rasta-success-year = "rasta_triturage.cli:show_success_rate_by_first_seen_year"
rasta-success-size = "rasta_triturage.cli:show_success_rate_by_dex_size"
rasta-success-apk-size = "rasta_triturage.cli:show_success_rate_by_size_decile"
rasta-timeout-target-sdk = "rasta_triturage.cli:show_timeout_rate_by_target_sdk"
rasta-timeout-min-sdk = "rasta_triturage.cli:show_timeout_rate_by_min_sdk"
rasta-timeout-year = "rasta_triturage.cli:show_timeout_rate_by_estimated_year"
rasta-populate-db-apk = "rasta_triturage.cli:populate_db_apk"
rasta-populate-db-report = "rasta_triturage.cli:populate_db_exec"
rasta-populate-db-tool = "rasta_triturage.cli:populate_db_tool"
rasta-common-errors = "rasta_triturage.cli:show_common_errors"
rasta-avg-nb-errors = "rasta_triturage.cli:average_nb_errors"
rasta-error-causes-radar = "rasta_triturage.cli:show_error_cause_radar"
rasta-error-repartition = "rasta_triturage.cli:show_error_type_repartition"
rasta-avg-occ-by-exec = "rasta_triturage.cli:show_error_avg_occ_by_exec"
rasta-ic3-analysis = "rasta_triturage.cli:ic3"
rasta-avg-ressource = "rasta_triturage.cli:get_avg_ressource_consumption"
rasta-decorelate-factor = "rasta_triturage.cli:plot_decorelated_factor"
rasta-count-error-stacks = "rasta_triturage.cli:count_error_stacks"
rasta-gen-dataset = "rasta_triturage.cli:generate_dataset"
rasta-size-malware = "rasta_triturage.cli:size_malware"

[tool.poetry.group.dev.dependencies]
pytest = "*"
pytest-cov = "*"
types-requests = "^2.31.0.0"

[tool.pytest.ini_options]
addopts = "--cov"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
rasta_data_manipulation/rasta_triturage/__init__.py (Normal file, 3 lines)
@@ -0,0 +1,3 @@
__author__ = "annon"
__email__ = "annon"
__version__ = "0.2.0"
rasta_data_manipulation/rasta_triturage/apk.py (Normal file, 115 lines)
@@ -0,0 +1,115 @@
"""
Collect data about apks.
"""

import dateutil.parser as dp  # type: ignore
import datetime
import numpy as np
import matplotlib.pyplot as plt  # type: ignore

from typing import Any, IO, Callable
from pathlib import Path

from .utils import render


def plot_apk_info_by_generic_x(
    data: list[Any],
    x: str,
    title: str,
    extract_propertie: Callable,
    y_label: str,
    x_label: str | None = None,
    reductions: dict[str, Callable] | None = None,
    xscale: str = "linear",
    interactive: bool = True,
    image_path: Path | None = None,
):
    """`extract_propertie` is a function that takes a list of elements and
    returns a value representative of the list, like a median or a mean.
    """
    raise NotImplementedError("TODO: update function to use sqlite3")

    # groupped = group_by(x, data, reductions=reductions)
    # properties = {k: extract_propertie(v) for k, v in groupped.items()}
    # if x_label is None:
    #     x_label = x
    # x_values = list(set(filter(lambda x: x is not None, properties.keys())))
    # x_values.sort()
    # y_values = [properties[x] for x in x_values]
    #
    # plt.figure(figsize=(16, 9), dpi=80)
    # plt.plot(x_values, y_values)
    # plt.xscale(xscale)
    # # plt.ylim([-5, 105])
    # # plt.legend()
    # plt.xlabel(x_label)
    # plt.ylabel(y_label)
    # render(title, interactive, image_path)
    #


def plot_apk_size(
    apk_data: list[Any],
    interactive: bool = True,
    image_path: Path | None = None,
):
    sizes = np.array([e["total_dex_size"] for e in apk_data]) / 1024 / 1024
    sizes.sort()
    plt.figure(figsize=(16, 9), dpi=80)
    plt.bar(np.arange(len(sizes)), sizes)
    plt.ylabel("Bytecode size (MiB)")
    plt.tick_params(
        axis="x",
        which="both",
        bottom=False,
        top=False,
        labelbottom=False,
    )
    # Horizontal guides at powers of 4 bytes (4^7 to 4^12), converted to MiB.
    for s in range(7, 13):
        plt.axhline(y=(4**s) / 1024 / 1024, color="r", linestyle=":")
    render("Bytecode size of the apks", interactive, image_path)


def plot_apk_size_hl_subset(
    apk_data: list[Any],
    subset_sha: list[str],
    title: str,
    interactive: bool = True,
    image_path: Path | None = None,
):
    apk_data.sort(key=lambda x: x["total_dex_size"])
    sizes = (
        np.array(
            [
                e["total_dex_size"] if e["sha256"] not in subset_sha else 0
                for e in apk_data
            ]
        )
        / 1024
        / 1024
    )
    sizes_hl = (
        np.array(
            [e["total_dex_size"] if e["sha256"] in subset_sha else 0 for e in apk_data]
        )
        / 1024
        / 1024
    )
    plt.figure(figsize=(16, 9), dpi=80)
    plt.bar(np.arange(len(sizes)), sizes, edgecolor="black")
    plt.bar(
        np.arange(len(sizes)), sizes_hl, color="#D55E00", hatch="x", edgecolor="black"
    )
    plt.ylabel("Bytecode size (MiB)")
    plt.tick_params(
        axis="x",
        which="both",
        bottom=False,
        top=False,
        labelbottom=False,
    )
    for s in range(7, 13):
        plt.axhline(y=(4**s) / 1024 / 1024, color="r", linestyle=":")
    render(title, interactive, image_path)
rasta_data_manipulation/rasta_triturage/cli.py (Normal file, 1129 lines)
File diff suppressed because it is too large

rasta_data_manipulation/rasta_triturage/data_set.py (Normal file, 5124 lines)
File diff suppressed because it is too large
rasta_data_manipulation/rasta_triturage/ic3.py (Normal file, 199 lines)
@@ -0,0 +1,199 @@
import sqlite3
import csv
import sys
from pathlib import Path
from typing import Optional, Any
from matplotlib_venn import venn2  # type: ignore
from .utils import render

ERROR_CARACT = (
    "error_type",
    "error",
    "msg",
    "file",
    "function",
    "level",
    "origin",
    "raised_info",
    "called_info",
)
ERROR_MSG = " || '|' || ".join(map(lambda s: f"COALESCE({s}, '')", ERROR_CARACT))


def ic3_venn(db: Path, interactive: bool = True, image_path: Path | None = None):
    values = {
        ("FAILED", "NOT_FAILED"): 0,
        ("FAILED", "FAILED"): 0,
        ("NOT_FAILED", "FAILED"): 0,
    }
    with sqlite3.connect(db) as con:
        cur = con.cursor()
        for ic3_s, ic3_fork_s, n in cur.execute(
            "SELECT ex1.tool_status, ex2.tool_status, COUNT(*) "
            "FROM exec AS ex1 LEFT OUTER JOIN exec AS ex2 ON ex1.sha256 = ex2.sha256 "
            "WHERE ex1.tool_name = 'ic3' AND ex2.tool_name = 'ic3_fork' "
            "GROUP BY ex1.tool_status, ex2.tool_status"
        ):
            if ic3_s == "FAILED" and ic3_fork_s == "FAILED":
                values[("FAILED", "FAILED")] += n
            elif ic3_s == "FAILED":
                values[("FAILED", "NOT_FAILED")] += n
            elif ic3_fork_s == "FAILED":
                values[("NOT_FAILED", "FAILED")] += n
    venn2(
        subsets=(
            values[("FAILED", "NOT_FAILED")],
            values[("NOT_FAILED", "FAILED")],
            values[("FAILED", "FAILED")],
        ),
        set_labels=("IC3 failed", "IC3 fork failed"),
    )
    render(
        "Number of applications that IC3 \nand its fork failed to analyse",
        interactive,
        image_path,
    )


def ic3_errors(db: Path, file: Path | None = None):
    errors = []
    # Collect the 10 most frequent errors for each combination of the tool
    # that reported the error and which of ic3/ic3_fork failed.
    status_conditions = (
        "ex1.tool_status = 'FAILED' AND ex2.tool_status != 'FAILED'",
        "ex1.tool_status != 'FAILED' AND ex2.tool_status = 'FAILED'",
        "ex1.tool_status = 'FAILED' AND ex2.tool_status = 'FAILED'",
    )
    with sqlite3.connect(db) as con:
        cur = con.cursor()
        for error_tool in ("ic3_fork", "ic3"):
            for condition in status_conditions:
                for err in cur.execute(
                    "SELECT ex1.tool_status = 'FAILED', ex2.tool_status = 'FAILED', "
                    " error.tool_name, error.error, COUNT(DISTINCT error.sha256) AS cnt, "
                    f" {ERROR_MSG} "
                    "FROM exec AS ex1 "
                    " LEFT OUTER JOIN exec AS ex2 ON ex1.sha256 = ex2.sha256 "
                    f" INNER JOIN error ON ex1.sha256 = error.sha256 AND error.tool_name = '{error_tool}' "
                    "WHERE ex1.tool_name = 'ic3' AND ex2.tool_name = 'ic3_fork' AND "
                    f" {condition} "
                    "GROUP BY ex1.tool_status = 'FAILED', ex2.tool_status = 'FAILED', "
                    f" error.tool_name, error.error, {ERROR_MSG} "
                    "ORDER BY cnt DESC "
                    "LIMIT 10;"
                ):
                    errors.append(err)
    if file is None:
        fp = sys.stdout
    else:
        fp = file.open("w")
    writer = csv.DictWriter(
        fp,
        fieldnames=[
            "ic3 failed",
            "ic3 fork failed",
            "tool",
            "error",
            "occurence",
            "msg",
        ],
    )
    writer.writeheader()
    for err in map(rewrite_msg, errors):
        writer.writerow(
            {
                k: v
                for k, v in zip(
                    [
                        "ic3 failed",
                        "ic3 fork failed",
                        "tool",
                        "error",
                        "occurence",
                        "msg",
                    ],
                    err,
                )
            }
        )
    if file is not None:
        fp.close()


def rewrite_msg(
    err: tuple[int, int, str, str, int, str]
) -> tuple[int, int, str, str, int, str]:
    ic3_failed, ic3_fork_failed, tool, error, occurence, msg = err
    (
        error_type,
        error,
        msg,
        file,
        function,
        level,
        origin,
        raised_info,
        called_info,
    ) = map(lambda s: "" if s == "" else s + " ", msg.split("|"))
    msg = f"{level}{error}{msg}{raised_info}{called_info}{file}{function}{origin}"
    return (ic3_failed, ic3_fork_failed, tool, error, occurence, msg)
246 rasta_data_manipulation/rasta_triturage/populate_db_apk.py Normal file

@ -0,0 +1,246 @@
import sqlite3
import time
import gzip
import csv
import datetime
import requests
import getpass
import dateutil.parser

from androguard.core.bytecodes import apk as androguard_apk
from pathlib import Path


def int_or_none(str_: str) -> int | None:
    if str_:
        return int(str_)
    else:
        return None


def create_apk_table(db: Path):
    """Create the db/table if it does not exist."""
    with sqlite3.connect(db) as con:
        cur = con.cursor()
        if (
            cur.execute("SELECT name FROM sqlite_master WHERE name='apk'").fetchone()
            is None
        ):
            cur.execute(
                (
                    "CREATE TABLE apk("
                    " sha256, first_seen_year, apk_size,"
                    " vt_detection, min_sdk, max_sdk,"
                    " target_sdk, apk_size_decile, dex_date date,"
                    " pkg_name, vercode, vt_scan_date date,"
                    " dex_size, added date, markets, dex_size_decile, "
                    " dex_size_decile_by_year"
                    ")"
                )
            )
        con.commit()


def get_sha_set(dataset: Path) -> set[str]:
    """Read a set of sha256 from a file."""
    apk_set = set()
    with dataset.open() as f:
        for line in f.readlines():
            apk_set.add(line.strip())
    return apk_set


def populate_from_year_and_sdk(db: Path, year_and_sdk: Path, apks: set[str]):
    """Add the info from year_and_sdk.csv.gz to the database
    for the apks in `apks`.
    """
    apks_not_found = apks.copy()
    with gzip.open(year_and_sdk, "rt", newline="") as f:
        reader = csv.DictReader(f, quotechar='"')
        fieldnames = reader.fieldnames
        assert fieldnames is not None
        for row in reader:
            if row["sha256"] not in apks:
                continue
            value = {
                "sha256": row["sha256"],
                "first_seen_year": int_or_none(row["first_seen_year"]),
                "vt_detection": int_or_none(row["vt_detection"]),
                "min_sdk": int_or_none(row["min_sdk"]),
                "max_sdk": int_or_none(row["max_sdk"]),
                "target_sdk": int_or_none(row["target_sdk"]),
                "apk_size_decile": 0,  # Computed at dataset generation
                "dex_size_decile": 0,  # Computed by compute_dex_decile
            }
            with sqlite3.connect(db) as con:
                cur = con.cursor()
                cur.execute(
                    (
                        "INSERT INTO apk ("
                        " sha256, first_seen_year, vt_detection,"
                        " min_sdk, max_sdk, target_sdk, apk_size_decile,"
                        " dex_size_decile"
                        ") VALUES("
                        " :sha256, :first_seen_year, :vt_detection,"
                        " :min_sdk, :max_sdk, :target_sdk, :apk_size_decile,"
                        " :dex_size_decile"
                        ");"
                    ),
                    value,
                )
                con.commit()
            apks_not_found.remove(row["sha256"])
    for apk in apks_not_found:
        value = {
            "sha256": apk,
            "first_seen_year": None,
            "vt_detection": None,
            "min_sdk": None,
            "max_sdk": None,
            "target_sdk": None,
            "apk_size_decile": 0,
            "dex_size_decile": 0,  # Computed by compute_dex_decile
        }
        with sqlite3.connect(db) as con:
            cur = con.cursor()
            cur.execute(
                (
                    "INSERT INTO apk ("
                    " sha256, first_seen_year, vt_detection,"
                    " min_sdk, max_sdk, target_sdk, apk_size_decile,"
                    " dex_size_decile"
                    ") VALUES("
                    " :sha256, :first_seen_year, :vt_detection,"
                    " :min_sdk, :max_sdk, :target_sdk, :apk_size_decile,"
                    " :dex_size_decile"
                    ");"
                ),
                value,
            )
            con.commit()


def populate_from_latest_with_added_date(
    db: Path, latest_with_added_date: Path, apks: set[str]
):
    """Add the info from latest_with-added-date.csv.gz to the database
    for the apks in `apks`.
    """
    with gzip.open(latest_with_added_date, "rt", newline="") as f:
        reader = csv.DictReader(f, quotechar='"')
        fieldnames = reader.fieldnames
        assert fieldnames is not None
        for row in reader:
            if row["sha256"] not in apks:
                continue
            value = {
                "sha256": row["sha256"],
                "apk_size": int_or_none(row["apk_size"]),
                "dex_date": datetime.datetime.fromisoformat(row["dex_date"])
                if row["dex_date"]
                else None,
                "pkg_name": row["pkg_name"],
                "vercode": int_or_none(row["vercode"]),
                "vt_scan_date": datetime.datetime.fromisoformat(row["vt_scan_date"])
                if row["vt_scan_date"]
                else None,
                "dex_size": int_or_none(
                    row["dex_size"]
                ),  # Not necessarily the right value if multiple dex files are used, see 'fix_dex_size()'
                "added": dateutil.parser.isoparse(row["added"])
                if row["added"]
                else None,
                "markets": row["markets"],
            }
            with sqlite3.connect(db) as con:
                cur = con.cursor()
                cur.execute(
                    "UPDATE apk "
                    "SET apk_size = :apk_size,"
                    " dex_date = :dex_date,"
                    " pkg_name = :pkg_name,"
                    " vercode = :vercode,"
                    " vt_scan_date = :vt_scan_date,"
                    " dex_size = :dex_size,"
                    " added = :added,"
                    " markets = :markets "
                    "WHERE"
                    " sha256 = :sha256;",
                    value,
                )
                con.commit()


def download_apk(sha256: str, api_key: bytes) -> bytes:
    while True:
        resp = requests.get(
            "https://androzoo.uni.lu/api/download",
            params={
                b"apikey": api_key,
                b"sha256": sha256.encode("utf-8"),
            },
        )
        if resp.status_code == 200:
            return resp.content
        else:
            print(resp)
            print(resp.content)
            time.sleep(1)


def fix_dex_size(db: Path, apks: set[str], androzoo_key: bytes):
    """Download the apk from androzoo, compute the total size
    of all .dex files and update the database.
    """
    for sha256 in apks:
        apk = download_apk(sha256, androzoo_key)
        apk = androguard_apk.APK(apk, raw=True, skip_analysis=True)
        dex_size = sum(map(len, apk.get_all_dex()))
        with sqlite3.connect(db) as con:
            cur = con.cursor()
            cur.execute(
                "UPDATE apk SET dex_size = ? WHERE sha256 = ?;",
                (dex_size, sha256),
            )
            con.commit()


def populate_db_apk(
    db: Path,
    dataset: Path,
    year_and_sdk: Path,
    latest_with_added_date: Path,
    fix_dsize: bool,
):
    """Populate the database with the apk information."""
    if fix_dsize:
        androzoo_key = (
            getpass.getpass(prompt="androzoo apikey: ").strip().encode("utf-8")
        )
    create_apk_table(db)
    apks = get_sha_set(dataset)
    populate_from_year_and_sdk(db, year_and_sdk, apks)
    populate_from_latest_with_added_date(db, latest_with_added_date, apks)
    if fix_dsize:
        fix_dex_size(db, apks, androzoo_key)
    with sqlite3.connect(db) as con:
        cur = con.cursor()
        cur.execute(
            "UPDATE apk "
            "SET dex_size_decile = compute.decile "
            "FROM ("
            " SELECT NTILE ( 10 ) OVER ( ORDER BY dex_size ) decile, sha256 FROM apk"
            ") AS compute "
            "WHERE apk.sha256 = compute.sha256;"
        )
        cur.execute(
            "UPDATE apk "
            "SET dex_size_decile_by_year = compute.decile "
            "FROM ("
            " SELECT NTILE ( 10 ) "
            " OVER ( PARTITION BY first_seen_year ORDER BY dex_size ) decile, sha256 "
            " FROM apk"
            ") AS compute "
            "WHERE apk.sha256 = compute.sha256;"
        )
        con.commit()
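The decile columns above rely on SQLite's `NTILE` window function and the `UPDATE ... FROM` form (window functions need SQLite ≥ 3.25, `UPDATE ... FROM` needs ≥ 3.33). A self-contained sketch of the same pattern on an in-memory database; the table contents here are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE apk (sha256, dex_size, dex_size_decile)")
# 10 illustrative apks with growing dex size
cur.executemany(
    "INSERT INTO apk VALUES (?, ?, 0)",
    [(f"sha{i}", i * 1000) for i in range(1, 11)],
)
# Same pattern as populate_db_apk: rank rows into 10 deciles by dex_size,
# then copy each row's decile back onto the row itself.
cur.execute(
    "UPDATE apk "
    "SET dex_size_decile = compute.decile "
    "FROM ("
    " SELECT NTILE ( 10 ) OVER ( ORDER BY dex_size ) decile, sha256 FROM apk"
    ") AS compute "
    "WHERE apk.sha256 = compute.sha256;"
)
deciles = dict(cur.execute("SELECT sha256, dex_size_decile FROM apk"))
```

With exactly ten rows each row lands in its own decile, smallest `dex_size` in decile 1.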
186 rasta_data_manipulation/rasta_triturage/populate_db_exec.py Normal file

@ -0,0 +1,186 @@
import sqlite3
import json
import datetime
from pathlib import Path

from .query_error import estimate_cause


def create_tables(db: Path):
    """Create the db/tables if they do not exist."""
    with sqlite3.connect(db) as con:
        cur = con.cursor()
        cur.execute(
            (
                "CREATE TABLE IF NOT EXISTS exec ("
                " sha256, id, rev, time, kernel_cpu_time, user_cpu_time, "
                " max_rss_mem, avg_rss_mem, avg_total_mem, page_size, "
                " nb_major_page_fault, nb_minor_page_fault, nb_fs_input, "
                " nb_fs_output, nb_socket_msg_received, nb_socket_msg_sent, "
                " nb_signal_delivered, exit_status, timeout, "
                " tool_status, tool_name, date date"
                ");"
            )
        )
        cur.execute(
            (
                "CREATE TABLE IF NOT EXISTS error ("
                " tool_name, sha256, error_type, error, msg, "
                " first_line, last_line, logfile_name, file, "
                " line, function, level, origin, raised_info, "
                " called_info, cause"
                ");"
            )
        )
        con.commit()


def insert_errors(cur, tool, sha256, errors):
    for error in errors:
        error["tool_name"] = tool
        error["sha256"] = sha256
        error.setdefault("error_type", None)
        error.setdefault("error", None)
        error.setdefault("msg", None)
        error.setdefault("first_line", None)
        error.setdefault("last_line", None)
        error.setdefault("logfile_name", None)
        error.setdefault("file", None)
        error.setdefault("line", None)
        error.setdefault("function", None)
        error.setdefault("level", None)
        error.setdefault("origin", None)
        error.setdefault("raised_info", None)
        if error["raised_info"] is not None:
            error["raised_info"] = 'Raised at {} in file "{}", line {}'.format(
                error["raised_info"]["function"],
                error["raised_info"]["file"],
                error["raised_info"]["line"],
            )
        error.setdefault("called_info", None)
        if error["called_info"] is not None:
            error["called_info"] = 'Called from {} in file "{}", line {}'.format(
                error["called_info"]["function"],
                error["called_info"]["file"],
                error["called_info"]["line"],
            )
        # The stack trace can be quite big without being very useful in
        # queries
        error.pop("stack", None)
    cur.executemany(
        (
            "INSERT INTO error VALUES("
            " :tool_name, :sha256, :error_type, :error, :msg, "
            " :first_line, :last_line, :logfile_name, :file, "
            " :line, :function, :level, :origin, :raised_info, "
            " :called_info, ''"
            ");"
        ),
        errors,
    )


def fix_error(db: Path, report_with_correct_error: Path):
    """Unfortunately there were some errors when parsing the errors during the
    experiment, so another run was made for some tool-apk pairs to get the actual
    error. This pass was made in a different environment (different memory and
    space constraints), so we only replace the errors (after manual inspection,
    they don't seem related to the environment) and keep the other values from
    the original experiment.
    """
    with sqlite3.connect(db) as con:
        cur = con.cursor()
        for path in report_with_correct_error.iterdir():
            with path.open() as f:
                exec_log = json.load(f)
            sha256 = exec_log["apk"].removesuffix(".apk")
            if (
                len(
                    cur.execute(
                        "SELECT * FROM exec WHERE tool_name = ? AND sha256 = ?",
                        (exec_log["tool-name"], sha256),
                    ).fetchall()
                )
                == 1
            ):
                cur.execute(
                    "DELETE FROM error WHERE tool_name = ? AND sha256 = ?",
                    (exec_log["tool-name"], sha256),
                )
                errors = exec_log.pop("errors", [])
                insert_errors(cur, exec_log["tool-name"], sha256, errors)
        con.commit()


def populate_execution_report(db: Path, report_folder: Path):
    """Add to the database the reports stored in report_folder."""
    create_tables(db)
    i = 0
    with sqlite3.connect(db) as con:
        cur = con.cursor()
        for path in report_folder.iterdir():
            with path.open() as f:
                exec_log = json.load(f)
            exec_log["sha256"] = exec_log["apk"].removesuffix(".apk")
            exec_log["id"] = exec_log.get("_id", None)
            exec_log["rev"] = exec_log.get("_rev", None)
            errors = exec_log.pop("errors", [])

            exec_log["date"] = (
                datetime.datetime.fromisoformat(exec_log["date"])
                if exec_log.get("date", None)
                else None
            )
            del exec_log["apk"]
            if "_id" in exec_log:
                del exec_log["_id"]
            if "_rev" in exec_log:
                del exec_log["_rev"]
            new_exec_log = {}
            for key in exec_log:
                new_key = key.replace("-", "_")
                new_exec_log[new_key] = exec_log[key]
            for val in [
                "sha256",
                "id",
                "rev",
                "time",
                "kernel_cpu_time",
                "user_cpu_time",
                "max_rss_mem",
                "avg_rss_mem",
                "avg_total_mem",
                "page_size",
                "nb_major_page_fault",
                "nb_minor_page_fault",
                "nb_fs_input",
                "nb_fs_output",
                "nb_socket_msg_received",
                "nb_socket_msg_sent",
                "nb_signal_delivered",
                "exit_status",
                "timeout",
                "tool_status",
                "tool_name",
                "date",
            ]:
                if val not in new_exec_log:
                    new_exec_log[val] = None
            cur.execute(
                (
                    "INSERT INTO exec VALUES("
                    " :sha256, :id, :rev, :time, :kernel_cpu_time, :user_cpu_time, "
                    " :max_rss_mem, :avg_rss_mem, :avg_total_mem, :page_size, "
                    " :nb_major_page_fault, :nb_minor_page_fault, :nb_fs_input, "
                    " :nb_fs_output, :nb_socket_msg_received, :nb_socket_msg_sent, "
                    " :nb_signal_delivered, :exit_status, :timeout, "
                    " :tool_status, :tool_name, :date"
                    ");"
                ),
                new_exec_log,
            )
            insert_errors(cur, exec_log["tool-name"], exec_log["sha256"], errors)
            i += 1
            if i % 10_000 == 0:
                # Not sure how much ram would be needed to commit in one go
                con.commit()
        con.commit()
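Before the insert, `populate_execution_report` above normalizes the JSON report keys (dashes become underscores, matching the column names) and defaults absent columns to `None`. That normalization step in isolation, with a shortened, illustrative column list:

```python
# Illustrative subset of the exec table columns
EXEC_COLUMNS = ["sha256", "tool_name", "tool_status", "exit_status", "timeout"]


def normalize_report(exec_log: dict) -> dict:
    """Rename dashed JSON keys to column names and default missing columns."""
    out = {key.replace("-", "_"): value for key, value in exec_log.items()}
    for col in EXEC_COLUMNS:
        out.setdefault(col, None)
    return out


row = normalize_report({"sha256": "abc", "tool-name": "ic3", "tool-status": "FAILED"})
```

Defaulting every column is what lets the named-placeholder `INSERT` succeed even when a report is missing some fields.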
176 rasta_data_manipulation/rasta_triturage/populate_db_tool.py Normal file

@ -0,0 +1,176 @@
import sqlite3
from pathlib import Path


TOOL_INFO = [
    {
        "tool_name": "adagio",
        "use_python": True,
        "use_androguard": True,
    },
    {
        "tool_name": "amandroid",
        "use_scala": True,
        "use_soot": False,
        "use_apktool": True,
    },
    {
        "tool_name": "anadroid",
        "use_python": True,
        "use_java": True,
        "use_scala": True,
        "use_soot": False,
        "use_apktool": True,
    },
    {
        "tool_name": "androguard",
        "use_python": True,
        "use_androguard": True,  # Duh
    },
    {
        "tool_name": "androguard_dad",
        "use_python": True,
        "use_androguard": True,
    },
    {
        "tool_name": "apparecium",
        "use_python": True,
        "use_androguard": True,
    },
    {
        "tool_name": "blueseal",
        "use_java": True,
        "use_soot": True,
        "use_apktool": True,
    },
    {
        "tool_name": "dialdroid",
        "use_java": True,
        "use_soot": True,
    },
    {
        "tool_name": "didfail",
        "use_python": True,
        "use_java": True,
        "use_soot": True,
    },
    {
        "tool_name": "droidsafe",
        "use_python": True,
        "use_java": True,
        "use_soot": True,
        "use_apktool": True,
    },
    {
        "tool_name": "flowdroid",
        "use_java": True,
        "use_soot": True,
    },
    {
        "tool_name": "gator",
        "use_python": True,
        "use_java": True,
        "use_soot": True,
        "use_apktool": True,
    },
    {
        "tool_name": "ic3",
        "use_java": True,
        "use_soot": True,
    },
    {
        "tool_name": "ic3_fork",
        "use_java": True,
        "use_soot": True,
    },
    {
        "tool_name": "iccta",
        "use_java": True,
        "use_soot": True,
        "use_apktool": True,
    },
    {
        "tool_name": "mallodroid",
        "use_python": True,
        "use_androguard": True,
    },
    {
        "tool_name": "perfchecker",
        "use_java": True,
        "use_soot": True,
    },
    {
        "tool_name": "redexer",
        "use_ocaml": True,
        "use_ruby": True,
        "use_apktool": True,
    },
    {
        "tool_name": "saaf",
        "use_java": True,
        "use_soot": False,
        "use_apktool": True,
    },
    {
        "tool_name": "wognsen_et_al",
        "use_python": True,
        "use_prolog": True,
        "use_apktool": True,
    },
]

for line in TOOL_INFO:
    for col in [
        "use_python",
        "use_java",
        "use_scala",
        "use_ocaml",
        "use_ruby",
        "use_prolog",
        "use_soot",
        "use_androguard",
        "use_apktool",
    ]:
        if col not in line:
            line[col] = False


def create_tool_table(db: Path):
    """Create the db/table if it does not exist."""
    with sqlite3.connect(db) as con:
        cur = con.cursor()
        if (
            cur.execute("SELECT name FROM sqlite_master WHERE name='tool';").fetchone()
            is None
        ):
            cur.execute(
                (
                    "CREATE TABLE tool ("
                    " tool_name, use_python, use_java, use_scala,"
                    " use_ocaml, use_ruby, use_prolog, use_soot, "
                    " use_androguard, use_apktool"
                    ");"
                )
            )
        con.commit()


def populate_tool(
    db: Path,
):
    """Add the tool information to the database."""
    create_tool_table(db)
    # DROP table if already exist? replace value?
    with sqlite3.connect(db) as con:
        cur = con.cursor()
        cur.executemany(
            (
                "INSERT INTO tool VALUES("
                " :tool_name, :use_python, :use_java, :use_scala,"
                " :use_ocaml, :use_ruby, :use_prolog, :use_soot, "
                " :use_androguard, :use_apktool"
                ");"
            ),
            TOOL_INFO,
        )
        con.commit()
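`populate_tool` above bulk-inserts the `TOOL_INFO` dictionaries with `cursor.executemany` and named `:placeholders`, which are looked up by key in each dict. The same pattern on an in-memory database, reduced to two illustrative rows and three columns:

```python
import sqlite3

rows = [
    {"tool_name": "flowdroid", "use_java": True, "use_soot": True},
    {"tool_name": "mallodroid", "use_java": False, "use_soot": False},
]
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE tool (tool_name, use_java, use_soot)")
# Each :name placeholder is resolved against the matching dict key.
cur.executemany(
    "INSERT INTO tool VALUES(:tool_name, :use_java, :use_soot);", rows
)
# Booleans are stored as 0/1, so they can be used directly in WHERE clauses.
soot_tools = [name for name, in cur.execute(
    "SELECT tool_name FROM tool WHERE use_soot;"
)]
```

This is why the default-filling loop over `TOOL_INFO` matters: `executemany` with named placeholders fails on any dict missing one of the keys.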
699 rasta_data_manipulation/rasta_triturage/query_error.py Normal file

@ -0,0 +1,699 @@
import sqlite3
|
||||
import sys
|
||||
import csv
|
||||
import matplotlib.pyplot as plt # type: ignore
|
||||
from .utils import get_list_tools, radar_chart, render
|
||||
from pathlib import Path
|
||||
from typing import Optional, Any
|
||||
|
||||
ERROR_CARACT = (
|
||||
"error_type",
|
||||
"error",
|
||||
"msg",
|
||||
"file",
|
||||
"function",
|
||||
"level",
|
||||
"origin",
|
||||
"raised_info",
|
||||
"called_info",
|
||||
)
|
||||
|
||||
# Query that remove identical error that occure multiple times on the same execution
|
||||
DISTINCT_ERRORS = (
|
||||
"("
|
||||
f" SELECT DISTINCT tool_name, sha256, {', '.join(ERROR_CARACT)}"
|
||||
" FROM error"
|
||||
") AS distinct_error"
|
||||
)
|
||||
DISTINCT_ERROR_CLASS = (
|
||||
"("
|
||||
f" SELECT DISTINCT tool_name, sha256, error, error_type"
|
||||
" FROM error"
|
||||
") AS distinct_error"
|
||||
)
|
||||
DISTINCT_CAUSES = (
|
||||
"("
|
||||
" SELECT DISTINCT tool_name, sha256, cause"
|
||||
" FROM error"
|
||||
") AS distinct_cause"
|
||||
)
|
||||
|
||||
|
||||
def estimate_cause(db: Path):
|
||||
"""Estimate the cause of an error to easier grouping."""
|
||||
with sqlite3.connect(db) as con:
|
||||
cur = con.cursor()
|
||||
cur.execute("UPDATE error SET cause = '';")
|
||||
con.commit()
|
||||
# brut.androlib is package defined in apktool
|
||||
# 'Expected: 0x001c0001, got: 0x00000000' errors are always
|
||||
# part of an apktool stacktrace:
|
||||
# SELECT COUNT(*) FROM error e1
|
||||
# WHERE e1.tool_name = '${tool}' AND
|
||||
# e1.msg = 'Expected: 0x001c0001, got: 0x00000000' AND
|
||||
# e1.sha256 NOT IN (
|
||||
# SELECT e2.sha256 FROM error e2
|
||||
# WHERE e2.tool_name = '${tool}' AND
|
||||
# e2.msg LIKE '%Could not decode arsc file%'
|
||||
# )
|
||||
# is always 0"
|
||||
cur.execute(
|
||||
(
|
||||
"UPDATE error "
|
||||
"SET cause = 'apktool' "
|
||||
"WHERE error = 'brut.androlib.AndrolibException' OR "
|
||||
" error LIKE 'brut.androlib.err.%' OR "
|
||||
" msg = 'Expected: 0x001c0001, got: 0x00000000' OR "
|
||||
" msg LIKE '%brut.androlib.AndrolibException: Could not decode arsc file%' OR "
|
||||
" msg LIKE 'bad magic value: %' OR "
|
||||
" error = 'brut.androlib.err.UndefinedResObject';"
|
||||
)
|
||||
)
|
||||
cur.execute(
|
||||
(
|
||||
"UPDATE error "
|
||||
"SET cause = 'memory' "
|
||||
"WHERE error = 'java.lang.StackOverflowError' OR "
|
||||
" error = 'java.lang.OutOfMemoryError' OR "
|
||||
" msg LIKE '%java.lang.OutOfMemoryError%' OR "
|
||||
" msg LIKE '%java.lang.StackOverflowError%' OR "
|
||||
" msg = 'Stack overflow';"
|
||||
)
|
||||
)
|
||||
cur.execute(
|
||||
(
|
||||
"UPDATE error "
|
||||
"SET cause = 'soot' "
|
||||
"WHERE msg LIKE ? OR "
|
||||
" msg LIKE '%No call graph present in Scene. Maybe you want Whole Program mode (-w)%' OR "
|
||||
" msg LIKE '%There were exceptions during IFDS analysis. Exiting.%' OR " # More hero than soot?
|
||||
" msg = 'Could not find method' OR "
|
||||
" msg = 'No sources found, aborting analysis' OR "
|
||||
" msg = 'No sources or sinks found, aborting analysis' OR "
|
||||
" msg = 'Only phantom classes loaded, skipping analysis...';"
|
||||
),
|
||||
(
|
||||
"%RefType java.lang.Object not loaded. If you tried to get the RefType of a library class, did you call loadNecessaryClasses()? Otherwise please check Soot's classpath.%",
|
||||
),
|
||||
)
|
||||
|
||||
cur.execute(
|
||||
(
|
||||
"UPDATE error "
|
||||
"SET cause = 'index error' "
|
||||
"WHERE error = 'IndexError' OR "
|
||||
" msg = 'java.lang.ArrayIndexOutOfBoundsException' OR "
|
||||
" (error_type = 'Python' AND error = 'KeyError') OR "
|
||||
" error = 'java.lang.IndexOutOfBoundsException' OR "
|
||||
" error = 'java.lang.ArrayIndexOutOfBoundsException' OR "
|
||||
" msg LIKE 'java.lang.ArrayIndexOutOfBoundsException:%';"
|
||||
)
|
||||
)
|
||||
cur.execute(
|
||||
(
|
||||
"UPDATE error "
|
||||
"SET cause = 'arithmetique' "
|
||||
"WHERE error = 'java.lang.ArithmeticException';"
|
||||
)
|
||||
)
|
||||
cur.execute("UPDATE error SET cause = 'jasmin' WHERE error = 'jas.jasError';")
|
||||
cur.execute(
|
||||
(
|
||||
"UPDATE error "
|
||||
"SET cause = 'storage' "
|
||||
"WHERE msg = 'No space left on device' OR "
|
||||
" msg LIKE 'Error copying file: %' OR "
|
||||
" msg = 'java.io.IOException: No space left on device';"
|
||||
)
|
||||
)
|
||||
cur.execute(
|
||||
(
|
||||
"UPDATE error "
|
||||
"SET cause = 'redexe pattern maching failed' "
|
||||
"WHERE msg = 'File \"src/ext/logging.ml\", line 712, characters 12-17: Pattern matching failed';"
|
||||
)
|
||||
)
|
||||
cur.execute(
|
||||
(
|
||||
"UPDATE error "
|
||||
"SET cause = 'null pointer' "
|
||||
"WHERE error = 'java.lang.NullPointerException' OR "
|
||||
" msg LIKE ? OR "
|
||||
" msg LIKE 'undefined method % for nil:NilClass (NoMethodError)';"
|
||||
),
|
||||
("'NoneType' object has no attribute %",),
|
||||
)
|
||||
# Soot ?
|
||||
cur.execute(
|
||||
(
|
||||
"UPDATE error "
|
||||
"SET cause = 'unknown error in thread' "
|
||||
"WHERE msg = 'Worker thread execution failed: null';"
|
||||
)
|
||||
)
|
||||
cur.execute(
|
||||
(
|
||||
"UPDATE error "
|
||||
"SET cause = 'timeout' "
|
||||
"WHERE error = 'java.util.concurrent.TimeoutException';"
|
||||
)
|
||||
)
|
||||
cur.execute(
|
||||
(
|
||||
"UPDATE error "
|
||||
"SET cause = 'file name too long' "
|
||||
"WHERE msg = 'File name too long';"
|
||||
)
|
||||
)
|
||||
cur.execute(
|
||||
(
|
||||
"UPDATE error "
|
||||
"SET cause = 'encoding' "
|
||||
"WHERE error = 'UnicodeEncodeError';"
|
||||
)
|
||||
)
|
||||
cur.execute(
|
||||
(
|
||||
"UPDATE error "
|
||||
"SET cause = 'smali' "
|
||||
"WHERE error LIKE 'org.jf.dexlib2.%' OR error LIKE 'org.jf.util.%';"
|
||||
)
|
||||
)
|
||||
cur.execute(
|
||||
(
|
||||
"UPDATE error "
|
||||
"SET cause = 'redexer dex parser' "
|
||||
"WHERE msg LIKE 'Dex.Wrong_dex(\"%\")';"
|
||||
)
|
||||
)
|
||||
cur.execute(
|
||||
(
|
||||
"UPDATE error "
|
||||
"SET cause = 'bytecode not found' "
|
||||
"WHERE msg LIKE 'No method source set for method %' OR "
|
||||
" msg LIKE '% is an system library method.' OR "
|
||||
" msg LIKE '% is an unknown method.';"
|
||||
)
|
||||
)
|
||||
con.commit()
|
||||
# Default
|
||||
# default = " || '|' || ".join(map(lambda s: f"COALESCE({s}, '')", ERROR_CARACT))
|
||||
# cur.execute(f"UPDATE error SET cause = {default} WHERE cause = '';")
|
||||
cur.execute("UPDATE error SET cause = 'other' WHERE cause = '';")
|
||||
con.commit()
|
||||
|
||||
|
||||
def radar_cause_estimation(
|
||||
db: Path,
|
||||
tools: list[str] | None,
|
||||
interactive: bool,
|
||||
folder: Path | None,
|
||||
):
|
||||
# estimate_cause(db)
|
||||
if tools is None:
|
||||
tools = get_list_tools(db)
|
||||
|
||||
with sqlite3.connect(db, timeout=60) as con:
|
||||
cur = con.cursor()
|
||||
causes = [
|
||||
v for v, in cur.execute("SELECT DISTINCT cause FROM error;").fetchall()
|
||||
]
|
||||
for tool in tools:
|
||||
print(f"tool: {tool}")
|
||||
for cause, count in cur.execute(
|
||||
(
|
||||
"SELECT cause, COUNT(*) AS cnt "
|
||||
"FROM error "
|
||||
"WHERE tool_name = ? "
|
||||
"GROUP BY cause "
|
||||
"ORDER BY cnt DESC LIMIT 10;"
|
||||
),
|
||||
(tool,),
|
||||
):
|
||||
print(f"{count: 6}: {cause}")
|
||||
print()
|
||||
|
||||
values = []
|
||||
labels = tools
|
||||
for tool in tools:
|
||||
vals = [0 for _ in causes]
|
||||
with sqlite3.connect(db) as con:
|
||||
cur = con.cursor()
|
||||
for cause, cnt in cur.execute(
|
||||
(
|
||||
"SELECT distinct_cause.cause, COUNT(*) AS cnt "
|
||||
f"FROM {DISTINCT_CAUSES} "
|
||||
"WHERE distinct_cause.cause != '' AND distinct_cause.tool_name = ? "
|
||||
"GROUP BY distinct_cause.cause;"
|
||||
),
|
||||
(tool,),
|
||||
):
|
||||
print(f"{tool=}, {cause=}, {cnt=}")
|
||||
if cause in causes:
|
||||
vals[causes.index(cause)] = cnt
|
||||
print(f"{tool=}, {vals=}")
|
||||
radar_chart(
|
||||
causes, [vals], [tool], f"Causes of error for {tool}", interactive, folder
|
||||
)
|
||||
values.append(vals)
|
||||
radar_chart(causes, values, labels, "Causes of error", interactive, folder)
|
||||
|
||||
|
||||
def get_common_errors(
|
||||
db: Path,
|
||||
tool: Optional[str] = None,
|
||||
status: Optional[str] = None,
|
||||
use_androguard: Optional[bool] = None,
|
||||
use_java: Optional[bool] = None,
|
||||
use_prolog: Optional[bool] = None,
|
||||
use_ruby: Optional[bool] = None,
|
||||
use_soot: Optional[bool] = None,
|
||||
use_apktool: Optional[bool] = None,
|
||||
use_ocaml: Optional[bool] = None,
|
||||
use_python: Optional[bool] = None,
|
||||
use_scala: Optional[bool] = None,
|
||||
folder: Optional[Path] = None,
|
||||
limit: int = 10,
|
||||
):
|
||||
"""Get the most common errors"""
|
||||
args: dict[str, Any] = {"limit": limit}
|
||||
clauses = []
|
||||
if tool is not None:
|
||||
clauses.append("(distinct_error.tool_name = :tool)")
|
||||
args["tool"] = tool
|
||||
if status is not None:
|
||||
clauses.append("(exec.tool_status = :tool_status)")
|
||||
args["tool_status"] = status
|
||||
|
||||
if use_java is not None:
|
||||
clauses.append("(tool.use_java = :use_java)")
|
||||
args["use_java"] = use_java
|
||||
if use_prolog is not None:
|
||||
clauses.append("(tool.use_prolog = :use_prolog)")
|
||||
args["use_prolog"] = use_prolog
|
||||
if use_ruby is not None:
|
||||
clauses.append("(tool.use_ruby = :use_ruby)")
|
||||
args["use_ruby"] = use_ruby
|
||||
if use_soot is not None:
|
||||
clauses.append("(tool.use_soot = :use_soot)")
|
||||
args["use_soot"] = use_soot
|
||||
if use_apktool is not None:
|
||||
clauses.append("(tool.use_apktool = :use_apktool)")
|
||||
args["use_apktool"] = use_apktool
|
||||
if use_ocaml is not None:
|
||||
clauses.append("(tool.use_ocaml = :use_ocaml)")
|
||||
args["use_ocaml"] = use_ocaml
|
||||
if use_python is not None:
|
||||
clauses.append("(tool.use_python = :use_python)")
|
||||
args["use_python"] = use_python
|
||||
if use_scala is not None:
|
||||
clauses.append("(tool.use_scala = :use_scala)")
|
||||
args["use_scala"] = use_scala
|
||||
where_clause = ""
|
||||
if clauses:
|
||||
where_clause = f"WHERE {' AND '.join(clauses)}"
|
||||
|
||||
# print(
|
||||
# (
|
||||
# f"SELECT COUNT(*) AS cnt, {', '.join(ERROR_CARACT)} \n"
|
||||
# f"FROM {DISTINCT_ERRORS} \n"
|
||||
# "INNER JOIN tool ON distinct_error.tool_name = tool.tool_name \n"
|
||||
# "INNER JOIN exec ON \n"
|
||||
# " distinct_error.tool_name = exec.tool_name AND \n"
|
||||
# " distinct_error.sha256 = exec.sha256 \n"
|
||||
# f"{where_clause}\n"
|
||||
# f"GROUP BY {', '.join(ERROR_CARACT)} \n"
|
||||
# "ORDER BY cnt DESC LIMIT :limit;\n"
|
||||
# )
|
||||
# )
|
||||
# print(args)
|
||||
|
||||
if folder is None:
|
||||
out = sys.stdout
|
||||
else:
|
||||
# Generate filename
|
||||
features = [
|
||||
use_androguard,
|
||||
use_java,
|
||||
use_prolog,
|
||||
use_ruby,
|
||||
use_soot,
|
||||
use_apktool,
|
||||
use_ocaml,
|
||||
use_python,
|
||||
use_scala,
|
||||
]
|
||||
|
||||
if tool is None:
|
||||
tool_str = ""
|
||||
else:
|
||||
tool_str = f"_for_{tool}"
|
||||
if status is None:
|
||||
status_str = ""
|
||||
else:
|
||||
status_str = f"_when_{status}"
|
||||
if all(map(lambda x: x is None, features)):
|
||||
features_str = ""
|
||||
else:
|
||||
features_str = "_using"
|
||||
if use_androguard:
|
||||
features_str += "_androguard"
|
||||
if use_java:
|
||||
features_str += "_java"
|
||||
if use_prolog:
|
||||
features_str += "_prolog"
|
||||
if use_ruby:
|
||||
features_str += "_ruby"
|
||||
if use_soot:
|
||||
features_str += "_soot"
|
||||
if use_apktool:
|
||||
features_str += "_apktool"
|
||||
if use_ocaml:
|
||||
features_str += "_ocaml"
|
||||
if use_python:
|
||||
features_str += "_python"
|
||||
if use_scala:
|
||||
features_str += "_scala"
|
||||
|
||||
name = f"{limit}_most_common_errors{tool_str}{status_str}{features_str}.csv"
|
||||
# make sure the folder exist
|
||||
folder.mkdir(parents=True, exist_ok=True)
|
||||
out = (folder / name).open("w")
|
||||
|
||||
with sqlite3.connect(db) as con:
|
||||
cur = con.cursor()
|
||||
writer = csv.DictWriter(out, fieldnames=["error", "msg", "count"])
|
||||
writer.writeheader()
|
||||
for row in cur.execute(
|
||||
(
|
||||
f"SELECT COUNT(*) AS cnt, {', '.join(ERROR_CARACT)} "
|
||||
f"FROM {DISTINCT_ERRORS} "
|
||||
"INNER JOIN tool ON distinct_error.tool_name = tool.tool_name "
|
||||
"INNER JOIN exec ON "
|
||||
" distinct_error.tool_name = exec.tool_name AND "
|
||||
" distinct_error.sha256 = exec.sha256 "
|
||||
f"{where_clause}"
|
||||
f"GROUP BY {', '.join(ERROR_CARACT)} "
|
||||
"ORDER BY cnt DESC LIMIT :limit;"
|
||||
),
|
||||
args,
|
||||
):
|
||||
row_d = {k: v for (k, v) in zip(("cnt", *ERROR_CARACT), row)}
|
||||
writer.writerow(reduce_error_row(row_d))
|
||||
if folder is not None:
|
||||
out.close()
|
||||
|
||||
|
||||
def reduce_error_row(row: dict[str, Any]) -> dict[str, Any]:
    """Reduce an error from an sqlite row to a simpler row for csv."""

    def pad(value: Optional[str]) -> str:
        """Return the value followed by a space, or '' when empty."""
        return f"{value} " if value else ""

    new_row = {}
    new_row["error"] = row["error"]
    error = pad(row["error"])
    msg = pad(row["msg"])
    file = pad(row["file"])
    function = pad(row["function"])
    level = pad(row["level"])
    origin = pad(row["origin"])
    raised_info = pad(row["raised_info"])
    called_info = pad(row["called_info"])
    # Concatenate all the context fields, each exactly once
    new_row["msg"] = (
        f"{level}{error}{msg}{raised_info}{called_info}{file}{function}{origin}"
    )
    new_row["count"] = row["cnt"]
    return new_row


def get_common_error_classes(
    db: Path,
    tool: Optional[str] = None,
    status: Optional[str] = None,
    use_androguard: Optional[bool] = None,
    use_java: Optional[bool] = None,
    use_prolog: Optional[bool] = None,
    use_ruby: Optional[bool] = None,
    use_soot: Optional[bool] = None,
    use_apktool: Optional[bool] = None,
    use_ocaml: Optional[bool] = None,
    use_python: Optional[bool] = None,
    use_scala: Optional[bool] = None,
    folder: Optional[Path] = None,
    limit: int = 10,
):
    """Get the most common error classes."""
    args: dict[str, Any] = {"limit": limit}
    clauses = []
    if tool is not None:
        clauses.append("(distinct_error.tool_name = :tool)")
        args["tool"] = tool
    if status is not None:
        clauses.append("(exec.tool_status = :tool_status)")
        args["tool_status"] = status

    if use_java is not None:
        clauses.append("(tool.use_java = :use_java)")
        args["use_java"] = use_java
    if use_prolog is not None:
        clauses.append("(tool.use_prolog = :use_prolog)")
        args["use_prolog"] = use_prolog
    if use_ruby is not None:
        clauses.append("(tool.use_ruby = :use_ruby)")
        args["use_ruby"] = use_ruby
    if use_soot is not None:
        clauses.append("(tool.use_soot = :use_soot)")
        args["use_soot"] = use_soot
    if use_apktool is not None:
        clauses.append("(tool.use_apktool = :use_apktool)")
        args["use_apktool"] = use_apktool
    if use_ocaml is not None:
        clauses.append("(tool.use_ocaml = :use_ocaml)")
        args["use_ocaml"] = use_ocaml
    if use_python is not None:
        clauses.append("(tool.use_python = :use_python)")
        args["use_python"] = use_python
    if use_scala is not None:
        clauses.append("(tool.use_scala = :use_scala)")
        args["use_scala"] = use_scala
    where_clause = ""
    if clauses:
        where_clause = f"WHERE {' AND '.join(clauses)}"

    if folder is None:
        out = sys.stdout
    else:
        # Generate the filename from the requested filters
        features = [
            use_androguard,
            use_java,
            use_prolog,
            use_ruby,
            use_soot,
            use_apktool,
            use_ocaml,
            use_python,
            use_scala,
        ]

        if tool is None:
            tool_str = ""
        else:
            tool_str = f"_for_{tool}"
        if status is None:
            status_str = ""
        else:
            status_str = f"_when_{status}"
        if all(map(lambda x: x is None, features)):
            features_str = ""
        else:
            features_str = "_using"
            if use_androguard:
                features_str += "_androguard"
            if use_java:
                features_str += "_java"
            if use_prolog:
                features_str += "_prolog"
            if use_ruby:
                features_str += "_ruby"
            if use_soot:
                features_str += "_soot"
            if use_apktool:
                features_str += "_apktool"
            if use_ocaml:
                features_str += "_ocaml"
            if use_python:
                features_str += "_python"
            if use_scala:
                features_str += "_scala"

        name = f"{limit}_most_common_errors_classes{tool_str}{status_str}{features_str}.csv"
        # make sure the folder exists
        folder.mkdir(parents=True, exist_ok=True)
        out = (folder / name).open("w")

    with sqlite3.connect(db) as con:
        cur = con.cursor()
        writer = csv.DictWriter(out, fieldnames=["type", "error", "count"])
        writer.writeheader()
        for row in cur.execute(
            (
                "SELECT COUNT(*) AS cnt, distinct_error.error, distinct_error.error_type "
                f"FROM {DISTINCT_ERROR_CLASS} "
                "INNER JOIN tool ON distinct_error.tool_name = tool.tool_name "
                "INNER JOIN exec ON "
                "    distinct_error.tool_name = exec.tool_name AND "
                "    distinct_error.sha256 = exec.sha256 "
                f"{where_clause} "
                "GROUP BY distinct_error.error, distinct_error.error_type "
                "ORDER BY cnt DESC LIMIT :limit;"
            ),
            args,
        ):
            row_d = {k: v for (k, v) in zip(("count", "error", "type"), row)}
            writer.writerow(row_d)
    if folder is not None:
        out.close()


def get_nb_error(
    db: Path,
    folder: Optional[Path] = None,
):
    NB_ERR = (
        "("
        "SELECT "
        "    exec_id.tool_name, exec_id.sha256, COUNT(error._rowid_) AS nb_err "
        "FROM ("
        "    (SELECT tool_name FROM tool) CROSS JOIN (SELECT sha256 FROM apk)"
        ") AS exec_id LEFT JOIN error "
        "ON exec_id.tool_name=error.tool_name AND exec_id.sha256=error.sha256 "
        "GROUP BY exec_id.tool_name, exec_id.sha256"
        ") AS nb_err"
    )
    data = {}
    tools = set()
    with sqlite3.connect(db) as con:
        cur = con.cursor()
        for tool, status, avg, variance in cur.execute(
            "SELECT nb_err.tool_name, exec.tool_status, AVG(nb_err.nb_err), "
            "    AVG(nb_err.nb_err*nb_err.nb_err) - AVG(nb_err.nb_err)*AVG(nb_err.nb_err) "
            f"FROM {NB_ERR} "
            "INNER JOIN exec ON nb_err.tool_name = exec.tool_name AND nb_err.sha256 = exec.sha256 "
            "GROUP BY nb_err.tool_name, exec.tool_status;"
        ):
            tools.add(tool)
            data[(tool, status)] = (avg, variance)
    fieldnames = list(tools)
    fieldnames.sort()
    fieldnames = ["", *fieldnames]
    if folder is None:
        fd = sys.stdout
    else:
        fd = (folder / "average_number_of_error_by_exec.csv").open("w")
    writer = csv.DictWriter(fd, fieldnames=fieldnames)
    writer.writeheader()
    for status in ("FINISHED", "FAILED", "TIMEOUT"):
        row = {"": status}
        for tool in tools:
            row[tool] = round(data.get((tool, status), (0, 0))[0], 2)
        writer.writerow(row)
        row = {"": "standard deviation"}
        for tool in tools:
            row[tool] = round(data.get((tool, status), (0, 0))[1] ** (1 / 2), 2)
        writer.writerow(row)
    if folder is not None:
        fd.close()

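The SQL in `get_nb_error` (and in `get_ressource` further down) computes the variance in a single aggregation pass through the identity Var(X) = E[X²] − E[X]², then takes the square root in Python to report a standard deviation. A minimal sketch of the same computation, with illustrative values (not taken from the experiment data):

```python
# The variance identity used by the SQL queries, Var(X) = E[X^2] - E[X]^2,
# checked against the textbook definition. Sample values are made up.
values = [3.0, 5.0, 7.0, 9.0]

n = len(values)
avg = sum(values) / n
avg_sq = sum(v * v for v in values) / n

variance = avg_sq - avg * avg                      # what the SQL computes
direct = sum((v - avg) ** 2 for v in values) / n   # direct (population) definition

std_dev = variance ** 0.5                          # get_nb_error rounds this to 2 digits
print(variance, direct, round(std_dev, 2))         # -> 5.0 5.0 2.24
```

This one-pass form is why the queries can compute mean and spread in a single `GROUP BY`, at the cost of possible floating-point cancellation for large values, which is acceptable here.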
def error_type_repartition(
    db: Path, interactive: bool = True, folder: Optional[Path] = None
):
    data: dict[str, dict[str, int]] = {}
    total: dict[str, int] = {}
    with sqlite3.connect(db) as con:
        cur = con.cursor()
        for tool, err, n in cur.execute(
            "SELECT tool_name, error, COUNT(*) FROM error GROUP BY tool_name, error;"
        ):
            if tool not in data:
                data[tool] = {}
                total[tool] = 0
            if err is not None and err != "":
                data[tool][err] = n
        for tool, n in cur.execute(
            "SELECT tool_name, COUNT(*) FROM error "
            "WHERE error IS NOT NULL AND error != '' GROUP BY tool_name;"
        ):
            total[tool] = n
    errors = set()
    N = 3
    # Keep only the N most frequent error types of each tool
    for tool in data:
        for err in sorted(data[tool], key=lambda err: data[tool][err], reverse=True)[:N]:
            # TODO Check of > 10%?
            errors.add(err)
    tools = sorted(data.keys())
    errors_l = sorted(errors)
    values = [
        [
            data[tool].get(err, 0) * 100 / total[tool] if total[tool] != 0 else 0
            for tool in tools
        ]
        for err in errors_l
    ]
    plt.figure(figsize=(22, 20))
    im = plt.imshow(values, cmap="Greys")
    cbar = plt.colorbar(im)
    cbar.ax.set_ylabel(
        "% of the error type among the errors raised by the tool",
        rotation=-90,
        va="bottom",
    )

    # numpy is only needed locally, for the tick positions
    import numpy as np

    plt.xticks(np.arange(len(tools)), labels=tools, rotation=80)
    plt.yticks(np.arange(len(errors_l)), labels=errors_l)
    plt.xticks(np.arange(len(tools) + 1) - 0.5, minor=True)
    plt.yticks(np.arange(len(errors_l) + 1) - 0.5, minor=True)
    plt.grid(which="minor", color="w", linestyle="-", linewidth=3)
    plt.tick_params(which="minor", bottom=False, left=False)
    plt.title("Repartition of error types among tools")
    # plt.figure().set_figheight(10)
    render(
        "Repartition of error types among tools",
        interactive,
        folder,
        tight_layout=False,
    )
62
rasta_data_manipulation/rasta_triturage/ressources.py
Normal file

@@ -0,0 +1,62 @@
import sqlite3
import sys
import csv

from pathlib import Path
from typing import Optional


def get_ressource(
    db: Path,
    folder: Optional[Path] = None,
):
    data_time = {}
    data_mem = {}
    tools = set()
    with sqlite3.connect(db) as con:
        cur = con.cursor()
        for tool, status, avg_time, var_time, avg_mem, var_mem in cur.execute(
            "SELECT tool_name, exec.tool_status, "
            "    AVG(time), AVG(time*time) - AVG(time)*AVG(time), "
            "    AVG(max_rss_mem), AVG(max_rss_mem*max_rss_mem) - AVG(max_rss_mem)*AVG(max_rss_mem) "
            "FROM exec "
            "GROUP BY tool_name, tool_status;"
        ):
            tools.add(tool)
            if var_time is None:
                var_time = 0
            if var_mem is None:
                var_mem = 0
            data_time[(tool, status)] = (avg_time, var_time ** (1 / 2))
            data_mem[(tool, status)] = (avg_mem, var_mem ** (1 / 2))
    fieldnames = list(tools)
    fieldnames.sort()
    fieldnames = ["", *fieldnames]
    if folder is None:
        fd_time = sys.stdout
        fd_mem = sys.stdout
    else:
        fd_time = (folder / "average_time.csv").open("w")
        fd_mem = (folder / "average_mem.csv").open("w")
    writer_time = csv.DictWriter(fd_time, fieldnames=fieldnames)
    writer_mem = csv.DictWriter(fd_mem, fieldnames=fieldnames)
    writer_time.writeheader()
    writer_mem.writeheader()
    for status in ("FINISHED", "FAILED", "TIMEOUT"):
        row_time = {"": status}
        row_mem = {"": status}
        for tool in tools:
            row_time[tool] = round(data_time.get((tool, status), (0, 0))[0], 2)
            row_mem[tool] = round(data_mem.get((tool, status), (0, 0))[0], 2)
        writer_time.writerow(row_time)
        writer_mem.writerow(row_mem)
        row_time = {"": "standard deviation"}
        row_mem = {"": "standard deviation"}
        for tool in tools:
            row_time[tool] = round(data_time.get((tool, status), (0, 0))[1], 2)
            row_mem[tool] = round(data_mem.get((tool, status), (0, 0))[1], 2)
        writer_time.writerow(row_time)
        writer_mem.writerow(row_mem)
    if folder is not None:
        fd_time.close()
        fd_mem.close()
446
rasta_data_manipulation/rasta_triturage/status.py
Normal file

@@ -0,0 +1,446 @@
"""
|
||||
Plots related to the tool status.
|
||||
"""
|
||||
|
||||
import numpy as np
|
||||
|
||||
import sqlite3
|
||||
from pathlib import Path
|
||||
from matplotlib import pyplot as plt # type: ignore
|
||||
from typing import Any, Callable, Optional
|
||||
from .utils import (
|
||||
render,
|
||||
DENSE_DASH,
|
||||
DENSE_DOT,
|
||||
get_list_tools,
|
||||
plot_generic,
|
||||
MARKERS,
|
||||
COLORS,
|
||||
)
|
||||
from .populate_db_tool import TOOL_INFO
|
||||
|
||||
TOOL_LINE_STYLE = {
|
||||
tool_info["tool_name"]: DENSE_DOT if tool_info["use_soot"] else DENSE_DASH
|
||||
for tool_info in TOOL_INFO
|
||||
}
|
||||
|
||||
|
||||
def plot_status_by_tool(
    db: Path,
    interactive: bool = True,
    image_path: Path | None = None,
    tools: list[str] | None = None,
    title: str = "Exit Status",
):
    """Plot the repartition of status by tools."""
    if tools is None:
        tools = get_list_tools(db)
    with sqlite3.connect(db) as con:
        cur = con.cursor()
        tools_list_format = f"({','.join(['?' for _ in tools])})"
        nb_apk = cur.execute("SELECT COUNT(*) FROM apk;").fetchone()[0]
        status = cur.execute(
            (
                "SELECT tool_name, tool_status, COUNT(sha256) "
                "FROM exec "
                f"WHERE tool_name IN {tools_list_format} "
                "GROUP BY tool_name, tool_status;"
            ),
            tools,
        ).fetchall()
    occurences = {}
    for tool, stat, occurence in status:
        occurences[(tool, stat)] = occurence
    # tools.sort(key=lambda t: occurences.get((t, "FINISHED"), 0), reverse=True)
    tools.sort()

    values = {
        "Finished": np.zeros(len(tools)),
        "Time Out": np.zeros(len(tools)),
        "Other": np.zeros(len(tools)),
        "Failed": np.zeros(len(tools)),
    }
    colors = {
        "Finished": "#009E73",
        "Time Out": "#56B4E9",
        "Failed": "#D55E00",
        "Other": "#555555",  # TODO: better color
    }
    hatch = {
        "Finished": "/",
        "Time Out": "x",
        "Failed": "\\",
        "Other": ".",
    }
    for i, tool in enumerate(tools):
        values["Finished"][i] = occurences.get((tool, "FINISHED"), 0)
        values["Time Out"][i] = occurences.get((tool, "TIMEOUT"), 0)
        values["Failed"][i] = occurences.get((tool, "FAILED"), 0)
        values["Other"][i] = (
            nb_apk - values["Finished"][i] - values["Time Out"][i] - values["Failed"][i]
        )
    values["Finished"] = (100 * values["Finished"]) / nb_apk
    values["Time Out"] = (100 * values["Time Out"]) / nb_apk
    values["Failed"] = (100 * values["Failed"]) / nb_apk
    values["Other"] = (100 * values["Other"]) / nb_apk
    bottom = np.zeros(len(tools))

    print("Finishing rate:")
    for t, p in zip(tools, values["Finished"]):
        print(f"{t}: {p:.2f}%")

    plt.figure(figsize=(20, 9), dpi=80)
    plt.axhline(y=50, linestyle="dotted")
    plt.axhline(y=85, linestyle="dotted")
    plt.axhline(y=15, linestyle="dotted")
    for stat in ["Finished", "Time Out", "Other", "Failed"]:
        plt.bar(
            tools,
            values[stat],
            label=stat,
            color=colors[stat],
            hatch=hatch[stat],
            bottom=bottom,
            width=0.6,
            edgecolor="black",
        )
        bottom += values[stat]
    plt.xticks(tools, tools, rotation=80)
    plt.legend()
    plt.ylabel("% of analysed apk")
    render(title, interactive, image_path)


def plot_status_by_tool_and_malware(
    db: Path,
    interactive: bool = True,
    image_path: Path | None = None,
    tools: list[str] | None = None,
    title: str = "Exit Status Goodware/Malware",
):
    """Plot the repartition of status by tools and if apk is a malware."""
    if tools is None:
        tools = get_list_tools(db)
    with sqlite3.connect(db) as con:
        cur = con.cursor()
        tools_list_format = f"({','.join(['?' for _ in tools])})"
        nb_goodware = cur.execute(
            "SELECT COUNT(*) FROM apk WHERE vt_detection == 0;"
        ).fetchone()[0]
        nb_malware = cur.execute(
            "SELECT COUNT(*) FROM apk WHERE vt_detection != 0;"
        ).fetchone()[0]
        status = cur.execute(
            (
                "SELECT tool_name, tool_status, COUNT(exec.sha256), vt_detection != 0 "
                "FROM exec INNER JOIN apk ON exec.sha256 = apk.sha256 "
                f"WHERE tool_name IN {tools_list_format} "
                "GROUP BY tool_name, tool_status, vt_detection != 0;"
            ),
            tools,
        ).fetchall()
    occurences = {}
    for tool, stat, occurence, malware in status:
        occurences[(tool, stat, bool(malware))] = occurence
    # tools.sort(
    #     key=lambda t: occurences.get((t, "FINISHED", True), 0)
    #     + occurences.get((t, "FINISHED", False), 0),
    #     reverse=True,
    # )
    tools.sort()

    values = {
        "Finished": np.zeros(len(tools) * 2),
        "Time Out": np.zeros(len(tools) * 2),
        "Other": np.zeros(len(tools) * 2),
        "Failed": np.zeros(len(tools) * 2),
    }
    colors = {
        "Finished": "#009E73",
        "Time Out": "#56B4E9",
        "Other": "#555555",  # TODO: find better color
        "Failed": "#D55E00",
    }
    hatch = {
        "Finished": "/",
        "Time Out": "x",
        "Other": ".",
        "Failed": "\\",
    }
    for i, tool in enumerate(tools):
        i_goodware = 2 * i
        i_malware = 2 * i + 1
        values["Finished"][i_goodware] = occurences.get((tool, "FINISHED", False), 0)
        values["Finished"][i_malware] = occurences.get((tool, "FINISHED", True), 0)
        values["Time Out"][i_goodware] = occurences.get((tool, "TIMEOUT", False), 0)
        values["Time Out"][i_malware] = occurences.get((tool, "TIMEOUT", True), 0)
        values["Failed"][i_goodware] = occurences.get((tool, "FAILED", False), 0)
        values["Failed"][i_malware] = occurences.get((tool, "FAILED", True), 0)
        values["Other"][i_goodware] = (
            nb_goodware
            - values["Finished"][i_goodware]
            - values["Time Out"][i_goodware]
            - values["Failed"][i_goodware]
        )
        values["Other"][i_malware] = (
            nb_malware
            - values["Finished"][i_malware]
            - values["Time Out"][i_malware]
            - values["Failed"][i_malware]
        )
        # Convert the counts into percentages of the relevant population
        for stat in ("Finished", "Time Out", "Failed", "Other"):
            values[stat][i_goodware] = (
                0
                if nb_goodware == 0
                else (100 * values[stat][i_goodware]) / nb_goodware
            )
            values[stat][i_malware] = (
                0 if nb_malware == 0 else (100 * values[stat][i_malware]) / nb_malware
            )
    bottom = np.zeros(len(tools) * 2)

    x_axis = np.zeros(len(tools) * 2)
    x_width = 3
    x_0 = x_width / 2
    lstep = 1
    bstep = 5
    for i in range(len(tools)):
        x_0 += bstep + x_width
        x_axis[2 * i] = x_0
        x_0 += lstep + x_width
        x_axis[2 * i + 1] = x_0
    tick_legend = []
    for tool in tools:
        tick_legend.append(f"{tool}")  # (f"{tool} on goodware")
        tick_legend.append("")  # (f"{tool} on malware")

    plt.figure(figsize=(20, 9), dpi=80)
    for stat in ["Finished", "Time Out", "Other", "Failed"]:
        plt.bar(
            x_axis,
            values[stat],
            label=stat,
            color=colors[stat],
            hatch=hatch[stat],
            bottom=bottom,
            width=x_width,
            edgecolor="black",
        )
        bottom += values[stat]
    plt.xticks(x_axis, tick_legend, rotation=80)
    plt.legend()
    plt.ylabel("% of analysed apk")
    render(title, interactive, image_path)


def plot_status_by_generic_x(
    tools: list[str],
    x_col: str,
    x_label: str,
    x_in_title: str,
    args,
    group_by: Optional[str] = None,
):
    """group_by defaults to x_col; x_col must be unique within a group of group_by."""
    tools.sort()
    if group_by is None:
        group_by = x_col
    with sqlite3.connect(args.data) as con:
        cur = con.cursor()
        nb_goodware_res = cur.execute(
            f"SELECT {group_by}, COUNT(*) FROM apk WHERE vt_detection == 0 GROUP BY {group_by};",
        ).fetchall()
        nb_goodware = {}
        for x_group, count in nb_goodware_res:
            nb_goodware[x_group] = count
        nb_malware_res = cur.execute(
            f"SELECT {group_by}, COUNT(*) FROM apk WHERE vt_detection != 0 GROUP BY {group_by};",
        ).fetchall()
        nb_malware = {}
        for x_group, count in nb_malware_res:
            nb_malware[x_group] = count
        statuses_res = cur.execute(
            (
                f"SELECT tool_name, {x_col}, {group_by}, COUNT(exec.sha256), vt_detection != 0 "
                "FROM exec INNER JOIN apk ON exec.sha256 = apk.sha256 "
                "WHERE tool_status = 'FINISHED' "
                f"GROUP BY tool_name, tool_status, {group_by}, vt_detection != 0 "
                f"HAVING {x_col} IS NOT NULL;"
            )
        ).fetchall()
    tots = {}
    for tool_, x_val, x_group, count, is_malware in statuses_res:
        if (tool_, x_group) not in tots:
            tots[(tool_, x_group)] = [x_val, 0]
        tots[(tool_, x_group)][1] += count
    plots = []
    plots_malgood = []
    metas = []
    metas_malgood = []
    for tool in tools:
        malware_plot = [
            (x_val, 100 * count / nb_malware[x_group])
            for (tool_, x_val, x_group, count, is_malware) in statuses_res
            if (tool_ == tool) and is_malware and nb_malware.get(x_group, 0) != 0
        ]
        malware_meta = (f"{tool} on malware", DENSE_DOT, MARKERS[tool], COLORS[tool])
        goodware_plot = [
            (x_val, 100 * count / nb_goodware[x_group])
            for (tool_, x_val, x_group, count, is_malware) in statuses_res
            if (tool_ == tool) and not is_malware and nb_goodware.get(x_group, 0) != 0
        ]
        goodware_meta = (f"{tool} on goodware", DENSE_DASH, MARKERS[tool], COLORS[tool])
        total_plot = [
            (
                x_val,
                100
                * count
                / (nb_malware.get(x_group, 0) + nb_goodware.get(x_group, 0)),
            )
            for ((tool_, x_group), (x_val, count)) in tots.items()
            if (tool_ == tool)
            and (nb_malware.get(x_group, 0) + nb_goodware.get(x_group, 0)) != 0
        ]
        total_meta = (f"{tool}", DENSE_DOT, MARKERS[tool], COLORS[tool])
        plots.append(total_plot)
        plots_malgood.append(malware_plot)
        plots_malgood.append(goodware_plot)
        metas.append(total_meta)
        metas_malgood.append(malware_meta)
        metas_malgood.append(goodware_meta)

        plot_generic(
            [goodware_plot, malware_plot],
            [goodware_meta, malware_meta],
            x_label,
            "finishing rate",
            f"Finishing Rate by {x_in_title} for {tool} on malware and goodware",
            ylim=(-5, 105),
            interactive=args.display,
            image_path=args.figures_file,
        )
        plot_generic(
            [total_plot],
            [total_meta],
            x_label,
            "finishing rate",
            f"Finishing Rate by {x_in_title} for {tool}",
            ylim=(-5, 105),
            interactive=args.display,
            image_path=args.figures_file,
        )
    plot_generic(
        plots_malgood,
        metas_malgood,
        x_label,
        "finishing rate",
        f"Finishing Rate by {x_in_title} on malware and goodware",
        ylim=(-5, 105),
        interactive=args.display,
        image_path=args.figures_file,
    )
    plot_generic(
        plots,
        metas,
        x_label,
        "finishing rate",
        f"Finishing Rate by {x_in_title}",
        ylim=(-5, 105),
        interactive=args.display,
        image_path=args.figures_file,
    )


def dbg(arg):
    """Identity helper: uncomment the print to trace a query while debugging."""
    # print(arg)
    return arg


def plot_all_status_by_generic_x(
    tools: list[str],
    x_col: str,
    x_label: str,
    title: str,
    args,
    condition: Optional[str] = None,
    apk_condition: Optional[str] = None,
    group_by: Optional[str] = None,
):
    if condition is None and apk_condition is None:
        condition = ""
        apk_condition = ""
    elif apk_condition is None:
        condition = f"AND ({condition})"
        apk_condition = ""
    elif condition is None:
        condition = f"AND ({apk_condition})"
        apk_condition = f"WHERE ({apk_condition})"
    else:
        condition = f"AND ({apk_condition}) AND ({condition})"
        apk_condition = f"WHERE ({apk_condition})"
    if group_by is None:
        group_by = x_col
    nb_apk = {}
    tools.sort()
    with sqlite3.connect(args.data) as con:
        cur = con.cursor()
        for x_group, count in cur.execute(
            f"SELECT {group_by}, COUNT(*) FROM apk {apk_condition} GROUP BY {group_by};",
        ):
            nb_apk[x_group] = count
        statuses_res = cur.execute(
            dbg(
                f"SELECT exec.tool_name, {x_col}, {group_by}, COUNT(exec.sha256) "
                "FROM exec "
                "    INNER JOIN apk ON exec.sha256 = apk.sha256 "
                "    INNER JOIN tool ON exec.tool_name = tool.tool_name "
                f"WHERE tool_status = 'FINISHED' {condition} "
                f"GROUP BY exec.tool_name, tool_status, {group_by} "
                f"HAVING {x_col} IS NOT NULL;"
            )
        ).fetchall()
    plots = []
    metas = []
    for tool in tools:
        plot = [
            (x_val, 100 * count / nb_apk[x_group])
            for (tool_, x_val, x_group, count) in statuses_res
            if (tool_ == tool) and nb_apk.get(x_group, 0) != 0
        ]
        if len(plot) == 0:
            continue
        meta = (tool, TOOL_LINE_STYLE[tool], MARKERS[tool], COLORS[tool])
        plots.append(plot)
        metas.append(meta)
    plot_generic(
        plots,
        metas,
        x_label,
        "finishing rate",
        title,
        ylim=(-5, 105),
        interactive=args.display,
        image_path=args.figures_file,
    )
185
rasta_data_manipulation/rasta_triturage/utils.py
Normal file

@@ -0,0 +1,185 @@
"""
|
||||
Utils.
|
||||
"""
|
||||
|
||||
import matplotlib.pyplot as plt # type: ignore
|
||||
import numpy as np
|
||||
from slugify import slugify # type: ignore
|
||||
from typing import Any, Callable, Optional
|
||||
from pathlib import Path
|
||||
import sqlite3
|
||||
|
||||
DENSE_DASH = (0, (5, 1))
|
||||
DENSE_DOT = (0, (1, 3))
|
||||
|
||||
MARKERS = {
|
||||
"adagio": ".",
|
||||
"amandroid": "o",
|
||||
"anadroid": "X",
|
||||
"androguard": "+",
|
||||
"androguard_dad": "v",
|
||||
"apparecium": "d",
|
||||
"blueseal": "^",
|
||||
"dialdroid": "<",
|
||||
"didfail": ">",
|
||||
"droidsafe": r"$\circ$",
|
||||
"flowdroid": r"$\boxplus$",
|
||||
"gator": r"$\otimes$",
|
||||
"ic3": "1",
|
||||
"ic3_fork": "s",
|
||||
"iccta": "P",
|
||||
"mallodroid": r"$\divideontimes$",
|
||||
"perfchecker": "*",
|
||||
"redexer": "x",
|
||||
"saaf": "D",
|
||||
"wognsen_et_al": r"$\rtimes$",
|
||||
}
|
||||
|
||||
COLORS = {
|
||||
"didfail": "#1f77b4",
|
||||
"adagio": "#ff7f0e",
|
||||
"iccta": "#2ca02c",
|
||||
"androguard": "#d62728",
|
||||
"gator": "#9467bd",
|
||||
"mallodroid": "#8c564b",
|
||||
"dialdroid": "#e377c2",
|
||||
"androguard_dad": "#7f7f7f",
|
||||
"wognsen_et_al": "#bcbd22",
|
||||
"perfchecker": "#17becf",
|
||||
"amandroid": "#1f77b4",
|
||||
"ic3": "#ff7f0e",
|
||||
"apparecium": "#2ca02c",
|
||||
"blueseal": "#d62728",
|
||||
"droidsafe": "#9467bd",
|
||||
"redexer": "#8c564b",
|
||||
"anadroid": "#e377c2",
|
||||
"saaf": "#7f7f7f",
|
||||
"ic3_fork": "#bcbd22",
|
||||
"flowdroid": "#17becf",
|
||||
"adagio": "#1f77b4",
|
||||
"androguard": "#ff7f0e",
|
||||
"mallodroid": "#2ca02c",
|
||||
"androguard_dad": "#d62728",
|
||||
"wognsen_et_al": "#9467bd",
|
||||
"amandroid": "#8c564b",
|
||||
"apparecium": "#e377c2",
|
||||
"redexer": "#7f7f7f",
|
||||
}
|
||||
|
||||
|
||||
def get_list_tools(db: Path) -> list[str]:
    """Get the list of tools found in the database."""
    with sqlite3.connect(db) as con:
        cur = con.cursor()
        tools = cur.execute("SELECT DISTINCT tool_name FROM exec;")
        return [tool[0] for tool in tools]


def radar_chart(
    axes: list[str],
    values: list[list[Any]],
    labels: list[str],
    title: str,
    interactive: bool,
    image_path: Path | None,
):
    plt.rc("grid", linewidth=1, linestyle="-")
    plt.rc("xtick", labelsize=15)
    plt.rc("ytick", labelsize=15)
    angles = np.linspace(0, 2 * np.pi, len(axes), endpoint=False)
    angles = np.concatenate((angles, [angles[0]]))  # type: ignore
    fig = plt.figure(figsize=(8, 8))
    ax = fig.add_subplot(111, polar=True)
    for label, vals in zip(labels, values):
        # Close each polygon by repeating its first point
        vals = vals + [vals[0]]
        ax.plot(angles, vals, label=label, marker=MARKERS.get(label, "."))
        ax.fill(angles, vals, alpha=0.25)
    ax.set_thetagrids(angles[:-1] * 180 / np.pi, axes)
    ax.set_ylim(bottom=0)
    ax.grid(True)
    ncol = min(5, len(labels))
    ax.legend(
        loc="lower left",
        bbox_to_anchor=(0.0, -0.2, ncol * 1.0 / 5, 0.102),
        ncol=ncol,
        mode="expand",
        borderaxespad=0.0,
        fancybox=True,
        shadow=True,
        fontsize="xx-small",
    )
    render(title, interactive, image_path)


def render(
    title: str, interactive: bool, image_path: Path | None, tight_layout: bool = True
):
    """Render the figure: display it if `interactive`, save it if `image_path` is set."""
    # plt.title(title)
    if tight_layout:
        plt.tight_layout()
    if image_path is not None:
        if not image_path.exists():
            image_path.mkdir(parents=True, exist_ok=True)
        plt.savefig(image_path / (slugify(title) + ".pdf"), format="pdf")
    if interactive:
        plt.show()
    plt.close()


def mean(field: str) -> Callable[[list[Any]], float]:
    def compute_mean(data: list[Any]) -> float:
        s = 0
        n = 0
        for e in data:
            n += 1
            s += e[field]
        return 0.0 if n == 0 else s / n

    return compute_mean


def median(field: str) -> Callable[[list[Any]], float]:
    def compute_median(data: list[Any]) -> float:
        l = [e[field] for e in data if e[field] is not None]
        l.sort()
        if not l:
            return 0.0
        return l[len(l) // 2]

    return compute_median


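`mean` and `median` above are aggregator factories: each takes a field name and returns a function that aggregates that field over a list of mappings (with `median` ignoring `None` values). A small usage sketch; the rows are made up, and `mean` is re-stated in simplified form so the example is self-contained:

```python
# Usage sketch of the aggregator-factory pattern; `rows` is illustrative
# data, and this `mean` is a simplified, self-contained restatement.
from typing import Any, Callable

def mean(field: str) -> Callable[[list[Any]], float]:
    def compute_mean(data: list[Any]) -> float:
        vals = [e[field] for e in data]
        return 0.0 if not vals else sum(vals) / len(vals)

    return compute_mean

rows = [{"time": 2.0}, {"time": 4.0}, {"time": 9.0}]
avg_time = mean("time")   # a reusable aggregator bound to the "time" field
print(avg_time(rows))     # -> 5.0
```

Returning a closure lets callers build a table of named statistics (e.g. `{"avg": mean("time"), "med": median("time")}`) and apply them uniformly to query results.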
def plot_generic(
    data: list[list[tuple[Any, Any]]],
    meta: list[tuple[str, Any, Any, str]],
    x_label: str,
    y_label: str,
    title: str,
    ylim: Optional[tuple[int, int]] = None,
    interactive: bool = True,
    image_path: Path | None = None,
):
    """Plot a list of curves, each represented by a list[(x, y)]. `meta` gives the
    (label, linestyle, marker, color) for each curve.
    """
    plt.figure(figsize=(16, 9), dpi=80)
    for i, plot in enumerate(data):
        label, linestyle, marker, color = meta[i]
        plot.sort(key=lambda p: p[0])
        x_values = np.array([x for (x, _) in plot])
        y_values = np.array([y for (_, y) in plot])
        plt.plot(
            x_values[~np.isnan(y_values)],
            y_values[~np.isnan(y_values)],
            label=label,
            marker=marker,
            color=color,
            linestyle=linestyle,
        )
    if ylim is not None:
        plt.ylim(ylim)
    plt.legend(loc="upper center", ncol=4, bbox_to_anchor=(0.5, -0.1))
    plt.xlabel(x_label)
    plt.ylabel(y_label)
    render(title, interactive, image_path)
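For reference, the shape of the `data`/`meta` arguments and the NaN filtering can be illustrated with a minimal sketch (the tool name, styles, and data points below are illustrative, not taken from the experiment):

```python
import numpy as np

# One curve per tool: a list of (x, y) points; points with NaN y are dropped.
curve = [(10, 0.2), (15, 0.35), (21, np.nan), (26, 0.5)]
# (label, linestyle, marker, color) for each curve, in the same order as data.
meta = [("androguard", "-", "o", "tab:blue")]

# The same filtering plot_generic applies before calling plt.plot:
x = np.array([p[0] for p in curve])
y = np.array([p[1] for p in curve])
print(x[~np.isnan(y)].tolist())  # [10, 15, 26]
```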
28
rasta_exp/.gitignore
vendored
Normal file
@ -0,0 +1,28 @@
latest.csv
venv
results
rasta-unit-test-generator
apk_info
results
*/*.pyc
.vscode/
*.sif
exp/
__pycache__
rasta-venv
integration.org~
slurm*.out

status_*
set?_missing
drebin_missing
set?_all
drebin_all
*_already_finished
set?_stats
drebin_stats

apks_*.txt
# Available on demand only
docker/perfchecker/provided_build/perfchecker.jar
docker/perfchecker/provided_build/soot-2.5.0.jar
113
rasta_exp/README.md
Normal file
@ -0,0 +1,113 @@
# Directory structure

* docker

  Contains one directory per tool.

  Each tool directory should have a `RASTA_VERSION` file naming the subdirectory that holds the tested version.

* tester

  A Python module that analyses the output of the tools and detects errors.

* envs

  One file per tool, used to set up the ENV variables in the containers.

  This env file MUST define a numerical `TIMEOUT`.
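As an illustration, an env file for a tool might look like this (a hedged example: only `TIMEOUT` is required; `TIMEOUT=900` matches the default used in the tools' `test.py` files, and `JAVA_PARAM` is forwarded by some `run.sh` scripts — any other variables would be tool-specific):

```
# envs/mytool — hypothetical example; only TIMEOUT is required
TIMEOUT=900            # seconds before the analysis is killed
JAVA_PARAM="-Xmx16g"   # extra options some run.sh scripts forward to the JVM
```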

# Installation

- Install docker

  ```
  apt install docker.io
  ```

- Install singularity

  ```
  apt install singularity-ce
  ```

# Scripts

## grunt-worker-launcher.sh

A script specifically designed to launch one instance on a cluster node. Typically, it would be passed to a batch command (on a cluster managed with Slurm). This script is probably highly dependent on the cluster setup; there is little sense in launching it manually.

## grunt-worker.py

Contains the bulk of the logic to:

- obtain tasks from a Redis server (a task is a pair (APK, TOOL_NAME))
- check whether the task was already done
- create a tmp dir
- download the APK from AndroZoo
- run an analysis in a Docker (`--docker`) or Singularity (`--singularity`) container
- analyse the output of the analysis and detect errors
- delete the tmp dir
- save the results (into a CouchDB database)

Also has a `--manual` mode, which is the simplest way to launch a task by hand, in particular when coupled with the options to deactivate CouchDB (`--no-write-to-couch`) and Redis (`--no-mark-done`), and the option to not delete the tmp dir (`--keep-tmp-dir`).

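The task loop described above can be sketched as follows. This is a simplified, self-contained sketch under stated assumptions: the real script talks to Redis and CouchDB and shells out to `launch-container.sh`, whereas here those dependencies are stubbed with in-memory callables, and the sha256/report values are made up:

```python
import tempfile
from pathlib import Path

def run_worker(tasks, done, download_apk, run_container, save_result):
    """Process (sha256, tool_name) tasks, mirroring the grunt-worker.py flow."""
    for sha256, tool in tasks:
        if (sha256, tool) in done:                  # skip already-finished work
            continue
        with tempfile.TemporaryDirectory() as tmp:  # tmp dir deleted on exit
            apk = Path(tmp) / "app.apk"
            download_apk(sha256, apk)               # normally fetched from AndroZoo
            report = run_container(tool, apk)       # Docker or Singularity run
            save_result(sha256, tool, report)       # normally written to CouchDB
        done.add((sha256, tool))

# In-memory stand-ins for Redis / CouchDB / AndroZoo:
results = {}
run_worker(
    tasks=[("ABC123", "androguard")],
    done=set(),
    download_apk=lambda sha, path: path.write_bytes(b"fake apk"),
    run_container=lambda tool, apk: {"tool-status": "FINISHED"},
    save_result=lambda sha, tool, rep: results.setdefault((sha, tool), rep),
)
print(results[("ABC123", "androguard")]["tool-status"])  # FINISHED
```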
## build_docker_images.sh

Batch-creates all the Docker and Singularity images.

Parameter: the directory where the Singularity files will be placed.

## launch-container.sh

- Called by grunt-worker.py.
- Can also be called manually for debugging.

Parameters:

1. Mode: either DOCKER or SINGULARITY
2. TOOL_NAME: for example androguard, blueseal, etc.
3. CONTAINER_IMG: either the name of the Docker image or the path to the sif file (without the trailing .sif)
4. TMP_WORKDIR: a working directory
5. APK_FILENAME: the name of the APK file provided in TMP_WORKDIR (this script does NOT download APKs)

# How to run

1. Choose the tool(s) whose Docker/Singularity image you want to build by editing the `tools=` line in `./build_docker_images.sh`. For example, to build only didfail, change the line as below. By default, the script builds the Docker/Singularity images of all tools.

   ```
   tools="didfail"
   ```

2. Create the Docker and Singularity images (around 16 minutes on a modern laptop) with `./build_docker_images.sh path_you_want_the_sif_files_in`, for example:

   ```
   bash build_docker_images.sh ~/singularity
   ```

3. Create a venv:

   ```
   python3 -m venv rasta-venv
   source rasta-venv/bin/activate
   ```

4. Install the necessary Python packages:

   ```
   python3 -m pip install -r requirements.txt
   ```

5. Launch one manual analysis:

   - 5.0: fill in the settings.ini file with your AndroZoo API key:

     ```
     [AndroZoo]
     apikey = your_api_key
     ```

   - 5.1: launch the Singularity container on a given Android application hash:

     ```
     ./grunt-worker.py --base-dir /tmp/RASTA/ --no-mark-done --keep-tmp-dir --no-write-to-couch --manual --task didfail --sha APK_HASH --singularity --image-basedir SINGULARITY_IMAGE_DIRECTORY
     ```

     For example:

     ```
     ./grunt-worker.py --base-dir /tmp/RASTA/ --no-mark-done --keep-tmp-dir --no-write-to-couch --manual --task didfail --sha 0003468487C29A71A5DA40F59E4F1F5DFF026126DD64BB58C572E30EE167C652 --singularity --image-basedir ~/singularity
     ```
193
rasta_exp/apk.py
Normal file
@ -0,0 +1,193 @@
import requests
import logging
import shutil
import hashlib
import json
from utils import sha256_sum
from enum import Enum
from pathlib import Path
from typing import Any, Optional
from androguard.core.bytecodes import apk as androguard_apk  # type: ignore

APK_INFO_FOLDER = Path(__file__).parent / "apk_info"
if not APK_INFO_FOLDER.exists():
    APK_INFO_FOLDER.mkdir()


class ApkRef:
    """The reference to an apk. The apk it refers to can be in the AndroZoo repository or
    on the local file system.
    - If the app is in AndroZoo, the app is referred to by its sha256
    - If the app is on the local file system, the app is referred to by its path
    """

    RefType = Enum("RefType", ["ANDROZOO", "LOCAL"])

    def __init__(
        self,
        type_: "ApkRef.RefType",
        sha256: Optional[str] = None,
        path: Optional[Path] = None,
    ):
        self.type = type_
        self.sha256 = sha256
        if self.sha256 is not None:
            self.sha256 = self.sha256.strip().upper()
        self.path = path
        self.integrity_check()

    def __str__(self):
        return f"APK<{str(self.type)}: sha256={self.sha256}, path={str(self.path)}>"

    def integrity_check(self):
        """Check that the ApkRef is coherent."""
        if self.type == ApkRef.RefType.ANDROZOO and self.sha256 is None:
            raise RuntimeError(f"Androzoo ApkRef must have a sha256: {str(self)}")
        if self.type == ApkRef.RefType.LOCAL and self.path is None:
            raise RuntimeError(f"Local ApkRef must have a path: {str(self)}")

    def get_path(self) -> Path:
        """Return the path to the apk."""
        if self.path is None:
            raise RuntimeError(f"{str(self)} doesn't have a path")
        return self.path

    def get_sha256(self) -> str:
        """Return the sha256 of the apk."""
        if self.sha256 is None:
            if self.path is None:
                raise RuntimeError(f"Could not compute hash for {str(self)}")
            self.sha256 = sha256_sum(self.path).upper()
        return self.sha256


def get_apk(apk_ref: ApkRef, path: Path, api_key: bytes):
    """Retrieve an apk from its reference and put it at `path`.
    `api_key` is always asked for because it's easier that way."""
    if apk_ref.type == ApkRef.RefType.ANDROZOO:
        download_apk(apk_ref.get_sha256(), api_key, path)
    elif apk_ref.type == ApkRef.RefType.LOCAL:
        shutil.copy(apk_ref.get_path(), path)


def download_apk(apk_sha256: str, api_key: bytes, path: Path):
    """Download an apk from AndroZoo and store it at the given location."""
    logging.debug(f"Start downloading apk {apk_sha256}")
    resp = requests.get(
        "https://androzoo.uni.lu/api/download",
        params={
            b"apikey": api_key,
            b"sha256": apk_sha256.encode("utf-8"),
        },
    )
    with path.open("bw") as file:
        file.write(resp.content)
    logging.debug(f"Finished downloading apk {apk_sha256}")


def get_apk_info(apk_ref: ApkRef, api_key: bytes) -> dict[str, Any]:
    """Return the information available about an application."""
    apk_path = APK_INFO_FOLDER / (apk_ref.get_sha256() + ".json")
    get_apk(apk_ref, apk_path, api_key)
    info: dict[str, Any] = {}
    info["apk_size"] = apk_path.stat().st_size
    info["sha256"] = apk_ref.get_sha256()
    if apk_ref.path is not None:
        info["file"] = apk_ref.path.name
    else:
        info["file"] = None

    try:
        apk = androguard_apk.APK(apk_path)
        info["name"] = apk.get_app_name()  # redundant with pkg_name?
        info["min_sdk"] = apk.get_min_sdk_version()
        if info["min_sdk"] is not None:
            info["min_sdk"] = int(info["min_sdk"])
        info["max_sdk"] = apk.get_max_sdk_version()
        if info["max_sdk"] is not None:
            info["max_sdk"] = int(info["max_sdk"])
        info["target_sdk"] = apk.get_target_sdk_version()
        if info["target_sdk"] is not None:
            info["target_sdk"] = int(info["target_sdk"])
        info["total_dex_size"] = sum(
            map(lambda x: len(x), apk.get_all_dex())
        )  # TODO: faster to open the zip and use st_size?
    except Exception:
        info["name"] = ""
        info["min_sdk"] = None
        info["max_sdk"] = None
        info["target_sdk"] = None
        info["total_dex_size"] = None
    apk_path.unlink()
    return info


def load_apk_info(apks: list[ApkRef], androzoo_list: Path, api_key: bytes):
    """Load the information for the provided apks (`apks` must contain the sha256 of the
    apks to load) from the androzoo list. The information is then stored in json files."""
    logging.debug("Start extracting data from the androzoo list")
    apks_dict = {a.get_sha256().strip().upper(): a for a in apks}
    for apk in apks:
        apk_info_path = APK_INFO_FOLDER / (apk.get_sha256() + ".json")
        if apk_info_path.exists():
            del apks_dict[apk.get_sha256()]

    with androzoo_list.open("r") as list_file:
        first_line = list_file.readline()
        entry_names = list(map(lambda x: x.strip(), first_line.split(",")))
        sha256_index = entry_names.index(
            "sha256"
        )  # TODO: if 'sha256' is not found in the first line, we have the wrong file...
        while line := list_file.readline():
            if not apks_dict:
                break
            entries = list(map(lambda x: x.strip(), line.split(",")))
            # TODO: don't parse the entries manually...
            if len(entries) != len(entry_names):
                entries_set = set(map(lambda x: x.upper(), entries))
                inter = entries_set.intersection(apks_dict.keys())
                if inter:
                    logging.warning(
                        f"The information for the apk {inter} may not be retrieved from the list due to a malformed line: {line}"
                    )
                continue
            info: dict[str, Any] = {}
            sha256 = entries[sha256_index].upper()
            if sha256 in apks_dict:
                for (k, v) in zip(entry_names, entries):
                    info[k] = v
                if "markets" in info:
                    info["markets"] = list(
                        map(lambda x: x.strip(), info["markets"].split("|"))
                    )
                if "apk_size" in info:
                    info["apk_size"] = int(info["apk_size"])
                if "vt_detection" in info:
                    info["vt_detection"] = int(info["vt_detection"])
                if "dex_size" in info:
                    info["dex_size"] = int(info["dex_size"])
                if "pkg_name" in info:
                    info["pkg_name"] = (
                        info["pkg_name"].removeprefix('"').removesuffix('"')
                    )
                apk_info_path = APK_INFO_FOLDER / (sha256 + ".json")
                info |= get_apk_info(apks_dict[sha256], api_key)
                with apk_info_path.open("w") as file:
                    json.dump(info, file)
                del apks_dict[sha256]
    for apk_hash in apks_dict:
        logging.warning(
            f"The information for the apk {apk_hash} was not found in the androzoo list"
        )
        info = get_apk_info(apks_dict[apk_hash], api_key)
        for key in entry_names:
            if key not in info:
                info[key] = None
        apk_info_path = APK_INFO_FOLDER / (apk_hash + ".json")
        with apk_info_path.open("w") as file:
            json.dump(info, file)

    logging.debug(
        "Finished extracting the information about the apks from androzoo list"
    )
37
rasta_exp/build_docker_images.sh
Executable file
@ -0,0 +1,37 @@
#!/usr/bin/env bash


SIF_DIR=$1
if [[ -z "${SIF_DIR}" ]]; then
    echo MISSING SIF_DIR parameter
    exit 1
fi

[[ -d "${SIF_DIR}" ]] || mkdir "${SIF_DIR}"

function docker_to_sif {
    img_name=$1
    [[ -f ${SIF_DIR}/$1.sif ]] && rm ${SIF_DIR}/$1.sif
    singularity pull ${SIF_DIR}/$1.sif docker-daemon:$1:latest
}


function build_docker_img {
    pushd .
    tool_name=$1
    cd docker/${tool_name}
    version=$(cat RASTA_VERSION)
    cd ${version}
    docker build --ulimit nofile=65536:65536 -f Dockerfile -t rasta-${tool_name} .
    docker save rasta-${tool_name}:latest | gzip > ../../../${SIF_DIR}/rasta-${tool_name}.tar.gz
    popd
}

# Final list:
#tools="androguard androguard_dad didfail adagio anadroid blueseal didfail flowdroid mallodroid redexer saaf wognsen_et_al iccta ic3 ic3_fork gator droidsafe apparecium amandroid dialdroid perfchecker"
tools="androguard androguard_dad didfail adagio anadroid blueseal didfail flowdroid mallodroid redexer saaf wognsen_et_al iccta ic3 ic3_fork gator droidsafe apparecium amandroid dialdroid"

for tool in ${tools}; do
    build_docker_img ${tool}
    docker_to_sif rasta-${tool}
done;
19
rasta_exp/cluster_worker/grunt-worker-launcher-2.sh
Executable file
@ -0,0 +1,19 @@
#!/bin/bash -l


module load tools/Singularity
module load lang/Python/3.8.6-GCCcore-10.2.0

source ../venvrasta/bin/activate

seq 2 | parallel --jobs 2 ./grunt.sh

#while /bin/true
#do
#    python3 grunt-worker.py --no-mark-done --overwrite --singularity --image-basedir ~/sif
#    if [[ z"$?" == z111 ]]
#    then
#        break
#    fi
#    sleep 10
#done
19
rasta_exp/cluster_worker/grunt-worker-launcher-4.sh
Executable file
@ -0,0 +1,19 @@
#!/bin/bash -l


module load tools/Singularity
module load lang/Python/3.8.6-GCCcore-10.2.0

source ../venvrasta/bin/activate

seq 4 | parallel --jobs 4 ./grunt.sh

#while /bin/true
#do
#    python3 grunt-worker.py --no-mark-done --overwrite --singularity --image-basedir ~/sif
#    if [[ z"$?" == z111 ]]
#    then
#        break
#    fi
#    sleep 10
#done
19
rasta_exp/cluster_worker/grunt-worker-launcher-6.sh
Executable file
@ -0,0 +1,19 @@
#!/bin/bash -l


module load tools/Singularity
module load lang/Python/3.8.6-GCCcore-10.2.0

source ../venvrasta/bin/activate

seq 6 | parallel --jobs 6 ./grunt.sh

#while /bin/true
#do
#    python3 grunt-worker.py --no-mark-done --overwrite --singularity --image-basedir ~/sif
#    if [[ z"$?" == z111 ]]
#    then
#        break
#    fi
#    sleep 10
#done
19
rasta_exp/cluster_worker/grunt-worker-launcher.sh
Executable file
@ -0,0 +1,19 @@
#!/bin/bash -l


module load tools/Singularity
module load lang/Python/3.8.6-GCCcore-10.2.0

source ../venvrasta/bin/activate

./grunt.sh

#while /bin/true
#do
#    python3 grunt-worker.py --no-mark-done --overwrite --singularity --image-basedir ~/sif
#    if [[ z"$?" == z111 ]]
#    then
#        break
#    fi
#    sleep 10
#done
12
rasta_exp/cluster_worker/grunt.sh
Executable file
@ -0,0 +1,12 @@
#!/bin/bash


while /bin/true
do
    python3 grunt-worker.py --no-mark-done --overwrite --singularity --image-basedir ~/sif
    if [[ z"$?" == z111 ]]
    then
        break
    fi
    sleep 10
done
76
rasta_exp/cluster_worker/stats.py
Normal file
@ -0,0 +1,76 @@
import os
import json
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from tabulate import tabulate
import sys


def extract(data, key_list):
    d = {}
    for e in key_list:
        d[e] = data[e]
    return d


tabulate_tools = [["Tool", "Total", "Timeout", "Crash", "Time (s)", "Memory (MB)"]]

if len(sys.argv) < 2:
    print("python3 stats.py directory")
    quit()

print("Going into " + sys.argv[1])
os.chdir(sys.argv[1])

for dir in os.listdir():
    if os.path.isdir(dir):
        print("Processing " + str(dir))
        df = pd.DataFrame()
        # df.astype({"apk_size": int, "crashed": bool})

        for file in os.listdir(dir):
            with open(dir + "/" + file, "r") as f:
                data = json.load(f)

            d = {}
            d = extract(
                data, ["crashed", "timeout", "user-cpu-time", "max-rss-mem"]
            )
            d.update(
                extract(
                    data["apk"], ["apk_size", "min_sdk", "target_sdk", "max_sdk"]
                )
            )
            df_apk = pd.DataFrame(d, index=[data["apk"]["sha256"]])
            if not df.empty:
                df = pd.concat([df, df_apk])
            else:
                df = df_apk

        # print(df)
        # print("Total: " + str(len(df)))
        # print("Crash: " + str(df["crashed"].sum()))
        # print("Average size: " + str(df["apk_size"].mean() / 1000 ** 2) + " Mo ")
        # print("Average size crashed: " + str(df[df["crashed"] == True]["apk_size"].mean() / 1000 ** 2) + " Mo ")
        # print("Average size not crashed: " + str(df[df["crashed"] == False]["apk_size"].mean() / 1000 ** 2) + " Mo ")

        # df.target_sdk.fillna(value='0', inplace=True) # Replace None values by 0
        # #df.replace(to_replace=[None], value=np.nan, inplace=True)
        # df['target_sdk']=df['target_sdk'].astype(int)
        # df.sort_values("target_sdk") # Sort on a column
        #
        # ax = plt.gca()
        # df.plot(kind='scatter',x='target_sdk',y='apk_size', ax=ax)
        # plt.show()

        df["user-cpu-time"] = df["user-cpu-time"].astype(float)  # HACK
        df["max-rss-mem"] = df["max-rss-mem"].astype(float)  # HACK
        cpu = round(df["user-cpu-time"].mean(), 1)
        memory = int(df["max-rss-mem"].mean() / (1000**2))
        tabulate_tools.append(
            [dir, len(df), df["timeout"].sum(), df["crashed"].sum(), cpu, memory]
        )

print(tabulate(tabulate_tools))
print(tabulate(tabulate_tools, tablefmt="latex"))
5479
rasta_exp/dataset/drebin
Normal file
File diff suppressed because it is too large
6519
rasta_exp/dataset/set0
Normal file
File diff suppressed because it is too large
6396
rasta_exp/dataset/set1
Normal file
File diff suppressed because it is too large
6349
rasta_exp/dataset/set2
Normal file
File diff suppressed because it is too large
6278
rasta_exp/dataset/set3
Normal file
File diff suppressed because it is too large
6241
rasta_exp/dataset/set4
Normal file
File diff suppressed because it is too large
6197
rasta_exp/dataset/set5
Normal file
File diff suppressed because it is too large
6194
rasta_exp/dataset/set6
Normal file
File diff suppressed because it is too large
6154
rasta_exp/dataset/set7
Normal file
File diff suppressed because it is too large
6137
rasta_exp/dataset/set8
Normal file
File diff suppressed because it is too large
6060
rasta_exp/dataset/set9
Normal file
File diff suppressed because it is too large
9
rasta_exp/docker/a3e/README.md
Normal file
@ -0,0 +1,9 @@
# A3E

- [source](https://github.com/tanzirul/a3e)
- [fork](https://github.com/imdea-software/a3e)
- [paper](https://dl.acm.org/doi/abs/10.1145/2509136.2509549)
- language: Ruby 2
- dependencies: Java
- number of years without at least 1 commit since first commit: 7
- License: BSD-3-clause
1
rasta_exp/docker/adagio/RASTA_VERSION
Normal file
@ -0,0 +1 @@
adagio
7
rasta_exp/docker/adagio/README.md
Normal file
@ -0,0 +1,7 @@
# Adagio

- [source](https://github.com/hgascon/adagio)
- [paper](https://dl.acm.org/doi/10.1145/2517312.2517315)
- language: Python 3.8 (could not find an exact version that works; python3.8 is just the one that required the least tweaking)
- number of years without at least 1 commit since first commit: 4 (2020, 2018, 2017, 2023)
- License: GPL2
19
rasta_exp/docker/adagio/adagio/Dockerfile
Normal file
@ -0,0 +1,19 @@
FROM ubuntu:20.04

# RUN sed -i -e "s/archive.ubuntu.com/old-releases.ubuntu.com/g" /etc/apt/sources.list

RUN apt-get update && apt-get install -y git time

RUN mkdir /workspace

ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y python3.8 python3-pip python3-scipy python3-matplotlib python3-sklearn-lib

RUN git clone https://github.com/hgascon/adagio.git /workspace/adagio &&\
    cd /workspace/adagio && git checkout 8a2c1445df638d9c2fd2b1008a079cb092a63f0b &&\
    sed -i 's/matplotlib==3.1.1/#matplotlib==3.1.1/' /workspace/adagio/requirements.txt &&\
    sed -i 's/scikit-learn==0.21.2/#scikit-learn==0.21.2/' /workspace/adagio/requirements.txt &&\
    sed -i 's/scipy==1.3.0/#scipy==1.3.0/' /workspace/adagio/requirements.txt &&\
    pip3 install -r /workspace/adagio/requirements.txt

COPY run.sh /
24
rasta_exp/docker/adagio/adagio/run.sh
Executable file
@ -0,0 +1,24 @@
#!/usr/bin/env bash

APK_FILENAME=$1

export TIME="time: %e
kernel-cpu-time: %S
user-cpu-time: %U
max-rss-mem: %M
avg-rss-mem: %t
avg-total-mem: %K
page-size: %Z
nb-major-page-fault: %F
nb-minor-page-fault: %R
nb-fs-input: %I
nb-fs-output: %O
nb-socket-msg-received: %r
nb-socket-msg-sent: %s
nb-signal-delivered: %k
exit-status: %x"

WORKDIR="/workspace/adagio"

cd ${WORKDIR}
/usr/bin/time -o /mnt/report -q /usr/bin/timeout --kill-after=20s ${TIMEOUT} python3 adagio.py -d /mnt/ -o /mnt -f > /mnt/stdout 2> /mnt/stderr
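The `TIME` format above makes `/usr/bin/time` write one `key: value` line per metric into `/mnt/report`, which the tester side then parses. A hypothetical sketch of such a parser (the real one is `utils.parse_report` in the `tester` module, whose exact behavior may differ; exit code 124 is GNU `timeout`'s code for an expired timeout):

```python
from pathlib import Path
import tempfile

def parse_report(path: Path) -> dict:
    """Parse the `key: value` lines written by /usr/bin/time (sketch only)."""
    report = {}
    for line in path.read_text().splitlines():
        key, _, value = line.partition(":")
        report[key.strip()] = value.strip()
    # GNU timeout exits with 124 when the time limit is hit.
    report["timeout"] = report.get("exit-status") == "124"
    return report

# Example with a minimal report file:
tmp = Path(tempfile.mkdtemp()) / "report"
tmp.write_text("time: 12.5\nuser-cpu-time: 11.0\nmax-rss-mem: 2048\nexit-status: 124")
r = parse_report(tmp)
print(r["timeout"], r["max-rss-mem"])  # True 2048
```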
98
rasta_exp/docker/adagio/test.py
Normal file
@ -0,0 +1,98 @@
import datetime
import importlib.util
import logging
import hashlib

from typing import Any, Type
from pathlib import Path

if __name__ == "__main__":
    import sys

    sys.path.append(str(Path(__file__).resolve().parent.parent))

import orchestrator

errors = orchestrator.error_collector
utils = orchestrator.utils

TIMEOUT = 900

GUEST_MNT = "/mnt"
PATH_APK = f"{GUEST_MNT}/app.apk"

WORKDIR = "/workspace/adagio"
CMD = f"python3 adagio.py -d {GUEST_MNT} -o {GUEST_MNT} -f"

TOOL_NAME = "adagio"

# Version name -> folder name
TOOL_VERSIONS = {
    "adagio": "adagio",
    # "latest": "latest_2022",  # the current master is not stable
}
# Name of the default version (default folder = TOOL_VERSIONS[DEFAULT_TOOL_VERSION])
DEFAULT_TOOL_VERSION = "adagio"

EXPECTED_ERROR_TYPES: list[Type[errors.LoggedError]] = [
    errors.PythonError
]  # Because of androguard; adagio itself doesn't really crash


def analyse_artifacts(path: Path) -> dict[str, Any]:
    """Analyse the artifacts of a test located at `path`."""
    report = utils.parse_report(path / "report")
    report["errors"] = list(
        map(
            lambda e: e.get_dict(),
            errors.get_errors(path / "stderr", EXPECTED_ERROR_TYPES),
        )
    )
    if report["timeout"]:
        report["tool-status"] = "TIMEOUT"
    elif check_success(path):
        report["tool-status"] = "FINISHED"
    else:
        report["tool-status"] = "FAILED"
    report["tool-name"] = TOOL_NAME
    report["date"] = str(datetime.datetime.now())
    report["apk"] = utils.sha256_sum(path / "app.apk").upper()
    return report


def check_success(path: Path) -> bool:
    """Check if the analysis finished without crashing."""
    apks = list(path.glob("*.apk"))
    if len(apks) != 1:
        raise RuntimeError(
            f"Expected to find exactly 1 apk at the root of the {TOOL_NAME} artifact folder, found {apks}"
        )
    apk = apks[0]
    path_result = path / utils.sha256_sum(apk).lower()
    return path_result.exists()


if __name__ == "__main__":
    import docker  # type: ignore

    args = orchestrator.get_test_args(TOOL_NAME)

    tool_folder = Path(__file__).resolve().parent
    api_key = orchestrator.get_androzoo_key()
    if args.get_apk_info:
        orchestrator.load_apk_info(args.apk_refs, args.androzoo_list, api_key)
    client = docker.from_env()

    logging.info("Command tested: ")
    logging.info(f"[{WORKDIR}]$ {CMD}")

    for apk_ref in args.apk_refs:
        orchestrator.test_tool_on_apk(
            client,
            tool_folder,
            api_key,
            apk_ref,
            args.tool_version,
            args.keep_artifacts,
            args.force_test,
        )
1
rasta_exp/docker/amandroid/RASTA_VERSION
Normal file
@ -0,0 +1 @@
home_build
10
rasta_exp/docker/amandroid/README.md
Normal file
@ -0,0 +1,10 @@
# Amandroid

- [source](https://github.com/arguslab/Argus-SAF)
- [fork](https://github.com/ForceOfp/Argus-SAF)
- [paper](https://dl.acm.org/doi/10.1145/3183575)
- language: Scala/Java (Python 2 for native droid, which is separate from Amandroid)
- JVM: Java 10? Hard to pin down exactly; Java 8 appears to work
- Build: sbt
- number of years without at least 1 commit since first commit: 3
- License: Apache 2.0
33
rasta_exp/docker/amandroid/home_build/Dockerfile
Normal file
@ -0,0 +1,33 @@
FROM ubuntu:22.04

# RUN sed -i -e "s/archive.ubuntu.com/old-releases.ubuntu.com/g" /etc/apt/sources.list

RUN apt-get update && apt-get install -y git time unzip wget

RUN mkdir /workspace
RUN git init /workspace/amandroid && \
    cd /workspace/amandroid && \
    git remote add origin https://github.com/arguslab/Argus-SAF.git && \
    git fetch --depth=1 origin v3.2.0 && \
    git reset --hard FETCH_HEAD

# Avoid downloading this each time we launch a new docker
RUN mkdir -p /workspace/.amandroid_stash && \
    cd /workspace/.amandroid_stash && \
    wget https://www.fengguow.dev/resources/amandroid.zip && \
    wget https://www.fengguow.dev/resources/amandroid.checksum && \
    unzip amandroid.zip

RUN apt-get update && apt-get install -y openjdk-8-jdk

RUN cd /workspace/amandroid && \
    sed -i 's/val remotec = getRemoteChecksum("amandroid.checksum")/\/\/val remotec = getRemoteChecksum("amandroid.checksum")/' /workspace/amandroid/amandroid/src/main/scala/org/argus/amandroid/core/AndroidGlobalConfig.scala && \
    sed -i '46i\\ val remotec = localc' /workspace/amandroid/amandroid/src/main/scala/org/argus/amandroid/core/AndroidGlobalConfig.scala && \
    sed -i '164i javacOptions in jawa ++= Seq("-encoding", "UTF-8")' /workspace/amandroid/build.sbt

RUN cd /workspace/amandroid && \
    ./tools/bin/sbt -Duser.home=/workspace clean compile assembly test

RUN ln -s /workspace/amandroid/target/scala-2.12/argus-saf-3.2.0-assembly.jar /workspace/amandroid/argus-saf.jar

COPY run.sh /
20
rasta_exp/docker/amandroid/home_build/run.sh
Executable file
@ -0,0 +1,20 @@
APK_FILENAME=$1

export TIME="time: %e
kernel-cpu-time: %S
user-cpu-time: %U
max-rss-mem: %M
avg-rss-mem: %t
avg-total-mem: %K
page-size: %Z
nb-major-page-fault: %F
nb-minor-page-fault: %R
nb-fs-input: %I
nb-fs-output: %O
nb-socket-msg-received: %r
nb-socket-msg-sent: %s
nb-signal-delivered: %k
exit-status: %x"


/usr/bin/time -o /mnt/report -q /usr/bin/timeout --kill-after=20s ${TIMEOUT} java ${JAVA_PARAM} -Duser.home=/workspace -jar /workspace/amandroid/argus-saf.jar taint -a COMPONENT_BASED -o /mnt/out /mnt/${APK_FILENAME} > /mnt/stdout 2> /mnt/stderr
23
rasta_exp/docker/amandroid/provided_build/Dockerfile
Normal file
|
@ -0,0 +1,23 @@
|
|||
FROM ubuntu:18.04
|
||||
|
||||
# RUN sed -i -e "s/archive.ubuntu.com/old-releases.ubuntu.com/g" /etc/apt/sources.list
|
||||
|
||||
RUN apt-get update && apt-get install -y git time unzip wget
|
||||
|
||||
RUN mkdir /workspace
|
||||
RUN git init /workspace/amandroid && \
|
||||
cd /workspace/amandroid && \
|
||||
git remote add origin https://github.com/arguslab/Argus-SAF.git && \
|
||||
git fetch --depth=1 origin 06596c6bb03fe2560030b52bf2b47d17d1bd3068 && \
|
||||
git reset --hard FETCH_HEAD
|
||||
|
||||
# Avoid downloading this each time we launch a new docker
|
||||
RUN mkdir -p /root/.amandroid_stash && \
|
||||
cd /root/.amandroid_stash && \
|
||||
wget https://www.fengguow.dev/resources/amandroid.zip && \
|
||||
wget https://www.fengguow.dev/resources/amandroid.checksum && \
|
||||
unzip amandroid.zip
|
||||
|
||||
RUN apt-get update && apt-get install -y openjdk-8-jdk
|
||||
|
||||
RUN ln -s /workspace/amandroid/binaries/argus-saf-3.2.1-SNAPSHOT-assembly.jar /workspace/amandroid/argus-saf.jar

98  rasta_exp/docker/amandroid/test.py  Normal file

import datetime
import importlib.util
import logging

from typing import Any, Type
from pathlib import Path

if __name__ == "__main__":
    import sys

    sys.path.append(str(Path(__file__).resolve().parent.parent))

import orchestrator

errors = orchestrator.error_collector
utils = orchestrator.utils

TIMEOUT = 900

GUEST_MNT = "/mnt"
PATH_APK = f"{GUEST_MNT}/app.apk"

WORKDIR = "/"
CMD = f"java -jar /workspace/amandroid/argus-saf.jar taint -a COMPONENT_BASED -o /mnt/out {PATH_APK}"

TOOL_NAME = "amandroid"

# Version name -> folder name
TOOL_VERSIONS = {
    "home_build": "home_build",
    "provided_build": "provided_build",
}
# Name of the default version (default folder = TOOL_VERSIONS[DEFAULT_TOOL_VERSION])
DEFAULT_TOOL_VERSION = "home_build"

# This tool produces few parsable errors, so we just try to catch Java errors in case
EXPECTED_ERROR_TYPES: list[Type[errors.LoggedError]] = [
    errors.JavaError,
    errors.NoPrefixJavaError,
]


def analyse_artifacts(path: Path) -> dict[str, Any]:
    """Analyse the artifacts of a test located at `path`."""
    report = utils.parse_report(path / "report")
    report["errors"] = list(
        map(
            lambda e: e.get_dict(),
            errors.get_errors(path / "stderr", EXPECTED_ERROR_TYPES),
        )
    )
    report["errors"].extend(
        map(
            lambda e: e.get_dict(),
            errors.get_errors(path / "stdout", EXPECTED_ERROR_TYPES),
        )
    )
    if report["timeout"]:
        report["tool-status"] = "TIMEOUT"
    elif check_success(path):
        report["tool-status"] = "FINISHED"
    else:
        report["tool-status"] = "FAILED"
    report["tool-name"] = TOOL_NAME
    report["date"] = str(datetime.datetime.now())
    report["apk"] = utils.sha256_sum(path / "app.apk").upper()
    return report


def check_success(path: Path) -> bool:
    """Check if the analysis finished without crashing."""
    return (path / "out" / "app" / "result" / "AppData.txt").exists()


if __name__ == "__main__":
    import docker  # type: ignore

    args = orchestrator.get_test_args(TOOL_NAME)

    tool_folder = Path(__file__).resolve().parent
    api_key = orchestrator.get_androzoo_key()
    if args.get_apk_info:
        orchestrator.load_apk_info(args.apk_refs, args.androzoo_list, api_key)
    client = docker.from_env()

    logging.info("Command tested: ")
    logging.info(f"[{WORKDIR}]$ {CMD}")

    for apk_ref in args.apk_refs:
        orchestrator.test_tool_on_apk(
            client,
            tool_folder,
            api_key,
            apk_ref,
            args.tool_version,
            args.keep_artifacts,
            args.force_test,
        )
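`analyse_artifacts` identifies each APK by `utils.sha256_sum(path / "app.apk").upper()`. A possible implementation of such a helper, under the assumption that it is a plain SHA-256 hex digest of the file, streamed in chunks so large APKs do not need to fit in memory:

```python
# Hypothetical sketch of the orchestrator's sha256_sum helper (an assumption;
# the real utils.sha256_sum may differ): hash the file in 1 MiB chunks.
import hashlib
from pathlib import Path


def sha256_sum(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()
```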

1  rasta_exp/docker/anadroid/RASTA_VERSION  Normal file

home_build

10  rasta_exp/docker/anadroid/README.md  Normal file

# Anadroid

- [source](https://github.com/maggieddie/pushdownoo)
- [or maybe?](https://github.com/maggieddie/anadroid), the paper links to the other repo
- [paper](https://dl.acm.org/doi/10.1145/2516760.2516769)
- language: Scala 2.9.1 / Java 6 / Python 2
- JVM: OpenJDK Runtime Environment (IcedTea6 1.11.5), Ubuntu 12.04.1
- Build: Ant
- number of years without at least 1 commit since first commit: 9
- license: CRAPL

30  rasta_exp/docker/anadroid/home_build/Dockerfile  Normal file

FROM ubuntu:12.04

RUN sed -i -e "s/archive.ubuntu.com/old-releases.ubuntu.com/g" /etc/apt/sources.list

RUN apt-get update && apt-get install -y git python2.7 time

RUN mkdir /workspace
RUN git clone https://github.com/maggieddie/pushdownoo.git /workspace/pushdownoo

RUN cd /workspace/pushdownoo && git checkout c06e03f6501d1441389d17271e44b9f345f637ff

RUN apt-get update && apt-get install -y ant openjdk-6-jdk make graphviz && \
    ln -s /usr/bin/dot /usr/local/bin/dot

RUN cd /workspace/pushdownoo/jdex2sex && make clean && make
RUN mkdir /workspace/pushdownoo/pdafordalvik/test && \
    cd /workspace/pushdownoo/pdafordalvik && \
    sed -i '266d' /workspace/pushdownoo/pdafordalvik/build.xml && \
    sed -i '262,264d' /workspace/pushdownoo/pdafordalvik/build.xml && \
    sed -i '163,164d' /workspace/pushdownoo/pdafordalvik/android-knowledge/sinks.txt && \
    sed -i '158d' /workspace/pushdownoo/pdafordalvik/android-knowledge/sinks.txt && \
    sed -i '80,83d' /workspace/pushdownoo/pdafordalvik/android-knowledge/classes.txt && \
    sed -i '410,412d' /workspace/pushdownoo/pdafordalvik/android-knowledge/callbacks.txt && \
    sed -i '407,408d' /workspace/pushdownoo/pdafordalvik/android-knowledge/callbacks.txt && \
    sed -i '263i\\ println("ee3d6c7015b83b3dc84b21a2e79506175f07c00ecf03e7b3b8edea4e445618bd: END OF ANALYSIS.")' /workspace/pushdownoo/pdafordalvik/src/org/ucombinator/playhelpers/PlayHelper.scala && \
    sed -i '116i\\ println("ee3d6c7015b83b3dc84b21a2e79506175f07c00ecf03e7b3b8edea4e445618bd: START OF ANALYSIS.")' /workspace/pushdownoo/pdafordalvik/src/org/ucombinator/playhelpers/PlayHelper.scala && \
    sed -i 's#^exec java# exec java -Duser.home=/tmp/user/#' ../apktool/apktool && \
    export ANT_OPTS="-Xmx2048M -Xms2048M -Xss512M -XX:MaxPermSize=512m" && \
    make
COPY run.sh /

22  rasta_exp/docker/anadroid/home_build/run.sh  Executable file

#!/usr/bin/env bash

APK_FILENAME=$1

export TIME="time: %e
kernel-cpu-time: %S
user-cpu-time: %U
max-rss-mem: %M
avg-rss-mem: %t
avg-total-mem: %K
page-size: %Z
nb-major-page-fault: %F
nb-minor-page-fault: %R
nb-fs-input: %I
nb-fs-output: %O
nb-socket-msg-received: %r
nb-socket-msg-sent: %s
nb-signal-delivered: %k
exit-status: %x"

cd ${WORKDIR}
/usr/bin/time -o /mnt/report -q /usr/bin/timeout --kill-after=20s ${TIMEOUT} java ${JAVA_PARAM} -jar ${JAR_FILE} org.ucombinator.dalvik.cfa.cesk.RunAnalysis --k 1 --gc --lra --aco --godel /mnt/${APK_FILENAME} > /mnt/stdout 2> /mnt/stderr

23  rasta_exp/docker/anadroid/provided_build/Dockerfile  Normal file

FROM ubuntu:12.04

RUN sed -i -e "s/archive.ubuntu.com/old-releases.ubuntu.com/g" /etc/apt/sources.list

RUN apt-get update && apt-get install -y git python2.7 time

RUN mkdir /workspace
RUN git clone https://github.com/maggieddie/pushdownoo.git /workspace/pushdownoo

RUN cd /workspace/pushdownoo && git checkout c06e03f6501d1441389d17271e44b9f345f637ff

RUN apt-get update && apt-get install -y openjdk-6-jdk make graphviz && \
    ln -s /usr/bin/dot /usr/local/bin/dot

RUN sed -i '266d' /workspace/pushdownoo/pdafordalvik/build.xml && \
    sed -i '262,264d' /workspace/pushdownoo/pdafordalvik/build.xml && \
    sed -i '163,164d' /workspace/pushdownoo/pdafordalvik/android-knowledge/sinks.txt && \
    sed -i '158d' /workspace/pushdownoo/pdafordalvik/android-knowledge/sinks.txt && \
    sed -i '80,83d' /workspace/pushdownoo/pdafordalvik/android-knowledge/classes.txt && \
    sed -i '410,412d' /workspace/pushdownoo/pdafordalvik/android-knowledge/callbacks.txt && \
    sed -i '407,408d' /workspace/pushdownoo/pdafordalvik/android-knowledge/callbacks.txt && \
    cd /workspace/pushdownoo/jdex2sex && make clean && make
COPY run.sh /

23  rasta_exp/docker/anadroid/provided_build/run.sh  Executable file

#!/usr/bin/env bash

APK_FILENAME=$1

export TIME="time: %e
kernel-cpu-time: %S
user-cpu-time: %U
max-rss-mem: %M
avg-rss-mem: %t
avg-total-mem: %K
page-size: %Z
nb-major-page-fault: %F
nb-minor-page-fault: %R
nb-fs-input: %I
nb-fs-output: %O
nb-socket-msg-received: %r
nb-socket-msg-sent: %s
nb-signal-delivered: %k
exit-status: %x"

cd ${WORKDIR}
/usr/bin/time -o /mnt/report -q /usr/bin/timeout --kill-after=20s ${TIMEOUT} java ${JAVA_PARAM} -jar ${JAR_FILE} org.ucombinator.dalvik.cfa.cesk.RunAnalysis --k 1 --gc --lra --aco --godel /mnt/${APK_FILENAME} > /mnt/stdout 2> /mnt/stderr

23  rasta_exp/docker/anadroid/run.sh  Executable file

#!/usr/bin/env bash

APK_FILENAME=$1

export TIME="time: %e
kernel-cpu-time: %S
user-cpu-time: %U
max-rss-mem: %M
avg-rss-mem: %t
avg-total-mem: %K
page-size: %Z
nb-major-page-fault: %F
nb-minor-page-fault: %R
nb-fs-input: %I
nb-fs-output: %O
nb-socket-msg-received: %r
nb-socket-msg-sent: %s
nb-signal-delivered: %k
exit-status: %x"

cd ${WORKDIR}
/usr/bin/time -o /mnt/report -q /usr/bin/timeout --kill-after=20s ${TIMEOUT} java ${JAVA_PARAM} -jar ${JAR_FILE} org.ucombinator.dalvik.cfa.cesk.RunAnalysis --k 1 --gc --lra --aco --godel /mnt/${APK_FILENAME} > /mnt/stdout 2> /mnt/stderr

122  rasta_exp/docker/anadroid/test.py  Normal file

import datetime
import importlib.util
import logging

from typing import Any
from pathlib import Path

if __name__ == "__main__":
    import sys

    sys.path.append(str(Path(__file__).resolve().parent.parent))

import orchestrator

errors = orchestrator.error_collector
utils = orchestrator.utils

TIMEOUT = 900

GUEST_MNT = "/mnt"
PATH_APK = f"{GUEST_MNT}/app.apk"

JAVA_PARAM = "-XX:MaxPermSize=512m -Xms512m -Xmx1024M -Xss1024m"
WORKDIR = "/workspace/pushdownoo/pdafordalvik"
JAR_FILE = "/workspace/pushdownoo/pdafordalvik/artifacts/PushdownOO_Exflow.jar"
# CMD = f"java {JAVA_PARAM} -jar {JAR_FILE} org.ucombinator.dalvik.cfa.cesk.RunAnalysis --k 1 --gc --lra --aco --godel --dump-graph {PATH_APK}"  # --dump-graph takes so much time!
CMD = f"java {JAVA_PARAM} -jar {JAR_FILE} org.ucombinator.dalvik.cfa.cesk.RunAnalysis --k 1 --gc --lra --aco --godel {PATH_APK}"

TOOL_NAME = "anadroid"

# Version name -> folder name
TOOL_VERSIONS = {
    "home_build": "home_build",
    "provided_build": "provided_build",
}
# Name of the default version (default folder = TOOL_VERSIONS[DEFAULT_TOOL_VERSION])
DEFAULT_TOOL_VERSION = "home_build"

EXPECTED_ERROR_TYPES = [errors.JavaError, errors.PythonError]


def analyse_artifacts(path: Path) -> dict[str, Any]:
    """Analyse the artifacts of a test located at `path`."""
    report = utils.parse_report(path / "report")
    report["errors"] = list(
        map(
            lambda e: e.get_dict(),
            errors.get_errors(path / "stderr", EXPECTED_ERROR_TYPES),
        )
    )
    if report["timeout"]:
        report["tool-status"] = "TIMEOUT"
    elif check_success(path):
        report["tool-status"] = "FINISHED"
    else:
        report["tool-status"] = "FAILED"
    report["tool-name"] = TOOL_NAME
    report["date"] = str(datetime.datetime.now())
    report["apk"] = utils.sha256_sum(path / "app.apk").upper()
    return report


def check_success(path: Path) -> bool:
    """Check if the analysis finished without crashing."""
    stdout = path / "stdout"
    with stdout.open("r", errors="replace") as f:
        # Check if the version of the tool used is the one with the added println
        modified_version = (
            "ee3d6c7015b83b3dc84b21a2e79506175f07c00ecf03e7b3b8edea4e445618bd: START OF ANALYSIS."
            in f.readline()
        )
    with stdout.open("r", errors="replace") as f:
        for line in f:
            if modified_version and (
                "ee3d6c7015b83b3dc84b21a2e79506175f07c00ecf03e7b3b8edea4e445618bd: END OF ANALYSIS."
                in line
            ):
                return True
            # If we use the original tool and the tool worked, this line should
            # appear (with or without --dump-graph in CMD).
            # WARNING: the path to the graph depends on the name and location of
            # the app; the one used here is the one for /mnt/app.apk
            if (
                not modified_version
                and "Dyck State Graph dumped into /mnt/app/graphs/graph-1-pdcfa-gc-lra.dot"
                in line
            ):
                return True
    return False


if __name__ == "__main__":
    import docker  # type: ignore

    args = orchestrator.get_test_args(TOOL_NAME)

    tool_folder = Path(__file__).resolve().parent
    api_key = orchestrator.get_androzoo_key()
    if args.get_apk_info:
        orchestrator.load_apk_info(args.apk_refs, args.androzoo_list, api_key)
    client = docker.from_env()

    logging.info("Command tested: ")
    logging.info(f"[{WORKDIR}]$ {CMD}")

    for apk_ref in args.apk_refs:
        orchestrator.test_tool_on_apk(
            client,
            tool_folder,
            api_key,
            apk_ref,
            args.tool_version,
            args.keep_artifacts,
            args.force_test,
        )

1  rasta_exp/docker/androguard/RASTA_VERSION  Normal file

v3_3_5

5  rasta_exp/docker/androguard/README.md  Normal file

# Androguard

- [source](https://github.com/androguard/androguard/)
- language: Python 3, JavaScript (for Frida)
- License: Apache 2.0

22  rasta_exp/docker/androguard/run.sh  Executable file

#!/usr/bin/env bash

APK_FILENAME=$1

export TIME="time: %e
kernel-cpu-time: %S
user-cpu-time: %U
max-rss-mem: %M
avg-rss-mem: %t
avg-total-mem: %K
page-size: %Z
nb-major-page-fault: %F
nb-minor-page-fault: %R
nb-fs-input: %I
nb-fs-output: %O
nb-socket-msg-received: %r
nb-socket-msg-sent: %s
nb-signal-delivered: %k
exit-status: %x"

/usr/bin/time -o /mnt/report -q /usr/bin/timeout --kill-after=20s ${TIMEOUT} androguard decompile -o /mnt/out /mnt/${APK_FILENAME} > /mnt/stdout 2> /mnt/stderr

10  rasta_exp/docker/androguard/v3_3_5/Dockerfile  Normal file

FROM python:3.11-slim
RUN apt-get update && apt-get -y install time python3-pip git
COPY run.sh /

RUN mkdir /workspace

RUN git clone --depth 1 --branch v3.3.5 https://github.com/androguard/androguard.git /workspace/androguard
RUN python3 -m pip install -e /workspace/androguard
COPY main.py /workspace/

8  rasta_exp/docker/androguard/v3_3_5/main.py  Normal file

from androguard.misc import AnalyzeAPK
import sys

a, d, dx = AnalyzeAPK(sys.argv[1])

print(
    "ee3d6c7015b83b3dc84b21a2e79506175f07c00ecf03e7b3b8edea4e445618bd: END OF ANALYSIS."
)

22  rasta_exp/docker/androguard/v3_3_5/run.sh  Executable file

#!/usr/bin/env bash

APK_FILENAME=$1

export TIME="time: %e
kernel-cpu-time: %S
user-cpu-time: %U
max-rss-mem: %M
avg-rss-mem: %t
avg-total-mem: %K
page-size: %Z
nb-major-page-fault: %F
nb-minor-page-fault: %R
nb-fs-input: %I
nb-fs-output: %O
nb-socket-msg-received: %r
nb-socket-msg-sent: %s
nb-signal-delivered: %k
exit-status: %x"

/usr/bin/time -o /mnt/report -q /usr/bin/timeout --kill-after=20s ${TIMEOUT} python3 /workspace/main.py /mnt/${APK_FILENAME} > /mnt/stdout 2> /mnt/stderr

1  rasta_exp/docker/androguard_dad/RASTA_VERSION  Normal file

v3_3_5

7  rasta_exp/docker/androguard_dad/README.md  Normal file

# Androguard

- [source](https://github.com/androguard/androguard/)
- [paper]() TODO
- language: Python 3, JavaScript (for Frida)
- number of years without at least 1 commit since first commit: 0 (2010 - 2022)
- License: Apache 2.0

14  rasta_exp/docker/androguard_dad/latest_2022/Dockerfile  Normal file

FROM python:3.11-slim
RUN apt-get update && apt-get -y install time python3-pip git
COPY run.sh /

RUN mkdir /workspace

RUN git init /workspace/androguard && \
    cd /workspace/androguard && \
    git remote add origin https://github.com/androguard/androguard.git && \
    git fetch --depth=1 origin 832104db3eb5dc3cc66b30883fa8ce8712dfa200 && \
    git reset --hard FETCH_HEAD
RUN cd /workspace/androguard && \
    python3 -m pip install -r requirements.txt

23  rasta_exp/docker/androguard_dad/latest_2022/run.sh  Executable file

#!/usr/bin/env bash

APK_FILENAME=$1

export TIME="time: %e
kernel-cpu-time: %S
user-cpu-time: %U
max-rss-mem: %M
avg-rss-mem: %t
avg-total-mem: %K
page-size: %Z
nb-major-page-fault: %F
nb-minor-page-fault: %R
nb-fs-input: %I
nb-fs-output: %O
nb-socket-msg-received: %r
nb-socket-msg-sent: %s
nb-signal-delivered: %k
exit-status: %x"

/usr/bin/time -o /mnt/report -q /usr/bin/timeout --kill-after=20s ${TIMEOUT} androguard decompile -o /mnt/out /mnt/${APK_FILENAME} > /mnt/stdout 2> /mnt/stderr

5  rasta_exp/docker/androguard_dad/pip/Dockerfile  Normal file

FROM python:3.11-slim
RUN apt-get update && apt-get -y install time
COPY run.sh /

RUN python3 -m pip install androguard

22  rasta_exp/docker/androguard_dad/pip/run.sh  Executable file

#!/usr/bin/env bash

APK_FILENAME=$1

export TIME="time: %e
kernel-cpu-time: %S
user-cpu-time: %U
max-rss-mem: %M
avg-rss-mem: %t
avg-total-mem: %K
page-size: %Z
nb-major-page-fault: %F
nb-minor-page-fault: %R
nb-fs-input: %I
nb-fs-output: %O
nb-socket-msg-received: %r
nb-socket-msg-sent: %s
nb-signal-delivered: %k
exit-status: %x"

/usr/bin/time -o /mnt/report -q /usr/bin/timeout --kill-after=20s ${TIMEOUT} androguard decompile -o /mnt/out /mnt/${APK_FILENAME} > /mnt/stdout 2> /mnt/stderr

22  rasta_exp/docker/androguard_dad/run.sh  Executable file

#!/usr/bin/env bash

APK_FILENAME=$1

export TIME="time: %e
kernel-cpu-time: %S
user-cpu-time: %U
max-rss-mem: %M
avg-rss-mem: %t
avg-total-mem: %K
page-size: %Z
nb-major-page-fault: %F
nb-minor-page-fault: %R
nb-fs-input: %I
nb-fs-output: %O
nb-socket-msg-received: %r
nb-socket-msg-sent: %s
nb-signal-delivered: %k
exit-status: %x"

/usr/bin/time -o /mnt/report -q /usr/bin/timeout --kill-after=20s ${TIMEOUT} androguard decompile -o /mnt/out /mnt/${APK_FILENAME} > /mnt/stdout 2> /mnt/stderr

96  rasta_exp/docker/androguard_dad/test.py  Normal file

import datetime
import importlib.util
import logging

from typing import Any, Type
from pathlib import Path

if __name__ == "__main__":
    import sys

    sys.path.append(str(Path(__file__).resolve().parent.parent))

import orchestrator

errors = orchestrator.error_collector
utils = orchestrator.utils

TIMEOUT = 900

GUEST_MNT = "/mnt"
PATH_APK = f"{GUEST_MNT}/app.apk"

WORKDIR = "/"
CMD = f"androguard decompile -o {GUEST_MNT}/out {PATH_APK}"

TOOL_NAME = "androguard"

# Version name -> folder name
TOOL_VERSIONS = {
    "v3.3.5": "v3_3_5",
    # "latest": "latest_2022",  # the current master is not stable
}
# Name of the default version (default folder = TOOL_VERSIONS[DEFAULT_TOOL_VERSION])
DEFAULT_TOOL_VERSION = "v3.3.5"

EXPECTED_ERROR_TYPES: list[Type[errors.LoggedError]] = [errors.PythonError]


def analyse_artifacts(path: Path) -> dict[str, Any]:
    """Analyse the artifacts of a test located at `path`."""
    report = utils.parse_report(path / "report")
    report["errors"] = list(
        map(
            lambda e: e.get_dict(),
            errors.get_errors(path / "stderr", EXPECTED_ERROR_TYPES),
        )
    )
    if report["timeout"]:
        report["tool-status"] = "TIMEOUT"
    elif check_success(path):
        report["tool-status"] = "FINISHED"
    else:
        report["tool-status"] = "FAILED"
    report["tool-name"] = TOOL_NAME
    report["date"] = str(datetime.datetime.now())
    report["apk"] = utils.sha256_sum(path / "app.apk").upper()
    return report


def check_success(path: Path) -> bool:
    """Check if the analysis finished without crashing."""
    stdout = path / "stdout"
    with stdout.open("r", errors="replace") as f:
        for line in f:
            if (
                "ee3d6c7015b83b3dc84b21a2e79506175f07c00ecf03e7b3b8edea4e445618bd: END OF ANALYSIS."
                in line
            ):
                return True
    return False


if __name__ == "__main__":
    import docker  # type: ignore

    args = orchestrator.get_test_args(TOOL_NAME)

    tool_folder = Path(__file__).resolve().parent
    api_key = orchestrator.get_androzoo_key()
    if args.get_apk_info:
        orchestrator.load_apk_info(args.apk_refs, args.androzoo_list, api_key)
    client = docker.from_env()

    logging.info("Command tested: ")
    logging.info(f"[{WORKDIR}]$ {CMD}")

    for apk_ref in args.apk_refs:
        orchestrator.test_tool_on_apk(
            client,
            tool_folder,
            api_key,
            apk_ref,
            args.tool_version,
            args.keep_artifacts,
            args.force_test,
        )

10  rasta_exp/docker/androguard_dad/v3_3_5/Dockerfile  Normal file

FROM python:3.11-slim
RUN apt-get update && apt-get -y install time python3-pip git
COPY run.sh /

RUN mkdir /workspace

RUN git clone --depth 1 --branch v3.3.5 https://github.com/androguard/androguard.git /workspace/androguard
RUN sed -i '396i\\ print("ee3d6c7015b83b3dc84b21a2e79506175f07c00ecf03e7b3b8edea4e445618bd: END OF ANALYSIS.")' /workspace/androguard/androguard/cli/entry_points.py && \
    python3 -m pip install -e /workspace/androguard

22  rasta_exp/docker/androguard_dad/v3_3_5/run.sh  Executable file

#!/usr/bin/env bash

APK_FILENAME=$1

export TIME="time: %e
kernel-cpu-time: %S
user-cpu-time: %U
max-rss-mem: %M
avg-rss-mem: %t
avg-total-mem: %K
page-size: %Z
nb-major-page-fault: %F
nb-minor-page-fault: %R
nb-fs-input: %I
nb-fs-output: %O
nb-socket-msg-received: %r
nb-socket-msg-sent: %s
nb-signal-delivered: %k
exit-status: %x"

/usr/bin/time -o /mnt/report -q /usr/bin/timeout --kill-after=20s ${TIMEOUT} androguard decompile -o /mnt/out /mnt/${APK_FILENAME} > /mnt/stdout 2> /mnt/stderr

1  rasta_exp/docker/apparecium/RASTA_VERSION  Normal file

latest

9  rasta_exp/docker/apparecium/README.md  Normal file

# apparecium

- [source](https://github.com/askk/apparecium)
- [fork](https://github.com/cogbee/apparecium)
- [paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7098024&tag=1)
- language: Python 2 (+ some C++ in old versions of androguard)
- number of years without at least 1 commit since first commit: 9
- License: MIT

23  rasta_exp/docker/apparecium/latest/Dockerfile  Normal file

FROM ubuntu:22.04

# RUN sed -i -e "s/archive.ubuntu.com/old-releases.ubuntu.com/g" /etc/apt/sources.list

RUN apt-get update && apt-get install -y git time

RUN mkdir /workspace
RUN git init /workspace/apparecium && \
    cd /workspace/apparecium && \
    git remote add origin https://github.com/askk/apparecium.git && \
    git fetch --depth=1 origin e27e108950e56b69f34fa97262c07d154b9163e8 && \
    git reset --hard FETCH_HEAD

RUN apt-get update && apt-get install -y python2.7 wget && \
    ln -s /usr/bin/python2.7 /usr/bin/python

RUN wget https://bootstrap.pypa.io/pip/2.7/get-pip.py && \
    python2.7 get-pip.py && \
    rm get-pip.py && \
    python2.7 -m pip install pydot
RUN sed -i 's#d3-visualization#/mnt#' /workspace/apparecium/dftest.py

COPY run.sh /

25  rasta_exp/docker/apparecium/latest/run.sh  Executable file

#!/usr/bin/env bash
#
APK_FILENAME=$1

export TIME="time: %e
kernel-cpu-time: %S
user-cpu-time: %U
max-rss-mem: %M
avg-rss-mem: %t
avg-total-mem: %K
page-size: %Z
nb-major-page-fault: %F
nb-minor-page-fault: %R
nb-fs-input: %I
nb-fs-output: %O
nb-socket-msg-received: %r
nb-socket-msg-sent: %s
nb-signal-delivered: %k
exit-status: %x"

WORKDIR="/workspace/apparecium"
cd ${WORKDIR}
mkdir /mnt/data
/usr/bin/time -o /mnt/report -q /usr/bin/timeout --kill-after=20s ${TIMEOUT} python dftest.py /mnt/${APK_FILENAME} > /mnt/stdout 2> /mnt/stderr

99  rasta_exp/docker/apparecium/test.py  Normal file

import datetime
import importlib.util
import logging

from typing import Any, Type
from pathlib import Path

if __name__ == "__main__":
    import sys

    sys.path.append(str(Path(__file__).resolve().parent.parent))

import orchestrator

errors = orchestrator.error_collector
utils = orchestrator.utils

TIMEOUT = 900

GUEST_MNT = "/mnt"
PATH_APK = f"{GUEST_MNT}/app.apk"

WORKDIR = "/workspace/apparecium"
CMD = f"python runner.py {PATH_APK} >> '{GUEST_MNT}/stdout' 2>> '{GUEST_MNT}/stderr'; cp -r /workspace/apparecium/d3-visualization/data {GUEST_MNT}/"

TOOL_NAME = "apparecium"

# Version name -> folder name
TOOL_VERSIONS = {
    "latest": "latest",
    "fork_latest": "fork_latest",
}
# Name of the default version (default folder = TOOL_VERSIONS[DEFAULT_TOOL_VERSION])
DEFAULT_TOOL_VERSION = "latest"

EXPECTED_ERROR_TYPES: list[Type[errors.LoggedError]] = [errors.PythonError]


def analyse_artifacts(path: Path) -> dict[str, Any]:
    """Analyse the artifacts of a test located at `path`."""
    report = utils.parse_report(path / "report")
    report["errors"] = list(
        map(
            lambda e: e.get_dict(),
            errors.get_errors(path / "stderr", EXPECTED_ERROR_TYPES),
        )
    )
    if report["timeout"]:
        report["tool-status"] = "TIMEOUT"
    elif check_success(path):
        report["tool-status"] = "FINISHED"
    else:
        report["tool-status"] = "FAILED"
    report["tool-name"] = TOOL_NAME
    report["date"] = str(datetime.datetime.now())
    report["apk"] = utils.sha256_sum(path / "app.apk").upper()
    return report


def check_success(path: Path) -> bool:
    """Check if the analysis finished without crashing."""
    if (path / "data" / "app.apk.json").exists():
        return True
    l1 = False
    with (path / "stdout").open(errors="replace") as file:
        for line in file:
            if "Complete Analysis took" in line:  # check if androguard worked
                l1 = True
            if (
                l1 and "\t\tDone in " in line
            ):  # check if apparecium worked after androguard
                return True
    return False


if __name__ == "__main__":
    import docker  # type: ignore

    args = orchestrator.get_test_args(TOOL_NAME)

    tool_folder = Path(__file__).resolve().parent
    api_key = orchestrator.get_androzoo_key()
    if args.get_apk_info:
        orchestrator.load_apk_info(args.apk_refs, args.androzoo_list, api_key)
    client = docker.from_env()

    logging.info("Command tested: ")
    logging.info(f"[{WORKDIR}]$ {CMD}")

    for apk_ref in args.apk_refs:
        orchestrator.test_tool_on_apk(
            client,
            tool_folder,
            api_key,
            apk_ref,
            args.tool_version,
            args.keep_artifacts,
            args.force_test,
        )
|
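The `check_success` scan above looks for two markers in order: a line showing that androguard completed, then (at or after it) a line showing that apparecium completed. The same two-flag pattern, extracted as a minimal standalone sketch (the helper name `markers_in_order` is hypothetical, not part of the orchestrator):

```python
def markers_in_order(lines, first_marker, second_marker):
    """Return True if second_marker appears on a line at or after the
    first line containing first_marker (the flag pattern used above)."""
    seen_first = False
    for line in lines:
        if first_marker in line:
            seen_first = True
        # Only counts once the first marker has been seen.
        if seen_first and second_marker in line:
            return True
    return False
```

Note that, as in the original, both checks run on the same iteration, so the two markers may legitimately sit on the same line.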
1
rasta_exp/docker/blueseal/RASTA_VERSION
Normal file
@@ -0,0 +1 @@
home_build
25
rasta_exp/docker/blueseal/README.md
Normal file
@@ -0,0 +1,25 @@
# Blueseal

- [source](https://github.com/ub-rms/blueseal)
- [paper](https://dl.acm.org/doi/10.1145/2642937.2643018)
- language: Java 7
- Build: Ant
- number of years without at least 1 commit since first commit: 7
- License: None

## Notes

Trouble seen on laptops: the default open-file limit is too low, so raise it with `--ulimit`.

Build:

```
docker build --ulimit nofile=65536:65536 .
```

Run:

```
docker run --ulimit nofile=65536:65536 -it -v ...
```
21
rasta_exp/docker/blueseal/home_build/Dockerfile
Normal file
@@ -0,0 +1,21 @@
FROM ubuntu:14.04

# RUN sed -i -e "s/archive.ubuntu.com/old-releases.ubuntu.com/g" /etc/apt/sources.list
RUN apt-get update && apt-get install -y git time

RUN mkdir /workspace
RUN git clone https://github.com/ub-rms/blueseal.git /workspace/blueseal && \
    cd /workspace/blueseal && git checkout 95e820049f9ded681019724d0b4a86dc028bd78b

RUN rm -rf /workspace/blueseal/BlueSeal/android-jars && \
    git clone https://github.com/Sable/android-platforms.git /workspace/blueseal/BlueSeal/android-jars && \
    cd /workspace/blueseal/BlueSeal/android-jars && git checkout 74c993c02160cdeb1d52e46017a2ecd536ea1d5d


RUN apt-get update && apt-get install -y openjdk-7-jdk ant

RUN cd /workspace/blueseal/BlueSeal && mkdir /workspace/blueseal/BlueSeal/bin && \
    ant build

RUN sed -i 's#^exec java# exec java -Duser.home=/tmp/user/#' /workspace/blueseal/BlueSeal/tools/apktool
COPY run.sh /
29
rasta_exp/docker/blueseal/home_build/run.sh
Executable file
@@ -0,0 +1,29 @@
#!/usr/bin/env bash

# params: APK_FILENAME

APK_FILENAME=$1

export TIME="time: %e
kernel-cpu-time: %S
user-cpu-time: %U
max-rss-mem: %M
avg-rss-mem: %t
avg-total-mem: %K
page-size: %Z
nb-major-page-fault: %F
nb-minor-page-fault: %R
nb-fs-input: %I
nb-fs-output: %O
nb-socket-msg-received: %r
nb-socket-msg-sent: %s
nb-signal-delivered: %k
exit-status: %x"


WORKDIR="/mnt"
cd ${WORKDIR}
ln -s /workspace/blueseal/BlueSeal/input /mnt/
ln -s /workspace/blueseal/BlueSeal/tools /mnt/
ln -s /workspace/blueseal/BlueSeal/android-jars /mnt/
/usr/bin/time -o /mnt/report -q /usr/bin/timeout --kill-after=20s ${TIMEOUT} java ${JAVA_PARAM} edu.buffalo.cse.blueseal.BSFlow.InterProceduralMain /mnt/${APK_FILENAME} > /mnt/stdout 2> /mnt/stderr
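The `TIME` template above makes GNU `/usr/bin/time` write one `key: value` pair per line into `/mnt/report`; the `utils.parse_report` calls in the `test.py` scripts consume a file of this shape. A minimal sketch of such a parser (an assumed behaviour, not the actual `utils` implementation):

```python
def parse_time_report(text: str) -> dict[str, str]:
    """Parse `key: value` lines as produced by the TIME template above."""
    report = {}
    for line in text.splitlines():
        if ": " in line:
            # partition splits on the first ": ", keeping any later colons
            # (none occur in this template) inside the value.
            key, _, value = line.partition(": ")
            report[key] = value
    return report
```

Values are left as strings here; the real reports mix floats (`time`), integers (`max-rss-mem`), and the exit status, so a consumer would convert per key.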
29
rasta_exp/docker/blueseal/run.sh
Executable file
@@ -0,0 +1,29 @@
#!/usr/bin/env bash

# params: APK_FILENAME

APK_FILENAME=$1

export TIME="time: %e
kernel-cpu-time: %S
user-cpu-time: %U
max-rss-mem: %M
avg-rss-mem: %t
avg-total-mem: %K
page-size: %Z
nb-major-page-fault: %F
nb-minor-page-fault: %R
nb-fs-input: %I
nb-fs-output: %O
nb-socket-msg-received: %r
nb-socket-msg-sent: %s
nb-signal-delivered: %k
exit-status: %x"


WORKDIR="/mnt"
cd ${WORKDIR}
ln -s /workspace/blueseal/BlueSeal/input /mnt/
ln -s /workspace/blueseal/BlueSeal/tools /mnt/
ln -s /workspace/blueseal/BlueSeal/android-jars /mnt/
/usr/bin/time -o /mnt/report -q /usr/bin/timeout --kill-after=20s ${TIMEOUT} java ${JAVA_PARAM} edu.buffalo.cse.blueseal.BSFlow.InterProceduralMain /mnt/${APK_FILENAME} > /mnt/stdout 2> /mnt/stderr
104
rasta_exp/docker/blueseal/test.py
Normal file
@@ -0,0 +1,104 @@
import datetime
import importlib.util
import logging
import re

from typing import Any, Type, Optional
from pathlib import Path

if __name__ == "__main__":
    import sys

    sys.path.append(str(Path(__file__).resolve().parent.parent))

import orchestrator

errors = orchestrator.error_collector
utils = orchestrator.utils

TIMEOUT = 900

GUEST_MNT = "/mnt"
PATH_APK = f"{GUEST_MNT}/app.apk"

WORKDIR = "/workspace/blueseal/BlueSeal"

JAVA_PARAM = "-cp 'libs/AXMLPrinter2.jar:libs/commons-io-2.4.jar:libs/polyglotclasses-1.3.5.jar:libs/baksmali-1.3.2.jar:libs/jasminclasses-2.5.0.jar:libs/soot.jar:bin'"
CMD = (
    f"java {JAVA_PARAM} edu.buffalo.cse.blueseal.BSFlow.InterProceduralMain {PATH_APK}"
)

TOOL_NAME = "blueseal"

# Version name -> folder name
TOOL_VERSIONS = {
    "home_build": "home_build",
}
# Name of the default version (default folder = TOOL_VERSIONS[DEFAULT_TOOL_VERSION])
DEFAULT_TOOL_VERSION = "home_build"


EXPECTED_ERROR_TYPES: list[Type[errors.LoggedError]] = [
    errors.JavaError,
    errors.NoPrefixJavaError,
]


def analyse_artifacts(path: Path) -> dict[str, Any]:
    """Analyse the artifacts of a test located at `path`."""
    report = utils.parse_report(path / "report")
    report["errors"] = list(
        map(
            lambda e: e.get_dict(),
            errors.get_errors(path / "stderr", EXPECTED_ERROR_TYPES),
        )
    )
    if report["timeout"]:
        report["tool-status"] = "TIMEOUT"
    elif check_success(path):
        report["tool-status"] = "FINISHED"
    else:
        report["tool-status"] = "FAILED"
    report["tool-name"] = TOOL_NAME
    report["date"] = str(datetime.datetime.now())
    report["apk"] = utils.sha256_sum(path / "app.apk").upper()
    return report


def check_success(path: Path) -> bool:
    """Check if the analysis finished without crashing."""
    l1 = False
    with (path / "stdout").open("r", errors="replace") as stdout:
        for line in stdout:
            if l1 and "Soot has run for " in line:
                return True
            l1 = False
            if "Soot finished on " in line:
                l1 = True
    return False


if __name__ == "__main__":
    import docker  # type: ignore

    args = orchestrator.get_test_args(TOOL_NAME)

    tool_folder = Path(__file__).resolve().parent
    api_key = orchestrator.get_androzoo_key()
    if args.get_apk_info:
        orchestrator.load_apk_info(args.apk_refs, args.androzoo_list, api_key)
    client = docker.from_env()

    logging.info("Command tested: ")
    logging.info(f"[{WORKDIR}]$ {CMD}")

    for apk_ref in args.apk_refs:
        orchestrator.test_tool_on_apk(
            client,
            tool_folder,
            api_key,
            apk_ref,
            args.tool_version,
            args.keep_artifacts,
            args.force_test,
        )
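Every report identifies the APK by `utils.sha256_sum(path / "app.apk").upper()`, the uppercase SHA-256 that AndroZoo uses to index samples. A plausible sketch of such a helper, assuming it hashes the file in chunks (the real `utils` implementation may differ):

```python
import hashlib
from pathlib import Path


def sha256_sum(path: Path) -> str:
    """Hex SHA-256 digest of a file, read in 64 KiB chunks to bound memory."""
    digest = hashlib.sha256()
    with path.open("rb") as file:
        # iter() with a sentinel stops when read() returns b"" at EOF.
        for chunk in iter(lambda: file.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Chunked reading matters here because the RASTA APKs can be tens of megabytes.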
1
rasta_exp/docker/choi_et_al/RASTA_VERSION
Normal file
@@ -0,0 +1 @@
home_build
7
rasta_exp/docker/choi_et_al/README.md
Normal file
@@ -0,0 +1,7 @@
# Choi et al.

- [source](https://github.com/kwanghoon/JavaAnalysis)
- [paper](https://www.sciencedirect.com/science/article/pii/S0020019014001069)
- language: Haskell (GHC < 7.8? GHC 7.0.4 should work, 7.6.3 looks good too)
- number of years without at least 1 release since first release: 9
- License: None
28
rasta_exp/docker/choi_et_al/home_build/Dockerfile
Normal file
@@ -0,0 +1,28 @@
FROM ubuntu:14.04

# RUN sed -i -e "s/archive.ubuntu.com/old-releases.ubuntu.com/g" /etc/apt/sources.list

RUN apt-get update && apt-get install -y time git

RUN mkdir /workspace && git init /workspace/JavaAnalysis && \
    cd /workspace/JavaAnalysis && \
    git remote add origin https://github.com/kwanghoon/JavaAnalysis.git && \
    git fetch --depth=1 origin fba12cc22338b6f425a0c71168dc55afe964345d && \
    git reset --hard FETCH_HEAD

RUN apt-get update && apt-get install -y ghc libghc-mtl-dev

RUN cd /workspace/JavaAnalysis && \
    ghc Main

# Install JADX to convert the apk to .java files
# Use nixpkgs because older versions of jadx are really unstable and the more
# recent versions require Java 8, which is not available on Ubuntu 14.04, the
# only distro we found that can compile and run this tool
RUN apt-get update && apt-get install -y curl xz-utils && \
    useradd -b /home -G sudo -m nix && \
    echo 'nix ALL=(ALL) NOPASSWD:ALL' | EDITOR='tee -a' visudo && \
    HOME=/home/nix sudo -u nix bash -c 'sh <(curl -k -L https://nixos.org/nix/install) --no-daemon' && \
    HOME=/home/nix sudo -u nix bash -c '. /home/nix/.nix-profile/etc/profile.d/nix.sh && nix-env -iA nixpkgs.jadx'

COPY run.sh /workspace/run.sh
6
rasta_exp/docker/choi_et_al/home_build/run.sh
Executable file
@@ -0,0 +1,6 @@
#!/bin/sh

chown -R nix /mnt
# Run jadx on /mnt/app.apk
HOME=/home/nix sudo -u nix bash -c '. /home/nix/.nix-profile/etc/profile.d/nix.sh && cd /mnt && nix-shell -p jadx --run "jadx app.apk"'
find /mnt/app -name '*.java' -print | xargs /workspace/JavaAnalysis/Main
185
rasta_exp/docker/choi_et_al/test.py
Normal file
@@ -0,0 +1,185 @@
import datetime
import importlib.util
import logging
import re

from typing import Any, Type, Optional
from pathlib import Path
from more_itertools import peekable

#
# ██╗ ██╗ ██╗ ██████╗
# ██║ ██║ ██║ ██╔══██╗
# ██║ █╗ ██║ ██║ ██████╔╝
# ██║███╗██║ ██║ ██╔═══╝
# ╚███╔███╔╝ ██║ ██║
# ╚══╝╚══╝ ╚═╝ ╚═╝
#
# Looks like JADX is not good enough, waiting for the author's response

if __name__ == "__main__":
    import sys

    sys.path.append(str(Path(__file__).resolve().parent.parent))

import orchestrator

errors = orchestrator.error_collector
utils = orchestrator.utils


TIMEOUT = 900


GUEST_MNT = "/mnt"
PATH_APK = f"{GUEST_MNT}/app.apk"

WORKDIR = f"{GUEST_MNT}"
CMD = "/workspace/run.sh"

TOOL_NAME = "choi_et_al"

# Version name -> folder name
TOOL_VERSIONS = {
    "home_build": "home_build",
}
# Name of the default version (default folder = TOOL_VERSIONS[DEFAULT_TOOL_VERSION])
DEFAULT_TOOL_VERSION = "home_build"


class HaskellError(errors.LoggedError):
    # `program: message`, as printed by the GHC runtime on stderr
    error_re = re.compile(r"([a-zA-Z0-9]+): (.*)$")

    def __init__(
        self,
        first_line_nb: int,
        last_line_nb: int,
        origin: str,
        msg: str,
        logfile_name: str = "",
    ):
        self.first_line_nb = first_line_nb
        self.last_line_nb = last_line_nb
        self.origin = origin
        self.msg = msg
        self.logfile_name = logfile_name

    def __str__(self) -> str:
        return f"{self.origin}: {self.msg}"

    def get_dict(self) -> dict[str, Any]:
        return {
            "error_type": "haskell",
            "origin": self.origin,
            "msg": self.msg,
            "first_line": self.first_line_nb,
            "last_line": self.last_line_nb,
            "logfile_name": self.logfile_name,
        }

    @staticmethod
    def parse_error(logs: peekable) -> Optional["HaskellError"]:
        line_nb, line = logs.peek((None, None))
        if line is None or line_nb is None:
            return None
        match = HaskellError.error_re.match(line)
        if match is None:
            return None
        error = HaskellError(
            line_nb,
            line_nb,
            match.group(1),
            match.group(2),
        )
        next(logs)
        return error


EXPECTED_ERROR_TYPES: list[Type[errors.LoggedError]] = [
    errors.JavaError,  # JADX
    errors.NoPrefixJavaError,
]
EXPECTED_ERROR_TYPES_STDERR: list[Type[errors.LoggedError]] = [
    errors.JavaError,  # JADX
    errors.NoPrefixJavaError,
    HaskellError,
]


def analyse_artifacts(path: Path) -> dict[str, Any]:
    """Analyse the artifacts of a test located at `path`."""
    report = utils.parse_report(path / "report")
    report["errors"] = list(
        map(
            lambda e: e.get_dict(),
            errors.get_errors(path / "stderr", EXPECTED_ERROR_TYPES_STDERR),
        )
    )
    report["errors"].extend(
        map(
            lambda e: e.get_dict(),
            errors.get_errors(path / "stdout", EXPECTED_ERROR_TYPES),
        )
    )
    if report["timeout"]:
        report["tool-status"] = "TIMEOUT"
    elif check_success(path, report):
        report["tool-status"] = "FINISHED"
    else:
        report["tool-status"] = "FAILED"
    report["tool-name"] = TOOL_NAME
    report["date"] = str(datetime.datetime.now())
    report["apk"] = utils.sha256_sum(path / "app.apk").upper()
    return report


def check_success(path: Path, report: dict[str, Any]) -> bool:
    """Check if the analysis finished without crashing."""
    if report["exit-status"] != 0:
        return False
    # If jadx failed, the tool failed
    if not (path / "app").exists():
        return False
    if len(list((path / "app").glob("**/*.java"))) == 0:
        return False
    l1 = False
    l2 = False
    with (path / "stdout").open("r", errors="replace") as file:
        for line in file:
            if l2 and line == "done.\n":
                return True
            else:
                l2 = False
            if l1 and "seconds in total" in line:
                l1 = False
                l2 = True
            else:
                l1 = False
            if line == "Points-to graph: \n":
                l1 = True
    return False


if __name__ == "__main__":
    import docker  # type: ignore

    args = orchestrator.get_test_args(TOOL_NAME)
    tool_folder = Path(__file__).resolve().parent
    api_key = orchestrator.get_androzoo_key()
    if args.get_apk_info:
        orchestrator.load_apk_info(args.apk_refs, args.androzoo_list, api_key)
    client = docker.from_env()

    logging.info("Command tested: ")
    logging.info(f"[{WORKDIR}]$ {CMD}")

    for apk_ref in args.apk_refs:
        orchestrator.test_tool_on_apk(
            client,
            tool_folder,
            api_key,
            apk_ref,
            args.tool_version,
            args.keep_artifacts,
            args.force_test,
        )
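GHC runtime failures are reported on stderr as `program: message` lines (e.g. the classic `Main: Prelude.head: empty list`), which is the shape `HaskellError` captures. A quick illustration with a regex of that shape; note the quantifier must sit inside the capture group, otherwise `group(1)` would only keep the last repeated character:

```python
import re

# Same shape as HaskellError.error_re: program name, then the message.
error_re = re.compile(r"([a-zA-Z0-9]+): (.*)$")

match = error_re.match("Main: Prelude.head: empty list")
# Later colons stay in the message because the split is on the first ": ".
origin, msg = match.group(1), match.group(2)
```

With `peekable` from `more_itertools`, `parse_error` can test a line without consuming it and only call `next()` once the match succeeds, which is what lets several error parsers take turns on the same log stream.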
1
rasta_exp/docker/dialdroid/RASTA_VERSION
Normal file
@@ -0,0 +1 @@
home_build
Some files were not shown because too many files have changed in this diff.