I declare this manuscript finished
All checks were successful
/ test_checkout (push) Successful in 1m48s

Jean-Marie Mineau 2025-10-07 17:16:32 +02:00
parent 9f39ded209
commit 5c3a6955bd
Signed by: histausse
GPG key ID: B66AEEDA9B645AD2
14 changed files with 162 additions and 131 deletions

@@ -4,10 +4,8 @@
== Experiments <sec:rasta-xp>
=== #rq1: Re-Usability Evaluation
#figure(
image(
"figs/exit-status-for-the-drebin-dataset.svg",
@@ -71,10 +69,10 @@
wognsen_et_al: a little less than 15% finished, a little less than 20% failed, the rest timed out
"
),
-caption: [Exit status for the Rasta dataset],
+caption: [Exit status for the RASTA dataset],
) <fig:rasta-exit>
-@fig:rasta-exit-drebin and @fig:rasta-exit compare the Drebin and Rasta datasets.
+@fig:rasta-exit-drebin and @fig:rasta-exit compare the Drebin and RASTA datasets.
They represent the success/failure rate (green/orange) of the tools.
We distinguished failure to compute a result from timeout (blue) and crashes of our evaluation framework (in grey, probably due to out-of-memory kills of the container itself).
Because it may be caused by a bug in our own analysis stack, exit statuses represented in grey (Other) are considered as unknown errors and not as failures of the tool.
@@ -84,8 +82,8 @@ Results on the Drebin datasets show that 11 tools have a high success rate (grea
The other tools have poor results.
The worst, excluding Lotrack and Tresher, is Anadroid with a ratio under 20% of success.
-On the Rasta dataset, we observe a global increase in the number of failed status: #resultunusablenb tools (#resultunusable) have a finishing rate below 50%.
-The tools that have bad results with Drebin are, of course, bad results on Rasta.
+On the RASTA dataset, we observe a global increase in the number of failed status: #resultunusablenb tools (#resultunusable) have a finishing rate below 50%.
+The tools that have bad results with Drebin are, of course, bad results on RASTA.
Three tools (androguard_dad, blueseal, saaf) that were performing well (higher than 85%) on Drebin, surprisingly fall below the bar of 50% of failure.
7 tools keep a high success rate: Adagio, Amandroid, Androguard, Apparecium, Gator, Mallodroid, Redexer.
Regarding IC3, the fork with a simpler build process and support for modern OS has a lower success rate than the original tool.
@@ -135,7 +133,7 @@ For the tools that we could run, #resultratio of analyses are finishing successf
supplement: none,
kind: "sub-rasta-exit-evolution"
) <fig:rasta-exit-evolution-not-java>]
-), caption: [Exit status evolution for the Rasta dataset]
+), caption: [Exit status evolution for the RASTA dataset]
) <fig:rasta-exit-evolution>
For investigating the effect of application dates on the tools, we computed the date of each #APK based on the minimum date between the first upload in AndroZoo and the first analysis in VirusTotal.
@@ -293,7 +291,7 @@ The date is also correlated with the success rate for Java-based tools only.
table.hline(),
table.header(
table.cell(colspan: 3/*4*/, inset: 3pt)[],
-table.cell(rowspan:2)[*Rasta part*],
+table.cell(rowspan:2)[*RASTA part*],
table.vline(end: 3),
table.vline(start: 4),
table.cell(colspan:2)[*Average size* (MB)],
@@ -358,7 +356,7 @@ sqlite> SELECT vt_detection == 0, COUNT(DISTINCT sha256) FROM apk WHERE dex_size
width: 100%,
alt: "Bar chart showing the % of analyse apk on the y-axis and the tools on the x-axis.
Each tools has two bars, one for goodware an one for malware.
-The goodware bars are the same as the one in the figure Exit status for the Rasta dataset.
+The goodware bars are the same as the one in the figure Exit status for the RASTA dataset.
The timeout rate looks the same on both bar of each tools.
The finishing rate of the malware bar is a lot higher than in the goodware bar for androguard_dad, blueseal, didfail, iccta, perfchecker and wogsen_et_al.
The finishing rate of the malware bar is higher than in the goodware bar for ic3 and ic3_fork.
@@ -366,7 +364,7 @@ sqlite> SELECT vt_detection == 0, COUNT(DISTINCT sha256) FROM apk WHERE dex_size
The other tools have similar finishing rate, finishing rate slightly in favor of malware.
"
),
-caption: [Exit status comparing goodware (left bars) and malware (right bars) for the Rasta dataset],
+caption: [Exit status comparing goodware (left bars) and malware (right bars) for the RASTA dataset],
) <fig:rasta-exit-goodmal>
/*