== Experiments <sec:rasta-xp>
=== #rq1: Re-Usability Evaluation
#figure(
image(
"figs/exit-status-for-the-drebin-dataset.svg",
wognsen_et_al: a little less than 15% finished, a little less than 20% failed, the rest timed out
"
),
caption: [Exit status for the RASTA dataset],
) <fig:rasta-exit>
@fig:rasta-exit-drebin and @fig:rasta-exit compare the Drebin and RASTA datasets.
They represent the success/failure rate (green/orange) of the tools.
We distinguished failure to compute a result from timeout (blue) and crashes of our evaluation framework (in grey, probably due to out-of-memory kills of the container itself).
Because they may be caused by a bug in our own analysis stack, exit statuses represented in grey (Other) are counted as unknown errors rather than as failures of the tool.
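This classification can be sketched as a small decision rule (a minimal illustrative sketch, not the paper's actual framework code; the function and its inputs are hypothetical):

```python
def classify_run(exit_code: int, timed_out: bool, framework_crash: bool) -> str:
    """Map one analysis run to the exit-status categories of the figures.

    Hypothetical inputs: `framework_crash` stands for crashes of the
    evaluation framework itself (e.g. an out-of-memory kill of the
    container), which are reported as "other" rather than tool failures.
    """
    if framework_crash:
        return "other"      # unknown error, not blamed on the tool (grey)
    if timed_out:
        return "timeout"    # no result within the time budget (blue)
    return "finished" if exit_code == 0 else "failed"  # green / orange

print(classify_run(0, False, False))  # -> finished
print(classify_run(1, False, False))  # -> failed
print(classify_run(0, True, False))   # -> timeout
```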
Results on the Drebin dataset show that 11 tools have a high success rate (greater than …).
The other tools have poor results.
The worst, excluding Lotrack and Tresher, is Anadroid, with a success ratio under 20%.
On the RASTA dataset, we observe a global increase in the number of failed statuses: #resultunusablenb tools (#resultunusable) have a finishing rate below 50%.
The tools that perform badly on Drebin, unsurprisingly, also perform badly on RASTA.
Three tools (androguard_dad, blueseal, saaf) that performed well on Drebin (higher than 85%) surprisingly fall below the 50% success bar.
Seven tools keep a high success rate: Adagio, Amandroid, Androguard, Apparecium, Gator, Mallodroid, and Redexer.
Regarding IC3, the fork with a simpler build process and support for modern OS has a lower success rate than the original tool.
For the tools that we could run, #resultratio of analyses finish successfully.
supplement: none,
kind: "sub-rasta-exit-evolution"
) <fig:rasta-exit-evolution-not-java>]
), caption: [Exit status evolution for the RASTA dataset]
) <fig:rasta-exit-evolution>
To investigate the effect of application dates on the tools, we computed the date of each #APK as the minimum of the date of its first upload to AndroZoo and the date of its first analysis by VirusTotal.
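The dating rule reduces to taking the earliest of the two timestamps; a minimal sketch (the field names `azoo_first_upload` and `vt_first_analysis` are illustrative, not AndroZoo or VirusTotal API names):

```python
from datetime import date

def apk_date(azoo_first_upload: date, vt_first_analysis: date) -> date:
    """Date an APK by the earliest evidence of its existence:
    its first upload to AndroZoo or its first analysis by VirusTotal."""
    return min(azoo_first_upload, vt_first_analysis)

# An APK scanned by VirusTotal before being archived in AndroZoo
# is dated by the (earlier) VirusTotal timestamp.
print(apk_date(date(2015, 6, 1), date(2014, 11, 20)))  # -> 2014-11-20
```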
The date is also correlated with the success rate for Java-based tools only.
table.hline(),
table.header(
table.cell(colspan: 3/*4*/, inset: 3pt)[],
table.cell(rowspan:2)[*RASTA part*],
table.vline(end: 3),
table.vline(start: 4),
table.cell(colspan:2)[*Average size* (MB)],
sqlite> SELECT vt_detection == 0, COUNT(DISTINCT sha256) FROM apk WHERE dex_size
width: 100%,
alt: "Bar chart showing the % of analyzed APKs on the y-axis and the tools on the x-axis.
Each tool has two bars, one for goodware and one for malware.
The goodware bars are the same as those in the figure Exit status for the RASTA dataset.
The timeout rate looks similar in both bars for each tool.
The finishing rate of the malware bar is much higher than that of the goodware bar for androguard_dad, blueseal, didfail, iccta, perfchecker, and wognsen_et_al.
The finishing rate of the malware bar is higher than that of the goodware bar for ic3 and ic3_fork.
The other tools have similar finishing rates, slightly in favor of malware.
"
),
caption: [Exit status comparing goodware (left bars) and malware (right bars) for the RASTA dataset],
) <fig:rasta-exit-goodmal>
/*