more malware vs goodware discussion
This commit is contained in:
parent
af1187f041
commit
02be146060
1 changed files with 146 additions and 79 deletions
|
@ -1,4 +1,4 @@
|
|||
#import "../lib.typ": todo, highlight, num, paragraph, SDK, APK, DEX, FR
|
||||
#import "../lib.typ": todo, highlight, num, paragraph, SDK, APK, DEX, FR, APKs
|
||||
#import "X_var.typ": *
|
||||
#import "X_lib.typ": *
|
||||
|
||||
|
@ -8,14 +8,69 @@
|
|||
=== RQ1: Re-Usability Evaluation
|
||||
|
||||
|
||||
#todo[alt text for figure rasta-exit / rasta-exit-drebin]
|
||||
#figure(
|
||||
image("figs/exit-status-for-the-drebin-dataset.svg", width: 100%),
|
||||
image(
|
||||
"figs/exit-status-for-the-drebin-dataset.svg",
|
||||
width: 100%,
|
||||
alt: "Bar chart showing the % of analysed APKs on the y-axis and the tools on the x-axis.
|
||||
Horizontal blue dotted lines mark the 15%, 50% and 85% values.
|
||||
Each bar represents a tool, with the finished analyses in green at the bottom, the analyses that timed out in blue, then on top in red the analyses that failed. There is a last color, grey, for the other category, only visible in the dialdroid bar, representing 5% of the results.
|
||||
The results are (approximately) as follows:
|
||||
adagio: 100% finished
|
||||
amandroid: less than 5% timed out, the rest finished
|
||||
anadroid: 85% failed, less than 5% timed out, the rest finished
|
||||
androguard: 100% finished
|
||||
androguard_dad: 5% failed, the rest finished
|
||||
apparecium: around 1% failed, the rest finished
|
||||
blueseal: less than 5% failed, a little more than 10% timed out, the rest (just under 85%) finished
|
||||
dialdroid: a little more than 50% finished, less than 5% timed out, around 5% are marked as other, the rest failed
|
||||
didfail: 70% finished, the rest failed
|
||||
droidsafe: 40% finished, 45% timed out, 15% failed
|
||||
flowdroid: 65% finished, the rest failed
|
||||
gator: 100% finished
|
||||
ic3: 99% finished, 1% failed
|
||||
ic3_fork: 98% finished, 2% failed
|
||||
iccta: 60% finished, less than 5% timed out, the rest failed
|
||||
mallodroid: 100% finished
|
||||
perfchecker: 75% finished, the rest failed
|
||||
redexer: 100% finished
|
||||
saaf: 90% finished, 5% timed out, 5% failed
|
||||
wognsen_et_al: 75% finished, 1% failed, the rest timed out
|
||||
"
|
||||
),
|
||||
caption: [Exit status for the Drebin dataset],
|
||||
) <fig:rasta-exit-drebin>
|
||||
|
||||
#figure(
|
||||
image("figs/exit-status-for-the-rasta-dataset.svg", width: 100%),
|
||||
image(
|
||||
"figs/exit-status-for-the-rasta-dataset.svg",
|
||||
width: 100%,
|
||||
alt: "Bar chart showing the % of analysed APKs on the y-axis and the tools on the x-axis.
|
||||
Horizontal blue dotted lines mark the 15%, 50% and 85% values.
|
||||
Each bar represents a tool, with the finished analyses in green at the bottom, the analyses that timed out in blue, then on top in red the analyses that failed. There is a last color, grey, for the other category, only visible in the dialdroid bar, representing 10% of the results, and in the blueseal bar, for 5% of the results.
|
||||
The results are (approximately) as follows:
|
||||
adagio: 100% finished
|
||||
amandroid: less than 5% failed, 10% timed out, the rest finished
|
||||
anadroid: 95% failed, 1% timed out, the rest finished
|
||||
androguard: 100% finished
|
||||
androguard_dad: a little more than 45% finished, the rest failed
|
||||
apparecium: around 5% failed, 1% timed out, the rest finished
|
||||
blueseal: 20% finished, 15% timed out, 5% are marked other, the rest failed
|
||||
dialdroid: 35% finished, 1% timed out, 10% are marked other, the rest failed
|
||||
didfail: 25% finished, less than 5% timed out, the rest failed
|
||||
droidsafe: less than 10% finished, 20% timed out, the rest failed
|
||||
flowdroid: 55% finished, the rest failed
|
||||
gator: a little more than 85% finished, 5% timed out, 10% failed
|
||||
ic3: less than 80% finished, 5% timed out, the rest failed
|
||||
ic3_fork: 60% finished, 5% timed out, the rest failed
|
||||
iccta: 30% finished, 10% timed out, the rest failed
|
||||
mallodroid: 100% finished
|
||||
perfchecker: 25% finished, less than 5% timed out, the rest failed
|
||||
redexer: 90% finished, the rest failed
|
||||
saaf: 40% finished, the rest failed
|
||||
wognsen_et_al: a little less than 15% finished, a little less than 20% failed, the rest timed out
|
||||
"
|
||||
),
|
||||
caption: [Exit status for the Rasta dataset],
|
||||
) <fig:rasta-exit>
|
||||
|
||||
|
@ -218,75 +273,6 @@ The date is also correlated with the success rate for Java based tools only.
|
|||
|
||||
=== RQ3: Malware vs Goodware <sec:rasta-mal-vs-good>
|
||||
|
||||
#todo[complete @sec:rasta-mal-vs-good by commenting the new figures]
|
||||
|
||||
/*
|
||||
```
|
||||
sqlite> SELECT vt_detection == 0, COUNT(exec.sha256) FROM exec INNER JOIN apk ON exec.sha256 = apk.sha256 WHERE tool_status = 'FINISHED' AND dex_size_decile = 6 GROUP BY vt_detection == 0;
|
||||
0|2971 % malware
|
||||
1|60455 % goodware
|
||||
sqlite> SELECT vt_detection == 0, COUNT(DISTINCT sha256) FROM apk WHERE dex_size_decile = 6 GROUP BY vt_detection == 0;
|
||||
0|243
|
||||
1|6009
|
||||
```
|
||||
```
|
||||
>>> 61.13168724279835
|
||||
0.4969812257050699
|
||||
>>> 60455/6009/20 * 100
|
||||
50.30371110001665
|
||||
```
|
||||
|
||||
rate goodware rate malware avg size goodware (MB) avg size malware (MB)
|
||||
decile 1: 85.42 82.02 0.13 0.11
|
||||
decile 2: 74.46 72.34 0.54 0.55
|
||||
decile 3: 63.38 65.67 1.37 1.25
|
||||
decile 4: 57.21 62.31 2.41 2.34
|
||||
decile 5: 53.36 59.27 3.56 3.55
|
||||
decile 6: 50.3 61.13 4.61 4.56
|
||||
decile 7: 46.76 56.54 5.87 5.91
|
||||
decile 8: 42.57 56.23 7.64 7.63
|
||||
decile 9: 39.09 57.94 11.39 11.26
|
||||
decile 10: 33.34 45.86 24.24 21.36
|
||||
total: 54.28 64.82 6.29 4.14
|
||||
*/
|
||||
|
||||
|
||||
#todo[Alt text for rasta-exit-goodmal]
|
||||
#figure(
|
||||
image(
|
||||
"figs/exit-status-for-the-rasta-dataset-goodware-malware.svg",
|
||||
width: 100%,
|
||||
alt: "",
|
||||
),
|
||||
caption: [Exit status comparing goodware and malware for the Rasta dataset],
|
||||
) <fig:rasta-exit-goodmal>
|
||||
|
||||
/*
|
||||
[15:25] Jean-Marie Mineau
|
||||
|
||||
average total size of the dex files: 6464228.10027989
|
||||
|
||||
[15:26] Jean-Marie Mineau
|
||||
|
||||
(all combined)
|
||||
|
||||
[15:26] Jean-Marie Mineau
|
||||
|
||||
goodware: 6598464.94224066
|
||||
|
||||
malware: 4337376.97252155
|
||||
|
||||
```
|
||||
sqlite> SELECT AVG(apk_size) FROM apk;
|
||||
16918107.6526989
|
||||
sqlite> SELECT AVG(apk_size) FROM apk WHERE vt_detection = 0;
|
||||
16897989.4472311
|
||||
sqlite> SELECT AVG(apk_size) FROM apk WHERE vt_detection != 0;
|
||||
17236860.8903556
|
||||
```
|
||||
*/
|
||||
|
||||
|
||||
#figure({
|
||||
show table: set text(size: 0.80em)
|
||||
table(
|
||||
|
@ -318,9 +304,91 @@ sqlite> SELECT AVG(apk_size) FROM apk WHERE vt_detection != 0;
|
|||
table.cell(colspan: 3/*4*/, inset: 3pt)[],
|
||||
table.hline(),
|
||||
)},
|
||||
placement: none, // floating figure makes this table go in the previous section :grim:
|
||||
caption: [Average size and date of goodware/malware parts of the Rasta dataset],
|
||||
) <tab:rasta-sizes>
|
||||
|
||||
We sampled our dataset to have a variety of #APK sizes, but the size of the application is not entirely proportional to the bytecode size.
|
||||
Looking at @tab:rasta-sizes, we can see that although malware #APKs are on average bigger, they contain less bytecode than goodware.
|
||||
In the previous section, we saw that the size of the bytecode has the most significant impact on the finishing rate of analysis tools, and indeed, @fig:rasta-exit-goodmal reflects that.
|
||||
|
||||
|
||||
/*
|
||||
```
|
||||
sqlite> SELECT vt_detection == 0, COUNT(exec.sha256) FROM exec INNER JOIN apk ON exec.sha256 = apk.sha256 WHERE tool_status = 'FINISHED' AND dex_size_decile = 6 GROUP BY vt_detection == 0;
|
||||
0|2971 % malware
|
||||
1|60455 % goodware
|
||||
sqlite> SELECT vt_detection == 0, COUNT(DISTINCT sha256) FROM apk WHERE dex_size_decile = 6 GROUP BY vt_detection == 0;
|
||||
0|243
|
||||
1|6009
|
||||
```
|
||||
```
|
||||
>>> 61.13168724279835
|
||||
0.4969812257050699
|
||||
>>> 60455/6009/20 * 100
|
||||
50.30371110001665
|
||||
```
|
||||
|
||||
rate goodware rate malware avg size goodware (MB) avg size malware (MB)
|
||||
decile 1: 85.42 82.02 0.13 0.11
|
||||
decile 2: 74.46 72.34 0.54 0.55
|
||||
decile 3: 63.38 65.67 1.37 1.25
|
||||
decile 4: 57.21 62.31 2.41 2.34
|
||||
decile 5: 53.36 59.27 3.56 3.55
|
||||
decile 6: 50.3 61.13 4.61 4.56
|
||||
decile 7: 46.76 56.54 5.87 5.91
|
||||
decile 8: 42.57 56.23 7.64 7.63
|
||||
decile 9: 39.09 57.94 11.39 11.26
|
||||
decile 10: 33.34 45.86 24.24 21.36
|
||||
total: 54.28 64.82 6.29 4.14
|
||||
*/
|
||||
|
||||
#figure(
|
||||
image(
|
||||
"figs/exit-status-for-the-rasta-dataset-goodware-malware.svg",
|
||||
width: 100%,
|
||||
alt: "Bar chart showing the % of analysed APKs on the y-axis and the tools on the x-axis.
|
||||
Each tool has two bars, one for goodware and one for malware.
|
||||
The goodware bars are the same as the ones in the figure Exit status for the Rasta dataset.
|
||||
The timeout rate looks the same on both bars of each tool.
|
||||
The finishing rate of the malware bar is a lot higher than in the goodware bar for androguard_dad, blueseal, didfail, iccta, perfchecker and wognsen_et_al.
|
||||
The finishing rate of the malware bar is higher than in the goodware bar for ic3 and ic3_fork.
|
||||
The only two tools where the finishing rate is better for goodware are apparecium (by around 15%) and redexer (by around 10%).
|
||||
The other tools have similar finishing rates, slightly in favor of malware.
|
||||
"
|
||||
),
|
||||
caption: [Exit status comparing goodware (left bars) and malware (right bars) for the Rasta dataset],
|
||||
) <fig:rasta-exit-goodmal>
|
||||
|
||||
/*
|
||||
[15:25] Jean-Marie Mineau
|
||||
|
||||
average total size of the dex files: 6464228.10027989
|
||||
|
||||
[15:26] Jean-Marie Mineau
|
||||
|
||||
(all combined)
|
||||
|
||||
[15:26] Jean-Marie Mineau
|
||||
|
||||
goodware: 6598464.94224066
|
||||
|
||||
malware: 4337376.97252155
|
||||
|
||||
```
|
||||
sqlite> SELECT AVG(apk_size) FROM apk;
|
||||
16918107.6526989
|
||||
sqlite> SELECT AVG(apk_size) FROM apk WHERE vt_detection = 0;
|
||||
16897989.4472311
|
||||
sqlite> SELECT AVG(apk_size) FROM apk WHERE vt_detection != 0;
|
||||
17236860.8903556
|
||||
```
|
||||
*/
|
||||
|
||||
In @fig:rasta-exit-goodmal, we compared the finishing rate of malware and goodware applications for the evaluated tools.
|
||||
We can see that malware and goodware seem to generate a similar number of timeouts.
|
||||
However, with the exception of two tools -- apparecium and redexer -- we can see a trend of goodware being harder to analyse than malware.
|
||||
For some tools, like DAD or perfchecker, the finishing rate increases by more than 20 points from goodware to malware.
|
||||
|
||||
#figure({
|
||||
show table: set text(size: 0.80em)
|
||||
|
@ -369,13 +437,12 @@ sqlite> SELECT AVG(apk_size) FROM apk WHERE vt_detection != 0;
|
|||
)},
|
||||
caption: [#DEX size and Finishing Rate (#FR) per decile],
|
||||
) <tab:rasta-sizes-decile>
|
||||
|
||||
We compared the finishing rate of malware and goodware applications for evaluated tools.
|
||||
Because the size of applications impacts this finishing rate, it is interesting to compare the success rate for each decile of bytecode size.
|
||||
@tab:rasta-sizes-decile reports the bytecode size and the finishing rate of goodware and malware in each decile of size.
|
||||
We saw that the bytecode size may be an explanation for this increase.
|
||||
To investigate this further, @tab:rasta-sizes-decile reports the bytecode size and the finishing rate of goodware and malware in each decile of bytecode size.
|
||||
We also computed the ratio of the bytecode size and finishing rate for the two populations.
|
||||
We observe that the ratio for the finishing rate decreases from 1.04 to 0.73, while the ratio of the bytecode size is around 1.
|
||||
We conclude from this table that analyzing malware triggers fewer errors than analyzing goodware.
|
||||
We observe that, while the bytecode size ratio between goodware and malware stays close to one in each decile (excluding the two extremes), the goodware/malware finishing rate ratio decreases overall as the bytecode size grows.
|
||||
It goes from 1.03 for the 2#super[nd] decile to 0.67 in the 9#super[th] decile.
|
||||
We conclude from this table that, at equal size, analyzing malware still triggers fewer errors than analyzing goodware, and that the gap between the errors generated when analyzing goodware and when analyzing malware widens as the bytecode size increases.
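
The ratios quoted above can be re-derived from the per-decile finishing rates kept in the commented-out notes of this section. The snippet below is only a sketch: the function names are hypothetical, the rates are copied by hand from those notes, and the default of 20 tools is taken from the `60455/6009/20 * 100` computation in the same notes.

```python
# Finishing rates (%) per bytecode-size decile, copied from the commented-out notes.
GOODWARE_FR = [85.42, 74.46, 63.38, 57.21, 53.36, 50.3, 46.76, 42.57, 39.09, 33.34]
MALWARE_FR = [82.02, 72.34, 65.67, 62.31, 59.27, 61.13, 56.54, 56.23, 57.94, 45.86]


def finishing_rate(finished_runs, nb_apks, nb_tools=20):
    """Share (in %) of (tool, APK) runs that finished.

    nb_tools=20 is an assumption taken from the notes, not a documented constant.
    """
    return finished_runs / nb_apks / nb_tools * 100


def fr_ratios(goodware, malware):
    """Goodware/malware finishing rate ratio for each decile, rounded to 2 digits."""
    return [round(g / m, 2) for g, m in zip(goodware, malware)]


if __name__ == "__main__":
    # Decile 6 counts from the SQL output in the notes.
    print(round(finishing_rate(60455, 6009), 2))  # goodware
    print(round(finishing_rate(2971, 243), 2))    # malware
    for decile, ratio in enumerate(fr_ratios(GOODWARE_FR, MALWARE_FR), start=1):
        print(f"decile {decile}: {ratio}")
```

With these inputs, the second and ninth deciles give the 1.03 and 0.67 values reported in the text.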
|
||||
|
||||
|
||||
#highlight()[
|
||||
|
|