more malware vs goodware discussion
parent
af1187f041
commit
02be146060
1 changed files with 146 additions and 79 deletions
|
@ -1,4 +1,4 @@
|
||||||
#import "../lib.typ": todo, highlight, num, paragraph, SDK, APK, DEX, FR
|
#import "../lib.typ": todo, highlight, num, paragraph, SDK, APK, DEX, FR, APKs
|
||||||
#import "X_var.typ": *
|
#import "X_var.typ": *
|
||||||
#import "X_lib.typ": *
|
#import "X_lib.typ": *
|
||||||
|
|
||||||
|
@ -8,14 +8,69 @@
|
||||||
=== RQ1: Re-Usability Evaluation
|
=== RQ1: Re-Usability Evaluation
|
||||||
|
|
||||||
|
|
||||||
#todo[alt text for figure rasta-exit / rasta-exit-drebin]
|
|
||||||
#figure(
|
#figure(
|
||||||
image("figs/exit-status-for-the-drebin-dataset.svg", width: 100%),
|
image(
|
||||||
|
"figs/exit-status-for-the-drebin-dataset.svg",
|
||||||
|
width: 100%,
|
||||||
|
alt: "Bar chart showing the percentage of analysed APKs on the y-axis and the tools on the x-axis.
|
||||||
|
Horizontal blue dotted lines mark the 15%, 50% and 85% values.
|
||||||
|
Each bar represents a tool, with the finished analyses in green at the bottom, the analyses that timed out in blue, then, on top, the failed analyses in red. A last color, grey, is used for the other category; it is only visible in the dialdroid bar, where it represents 5% of the results.
|
||||||
|
The results are (approximately) as follows:
|
||||||
|
adagio: 100% finished
|
||||||
|
amandroid: less than 5% timed out, the rest finished
|
||||||
|
anadroid: 85% failed, less than 5% timed out, the rest finished
|
||||||
|
androguard: 100% finished
|
||||||
|
androguard_dad: 5% failed, the rest finished
|
||||||
|
apparecium: around 1% failed, the rest finished
|
||||||
|
blueseal: less than 5% failed, a little more than 10% timed out, the rest (just under 85%) finished
|
||||||
|
dialdroid: a little more than 50% finished, less than 5% timed out, around 5% marked as other, the rest failed
|
||||||
|
didfail: 70% finished, the rest failed
|
||||||
|
droidsafe: 40% finished, 45% timed out, 15% failed
|
||||||
|
flowdroid: 65% finished, the rest failed
|
||||||
|
gator: 100% finished
|
||||||
|
ic3: 99% finished, 1% failed
|
||||||
|
ic3_fork: 98% finished, 2% failed
|
||||||
|
iccta: 60% finished, less than 5% timed out, the rest failed
|
||||||
|
mallodroid: 100% finished
|
||||||
|
perfchecker: 75% finished, the rest failed
|
||||||
|
redexer: 100% finished
|
||||||
|
saaf: 90% finished, 5% timed out, 5% failed
|
||||||
|
wognsen_et_al: 75% finished, 1% failed, the rest timed out
|
||||||
|
"
|
||||||
|
),
|
||||||
caption: [Exit status for the Drebin dataset],
|
caption: [Exit status for the Drebin dataset],
|
||||||
) <fig:rasta-exit-drebin>
|
) <fig:rasta-exit-drebin>
|
||||||
|
|
||||||
#figure(
|
#figure(
|
||||||
image("figs/exit-status-for-the-rasta-dataset.svg", width: 100%),
|
image(
|
||||||
|
"figs/exit-status-for-the-rasta-dataset.svg",
|
||||||
|
width: 100%,
|
||||||
|
alt: "Bar chart showing the percentage of analysed APKs on the y-axis and the tools on the x-axis.
|
||||||
|
Horizontal blue dotted lines mark the 15%, 50% and 85% values.
|
||||||
|
Each bar represents a tool, with the finished analyses in green at the bottom, the analyses that timed out in blue, then, on top, the failed analyses in red. A last color, grey, is used for the other category; it is only visible in the dialdroid bar, where it represents 10% of the results, and in the blueseal bar, where it represents 5% of the results.
|
||||||
|
The results are (approximately) as follows:
|
||||||
|
adagio: 100% finished
|
||||||
|
amandroid: less than 5% failed, 10% timed out, the rest finished
|
||||||
|
anadroid: 95% failed, 1% timed out, the rest finished
|
||||||
|
androguard: 100% finished
|
||||||
|
androguard_dad: a little more than 45% finished, the rest failed
|
||||||
|
apparecium: around 5% failed, 1% timed out, the rest finished
|
||||||
|
blueseal: 20% finished, 15% timed out, 5% marked as other, the rest failed
|
||||||
|
dialdroid: 35% finished, 1% timed out, 10% marked as other, the rest failed
|
||||||
|
didfail: 25% finished, less than 5% timed out, the rest failed
|
||||||
|
droidsafe: less than 10% finished, 20% timed out, the rest failed
|
||||||
|
flowdroid: 55% finished, the rest failed
|
||||||
|
gator: a little more than 85% finished, 5% timed out, 10% failed
|
||||||
|
ic3: less than 80% finished, 5% timed out, the rest failed
|
||||||
|
ic3_fork: 60% finished, 5% timed out, the rest failed
|
||||||
|
iccta: 30% finished, 10% timed out, the rest failed
|
||||||
|
mallodroid: 100% finished
|
||||||
|
perfchecker: 25% finished, less than 5% timed out, the rest failed
|
||||||
|
redexer: 90% finished, the rest failed
|
||||||
|
saaf: 40% finished, the rest failed
|
||||||
|
wognsen_et_al: a little less than 15% finished, a little less than 20% failed, the rest timed out
|
||||||
|
"
|
||||||
|
),
|
||||||
caption: [Exit status for the Rasta dataset],
|
caption: [Exit status for the Rasta dataset],
|
||||||
) <fig:rasta-exit>
|
) <fig:rasta-exit>
|
||||||
|
|
||||||
|
@ -218,75 +273,6 @@ The date is also correlated with the success rate for Java based tools only.
|
||||||
|
|
||||||
=== RQ3: Malware vs Goodware <sec:rasta-mal-vs-good>
|
=== RQ3: Malware vs Goodware <sec:rasta-mal-vs-good>
|
||||||
|
|
||||||
#todo[complete @sec:rasta-mal-vs-good by commenting the new figures]
|
|
||||||
|
|
||||||
/*
|
|
||||||
```
|
|
||||||
sqlite> SELECT vt_detection == 0, COUNT(exec.sha256) FROM exec INNER JOIN apk ON exec.sha256 = apk.sha256 WHERE tool_status = 'FINISHED' AND dex_size_decile = 6 GROUP BY vt_detection == 0;
|
|
||||||
0|2971 % malware
|
|
||||||
1|60455 % goodware
|
|
||||||
sqlite> SELECT vt_detection == 0, COUNT(DISTINCT sha256) FROM apk WHERE dex_size_decile = 6 GROUP BY vt_detection == 0;
|
|
||||||
0|243
|
|
||||||
1|6009
|
|
||||||
```
|
|
||||||
```
|
|
||||||
>>> 61.13168724279835
|
|
||||||
0.4969812257050699
|
|
||||||
>>> 60455/6009/20 * 100
|
|
||||||
50.30371110001665
|
|
||||||
```
|
|
||||||
|
|
||||||
rate goodware rate malware avg size goodware (MB) avg size malware (MB)
|
|
||||||
decile 1: 85.42 82.02 0.13 0.11
|
|
||||||
decile 2: 74.46 72.34 0.54 0.55
|
|
||||||
decile 3: 63.38 65.67 1.37 1.25
|
|
||||||
decile 4: 57.21 62.31 2.41 2.34
|
|
||||||
decile 5: 53.36 59.27 3.56 3.55
|
|
||||||
decile 6: 50.3 61.13 4.61 4.56
|
|
||||||
decile 7: 46.76 56.54 5.87 5.91
|
|
||||||
decile 8: 42.57 56.23 7.64 7.63
|
|
||||||
decile 9: 39.09 57.94 11.39 11.26
|
|
||||||
decile 10: 33.34 45.86 24.24 21.36
|
|
||||||
total: 54.28 64.82 6.29 4.14
|
|
||||||
*/
|
|
||||||
|
|
||||||
|
|
||||||
#todo[Alt text for rasta-exit-goodmal]
|
|
||||||
#figure(
|
|
||||||
image(
|
|
||||||
"figs/exit-status-for-the-rasta-dataset-goodware-malware.svg",
|
|
||||||
width: 100%,
|
|
||||||
alt: "",
|
|
||||||
),
|
|
||||||
caption: [Exit status comparing goodware and malware for the Rasta dataset],
|
|
||||||
) <fig:rasta-exit-goodmal>
|
|
||||||
|
|
||||||
/*
|
|
||||||
[15:25] Jean-Marie Mineau
|
|
||||||
|
|
||||||
average total dex size: 6464228.10027989
|
|
||||||
|
|
||||||
[15:26] Jean-Marie Mineau
|
|
||||||
|
|
||||||
(all apps combined)
|
|
||||||
|
|
||||||
[15:26] Jean-Marie Mineau
|
|
||||||
|
|
||||||
goodware: 6598464.94224066
|
|
||||||
|
|
||||||
malware: 4337376.97252155
|
|
||||||
|
|
||||||
```
|
|
||||||
sqlite> SELECT AVG(apk_size) FROM apk;
|
|
||||||
16918107.6526989
|
|
||||||
sqlite> SELECT AVG(apk_size) FROM apk WHERE vt_detection = 0;
|
|
||||||
16897989.4472311
|
|
||||||
sqlite> SELECT AVG(apk_size) FROM apk WHERE vt_detection != 0;
|
|
||||||
17236860.8903556
|
|
||||||
```
|
|
||||||
*/
|
|
||||||
|
|
||||||
|
|
||||||
#figure({
|
#figure({
|
||||||
show table: set text(size: 0.80em)
|
show table: set text(size: 0.80em)
|
||||||
table(
|
table(
|
||||||
|
@ -318,9 +304,91 @@ sqlite> SELECT AVG(apk_size) FROM apk WHERE vt_detection != 0;
|
||||||
table.cell(colspan: 3/*4*/, inset: 3pt)[],
|
table.cell(colspan: 3/*4*/, inset: 3pt)[],
|
||||||
table.hline(),
|
table.hline(),
|
||||||
)},
|
)},
|
||||||
|
placement: none, // floating figure makes this table go in the previous section :grim:
|
||||||
caption: [Average size and date of goodware/malware parts of the Rasta dataset],
|
caption: [Average size and date of goodware/malware parts of the Rasta dataset],
|
||||||
) <tab:rasta-sizes>
|
) <tab:rasta-sizes>
|
||||||
|
|
||||||
|
We sampled our dataset to cover a variety of #APK sizes, but the size of an application is not strictly proportional to the size of its bytecode.
|
||||||
|
Looking at @tab:rasta-sizes, we can see that although malware are, on average, bigger #APKs, they contain less bytecode than goodware.
|
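As a quick sanity check of this observation, the averages recorded in the commented-out notes of this commit can be compared directly. This minimal Python sketch only reuses those numbers; the helper name is illustrative, and it assumes the "MB" values in the notes are mebibytes, which is what the figures suggest.

```python
# Averages copied from the commented-out notes of this commit (in bytes).
AVG_APK_SIZE = {"goodware": 16897989.4472311, "malware": 17236860.8903556}
AVG_DEX_SIZE = {"goodware": 6598464.94224066, "malware": 4337376.97252155}

MIB = 1024 * 1024  # assumption: the "MB" figures in the notes are mebibytes


def malware_to_goodware_ratio(averages: dict[str, float]) -> float:
    """Illustrative helper: ratio of the malware average to the goodware average."""
    return averages["malware"] / averages["goodware"]


apk_ratio = malware_to_goodware_ratio(AVG_APK_SIZE)  # slightly above 1: bigger APKs
dex_ratio = malware_to_goodware_ratio(AVG_DEX_SIZE)  # well below 1: less bytecode

print(f"APK size ratio (malware/goodware): {apk_ratio:.2f}")
print(f"DEX size ratio (malware/goodware): {dex_ratio:.2f}")
print(f"avg DEX size: goodware {AVG_DEX_SIZE['goodware'] / MIB:.2f} MiB, "
      f"malware {AVG_DEX_SIZE['malware'] / MIB:.2f} MiB")
```

Malware APKs come out about 2% bigger on average while carrying roughly a third less bytecode, and the MiB conversion reproduces the 6.29 and 4.14 totals of the notes.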
||||||
|
In the previous section, we saw that the bytecode size has the most significant impact on the finishing rate of analysis tools, and @fig:rasta-exit-goodmal indeed reflects this.
|
||||||
|
|
||||||
|
|
||||||
|
/*
|
||||||
|
```
|
||||||
|
sqlite> SELECT vt_detection == 0, COUNT(exec.sha256) FROM exec INNER JOIN apk ON exec.sha256 = apk.sha256 WHERE tool_status = 'FINISHED' AND dex_size_decile = 6 GROUP BY vt_detection == 0;
|
||||||
|
0|2971 % malware
|
||||||
|
1|60455 % goodware
|
||||||
|
sqlite> SELECT vt_detection == 0, COUNT(DISTINCT sha256) FROM apk WHERE dex_size_decile = 6 GROUP BY vt_detection == 0;
|
||||||
|
0|243
|
||||||
|
1|6009
|
||||||
|
```
|
||||||
|
```
|
||||||
|
>>> 61.13168724279835
|
||||||
|
0.4969812257050699
|
||||||
|
>>> 60455/6009/20 * 100
|
||||||
|
50.30371110001665
|
||||||
|
```
|
||||||
|
|
||||||
|
rate goodware rate malware avg size goodware (MB) avg size malware (MB)
|
||||||
|
decile 1: 85.42 82.02 0.13 0.11
|
||||||
|
decile 2: 74.46 72.34 0.54 0.55
|
||||||
|
decile 3: 63.38 65.67 1.37 1.25
|
||||||
|
decile 4: 57.21 62.31 2.41 2.34
|
||||||
|
decile 5: 53.36 59.27 3.56 3.55
|
||||||
|
decile 6: 50.3 61.13 4.61 4.56
|
||||||
|
decile 7: 46.76 56.54 5.87 5.91
|
||||||
|
decile 8: 42.57 56.23 7.64 7.63
|
||||||
|
decile 9: 39.09 57.94 11.39 11.26
|
||||||
|
decile 10: 33.34 45.86 24.24 21.36
|
||||||
|
total: 54.28 64.82 6.29 4.14
|
||||||
|
*/
|
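The notes above compute the decile-6 finishing rates by dividing the number of FINISHED executions by the number of APKs times 20, the number of evaluated tools. The same arithmetic as a small Python sketch, assuming each APK was run exactly once by each tool; the function name is illustrative.

```python
# Counts returned by the two SQL queries in the notes (decile 6 of DEX size).
# Keys mirror `vt_detection == 0`: True for goodware, False for malware.
finished_runs = {False: 2971, True: 60455}  # executions with tool_status = 'FINISHED'
distinct_apks = {False: 243, True: 6009}    # distinct APKs in the decile
N_TOOLS = 20  # one bar per tool in the exit-status figures


def finishing_rate(finished: int, apks: int, tools: int = N_TOOLS) -> float:
    """Percentage of (APK, tool) executions that finished.

    Assumes every APK was analysed once by every tool, so the number of
    executions is apks * tools.
    """
    return finished / (apks * tools) * 100


malware_rate = finishing_rate(finished_runs[False], distinct_apks[False])
goodware_rate = finishing_rate(finished_runs[True], distinct_apks[True])
print(f"decile 6: goodware {goodware_rate:.2f}%, malware {malware_rate:.2f}%")
```

Rounded, these match the decile-6 row of the notes (50.3 for goodware, 61.13 for malware).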
||||||
|
|
||||||
|
#figure(
|
||||||
|
image(
|
||||||
|
"figs/exit-status-for-the-rasta-dataset-goodware-malware.svg",
|
||||||
|
width: 100%,
|
||||||
|
alt: "Bar chart showing the percentage of analysed APKs on the y-axis and the tools on the x-axis.
|
||||||
|
Each tool has two bars, one for goodware and one for malware.
|
||||||
|
The goodware bars are the same as the ones in the figure Exit status for the Rasta dataset.
|
||||||
|
The timeout rate looks the same in both bars of each tool.
|
||||||
|
The finishing rate of the malware bar is much higher than that of the goodware bar for androguard_dad, blueseal, didfail, iccta, perfchecker and wognsen_et_al.
|
||||||
|
The finishing rate of the malware bar is higher than that of the goodware bar for ic3 and ic3_fork.
|
||||||
|
The only two tools with a better finishing rate for goodware are apparecium (by around 15%) and redexer (by around 10%).
|
||||||
|
The other tools have similar finishing rates, slightly in favor of malware.
|
||||||
|
"
|
||||||
|
),
|
||||||
|
caption: [Exit status comparing goodware (left bars) and malware (right bars) for the Rasta dataset],
|
||||||
|
) <fig:rasta-exit-goodmal>
|
||||||
|
|
||||||
|
/*
|
||||||
|
[15:25] Jean-Marie Mineau
|
||||||
|
|
||||||
|
average total dex size: 6464228.10027989
|
||||||
|
|
||||||
|
[15:26] Jean-Marie Mineau
|
||||||
|
|
||||||
|
(all apps combined)
|
||||||
|
|
||||||
|
[15:26] Jean-Marie Mineau
|
||||||
|
|
||||||
|
goodware: 6598464.94224066
|
||||||
|
|
||||||
|
malware: 4337376.97252155
|
||||||
|
|
||||||
|
```
|
||||||
|
sqlite> SELECT AVG(apk_size) FROM apk;
|
||||||
|
16918107.6526989
|
||||||
|
sqlite> SELECT AVG(apk_size) FROM apk WHERE vt_detection = 0;
|
||||||
|
16897989.4472311
|
||||||
|
sqlite> SELECT AVG(apk_size) FROM apk WHERE vt_detection != 0;
|
||||||
|
17236860.8903556
|
||||||
|
```
|
||||||
|
*/
|
||||||
|
|
||||||
|
In @fig:rasta-exit-goodmal, we compared the finishing rates of malware and goodware applications for the evaluated tools.
|
||||||
|
We can see that malware and goodware seem to generate a similar number of timeouts.
|
||||||
|
However, with the exception of two tools (apparecium and redexer), we can see a trend of goodware being harder to analyse than malware.
|
||||||
|
For some tools, such as DAD or perfchecker, the finishing rate on malware exceeds the finishing rate on goodware by more than 20 points.
|
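The per-tool comparison above boils down to computing, for each tool, the finishing rate on each population and taking the difference in percentage points. A minimal sketch with Python's `sqlite3` module on a hypothetical toy database: the `apk`/`exec` tables and the `sha256`, `vt_detection` and `tool_status` columns follow the queries kept in the notes, while the `tool_name` column, the non-`FINISHED` status values and every row are assumptions for illustration only.

```python
import sqlite3

# Hypothetical toy data. Table and column names follow the notes, except
# exec.tool_name and the 'FAILED'/'TIMEOUT' statuses, which are assumed.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE apk (sha256 TEXT PRIMARY KEY, vt_detection INTEGER);
CREATE TABLE exec (sha256 TEXT, tool_name TEXT, tool_status TEXT);
INSERT INTO apk VALUES ('g1', 0), ('g2', 0), ('m1', 3), ('m2', 12);
INSERT INTO exec VALUES
  ('g1', 'toolA', 'FINISHED'), ('g2', 'toolA', 'FAILED'),
  ('m1', 'toolA', 'FINISHED'), ('m2', 'toolA', 'FINISHED'),
  ('g1', 'toolB', 'FINISHED'), ('g2', 'toolB', 'TIMEOUT'),
  ('m1', 'toolB', 'FINISHED'), ('m2', 'toolB', 'FAILED');
""")

# Finishing rate (%) per tool, split into goodware (vt_detection = 0) and malware.
# In SQLite, comparisons evaluate to 0 or 1, so SUM() counts matching rows.
QUERY = """
SELECT e.tool_name,
       100.0 * SUM(a.vt_detection = 0 AND e.tool_status = 'FINISHED')
             / SUM(a.vt_detection = 0) AS goodware_fr,
       100.0 * SUM(a.vt_detection != 0 AND e.tool_status = 'FINISHED')
             / SUM(a.vt_detection != 0) AS malware_fr
FROM exec AS e JOIN apk AS a ON e.sha256 = a.sha256
GROUP BY e.tool_name
ORDER BY e.tool_name
"""

gaps = {}
for tool, good_fr, mal_fr in con.execute(QUERY):
    gaps[tool] = mal_fr - good_fr  # positive: malware finishes more often
    print(f"{tool}: goodware {good_fr:.0f}%, malware {mal_fr:.0f}%, "
          f"gap {gaps[tool]:+.0f} points")

# Tools whose malware finishing rate is more than 20 points higher.
large_gap_tools = [tool for tool, gap in gaps.items() if gap > 20]
print(large_gap_tools)
```

On the real dataset, the same grouping would be expected to single out tools such as DAD or perfchecker; on this toy data, only `toolA` crosses the 20-point threshold.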
||||||
|
|
||||||
#figure({
|
#figure({
|
||||||
show table: set text(size: 0.80em)
|
show table: set text(size: 0.80em)
|
||||||
|
@ -369,13 +437,12 @@ sqlite> SELECT AVG(apk_size) FROM apk WHERE vt_detection != 0;
|
||||||
)},
|
)},
|
||||||
caption: [#DEX size and Finishing Rate (#FR) per decile],
|
caption: [#DEX size and Finishing Rate (#FR) per decile],
|
||||||
) <tab:rasta-sizes-decile>
|
) <tab:rasta-sizes-decile>
|
||||||
|
We saw that the bytecode size may explain this gap.
|
||||||
We compared the finishing rate of malware and goodware applications for evaluated tools.
|
To investigate this further, @tab:rasta-sizes-decile reports the bytecode size and the finishing rate of goodware and malware in each decile of bytecode size.
|
||||||
Because, the size of applications impacts this finishing rate, it is interesting to compare the success rate for each decile of bytecode size.
|
|
||||||
@tab:rasta-sizes-decile reports the bytecode size and the finishing rate of goodware and malware in each decile of size.
|
|
||||||
We also computed the ratio of the bytecode size and finishing rate for the two populations.
|
We also computed the goodware/malware ratios of the bytecode size and of the finishing rate.
|
||||||
We observe that the ratio for the finishing rate decreases from 1.04 to 0.73, while the ratio of the bytecode size is around 1.
|
We observe that while the goodware/malware bytecode size ratio stays close to one in each decile (excluding the two extremes), the goodware/malware finishing rate ratio generally decreases from one decile to the next.
|
||||||
We conclude from this table that analyzing malware triggers less errors than for goodware.
|
It goes from 1.03 in the 2#super[nd] decile to 0.67 in the 9#super[th] decile.
|
||||||
|
We conclude from this table that, at equal bytecode size, analyzing malware still triggers fewer errors than analyzing goodware, and that this difference grows with the bytecode size.
|
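The ratios discussed above can be recomputed from the per-decile values kept in the commented-out notes of this commit. This short Python sketch divides each goodware value by the matching malware value; it only reuses those numbers and does not query the dataset.

```python
# Per-decile values copied from the notes:
# (goodware FR %, malware FR %, goodware avg DEX size, malware avg DEX size)
DECILES = [
    (85.42, 82.02, 0.13, 0.11),
    (74.46, 72.34, 0.54, 0.55),
    (63.38, 65.67, 1.37, 1.25),
    (57.21, 62.31, 2.41, 2.34),
    (53.36, 59.27, 3.56, 3.55),
    (50.30, 61.13, 4.61, 4.56),
    (46.76, 56.54, 5.87, 5.91),
    (42.57, 56.23, 7.64, 7.63),
    (39.09, 57.94, 11.39, 11.26),
    (33.34, 45.86, 24.24, 21.36),
]

fr_ratios = []    # goodware / malware finishing rate
size_ratios = []  # goodware / malware average bytecode size
for i, (fr_good, fr_mal, size_good, size_mal) in enumerate(DECILES, start=1):
    fr_ratios.append(fr_good / fr_mal)        # < 1: goodware finishes less often
    size_ratios.append(size_good / size_mal)  # close to 1: comparable bytecode size
    print(f"decile {i:2}: FR ratio {fr_ratios[-1]:.2f}, "
          f"size ratio {size_ratios[-1]:.2f}")
```

The finishing rate ratio drops from 1.03 in the 2nd decile to 0.67 in the 9th, with a small bump between the 6th and 7th deciles. Over the same range, the size ratio stays within about 10% of one.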
||||||
|
|
||||||
|
|
||||||
#highlight()[
|
#highlight()[
|
||||||
|
|