#import "../lib.typ": todo, highlight, num, paragraph, SDK, APK, DEX, FR, APKs
#import "X_var.typ": *
#import "X_lib.typ": *

== Experiments <sec:rasta-xp>

=== RQ1: Re-Usability Evaluation

#figure(
  image(
    "figs/exit-status-for-the-drebin-dataset.svg",
    width: 100%,
    alt: "Bar chart showing the percentage of analyzed APKs on the y-axis and the tools on the x-axis.
Horizontal blue dotted lines mark the 15%, 50% and 85% values.
Each bar represents a tool, with the finished analyses in green at the bottom, the analyses that timed out in blue, then on top in red the analyses that failed. There is a last color, grey, for the other category, only visible in the dialdroid bar, representing 5% of the results.
The results are (approximately) as follows:
adagio: 100% finished
amandroid: less than 5% timed out, the rest finished
anadroid: 85% failed, less than 5% timed out, the rest finished
androguard: 100% finished
androguard_dad: 5% failed, the rest finished
apparecium: around 1% failed, the rest finished
blueseal: less than 5% failed, a little more than 10% timed out, the rest (just under 85%) finished
dialdroid: a little more than 50% finished, less than 5% timed out, around 5% marked as other, the rest failed
didfail: 70% finished, the rest failed
droidsafe: 40% finished, 45% timed out, 15% failed
flowdroid: 65% finished, the rest failed
gator: 100% finished
ic3: 99% finished, 1% failed
ic3_fork: 98% finished, 2% failed
iccta: 60% finished, less than 5% timed out, the rest failed
mallodroid: 100% finished
perfchecker: 75% finished, the rest failed
redexer: 100% finished
saaf: 90% finished, 5% timed out, 5% failed
wognsen_et_al: 75% finished, 1% failed, the rest timed out
"
  ),
  caption: [Exit status for the Drebin dataset],
) <fig:rasta-exit-drebin>

#figure(
  image(
    "figs/exit-status-for-the-rasta-dataset.svg",
    width: 100%,
    alt: "Bar chart showing the percentage of analyzed APKs on the y-axis and the tools on the x-axis.
Horizontal blue dotted lines mark the 15%, 50% and 85% values.
Each bar represents a tool, with the finished analyses in green at the bottom, the analyses that timed out in blue, then on top in red the analyses that failed. There is a last color, grey, for the other category, only visible in the dialdroid bar (10% of the results) and in the blueseal bar (5% of the results).
The results are (approximately) as follows:
adagio: 100% finished
amandroid: less than 5% failed, 10% timed out, the rest finished
anadroid: 95% failed, 1% timed out, the rest finished
androguard: 100% finished
androguard_dad: a little more than 45% finished, the rest failed
apparecium: around 5% failed, 1% timed out, the rest finished
blueseal: 20% finished, 15% timed out, 5% marked as other, the rest failed
dialdroid: 35% finished, 1% timed out, 10% marked as other, the rest failed
didfail: 25% finished, less than 5% timed out, the rest failed
droidsafe: less than 10% finished, 20% timed out, the rest failed
flowdroid: 55% finished, the rest failed
gator: a little more than 85% finished, 5% timed out, 10% failed
ic3: less than 80% finished, 5% timed out, the rest failed
ic3_fork: 60% finished, 5% timed out, the rest failed
iccta: 30% finished, 10% timed out, the rest failed
mallodroid: 100% finished
perfchecker: 25% finished, less than 5% timed out, the rest failed
redexer: 90% finished, the rest failed
saaf: 40% finished, the rest failed
wognsen_et_al: a little less than 15% finished, a little less than 20% failed, the rest timed out
"
  ),
  caption: [Exit status for the Rasta dataset],
) <fig:rasta-exit>

@fig:rasta-exit-drebin and @fig:rasta-exit compare the Drebin and Rasta datasets.
They represent the success/failure rate (green/red) of the tools.
We distinguish failures to compute a result from timeouts (blue) and from crashes of our evaluation framework (in grey, probably due to out-of-memory kills of the container itself).
Because they may be caused by a bug in our own analysis stack, the exit statuses represented in grey (Other) are considered unknown errors and not failures of the tool.
We further discuss the errors for which we have information in the logs in @sec:rasta-failure-analysis.

Results on the Drebin dataset show that 11 tools have a high success rate (greater than 85%).
The other tools have poor results.
The worst, excluding Lotrack and Tresher, is Anadroid, with a success ratio under 20%.

On the Rasta dataset, we observe a global increase in the number of failed statuses: #resultunusablenb tools (#resultunusable) have a finishing rate below 50%.
The tools that have bad results on Drebin unsurprisingly also have bad results on Rasta.
Three tools (androguard_dad, blueseal, saaf) that were performing well (higher than 85%) on Drebin surprisingly fall below the bar of 50% of successful analyses.
Seven tools keep a high success rate: Adagio, Amandroid, Androguard, Apparecium, Gator, Mallodroid, Redexer.
Regarding IC3, the fork with a simpler build process and support for modern OSes has a lower success rate than the original tool.

Two tools deserve a specific discussion.
//Androguard and Flowdroid have a large community of users, as shown by the numbers of GitHub stars in @tab:rasta-sources.
Androguard has a high success rate, which is not surprising: it is used by many other tools, including for analyzing the applications uploaded to the Androzoo repository.
//Because of that, it should be noted that our dataset is biased in favour of Androguard. // Already in discussion
Nevertheless, when using the Androguard decompiler (DAD) to decompile an APK, it fails more than 50% of the time.
This example shows that even a frequently used tool can still run into critical failures.
Concerning Flowdroid, our results show a very low timeout rate (#mypercent(37, NBTOTAL)), which was unexpected: in our exchanges, Flowdroid's authors were expecting a higher rate of timeouts and fewer crashes.

In summary, the final ratio of successful analyses for the tools that we could run
// and applications of Rasta dataset
is #mypercent(54.9, 100).
When including the two defective tools, this ratio drops to #mypercent(49.9, 100).

#highlight()[
  *RQ1 answer:*
  On a recent dataset, we consider that #resultunusable of the tools are unusable.
  For the tools that we could run, #resultratio of the analyses finish successfully.
  //(those with less than 50% of successful execution and including the two tools that we were unable to build).
]

=== RQ2: Size, #SDK and Date Influence

#todo[alt text for fig rasta-exit-evolution-java and rasta-exit-evolution-not-java]

#figure(stack(dir: ltr,
  [#figure(
    image(
      "figs/finishing-rate-by-year-of-java-based-tools.svg",
      width: 50%,
      alt: ""
    ),
    caption: [Java based tools],
    supplement: [Subfigure],
  ) <fig:rasta-exit-evolution-java>],
  [#figure(
    image(
      "figs/finishing-rate-by-year-of-non-java-based-tools.svg",
      width: 50%,
      alt: "",
    ),
    caption: [Non Java based tools],
    supplement: [Subfigure],
  ) <fig:rasta-exit-evolution-not-java>]
), caption: [Exit status evolution for the Rasta dataset]
)

To investigate the effect of application dates on the tools, we computed the date of each #APK as the minimum between the date of its first upload to AndroZoo and the date of its first analysis by VirusTotal.
Such a computation is more reliable than using the dex date, which is often obfuscated when packaging the application.
Then, for the sake of clarity, we separated the tools whose source code is mainly Java from those written in other languages.
Most of the Java based tools rely on the Soot framework, which may correlate their results.
@fig:rasta-exit-evolution-java (resp. @fig:rasta-exit-evolution-not-java) compares the success rates of the tools between 2010 and 2023 for Java based tools (resp. non Java based tools).
For Java based tools, a clear decrease of the finishing rate can be observed globally for all tools.
For non-Java based tools, two of them keep a high success rate (Androguard, Mallodroid).
This result is expected for Androguard, because the analysis is relatively simple and the tool is largely adopted, as previously mentioned.
Mallodroid being a relatively simple script leveraging Androguard, it benefits from Androguard's resilience.
It should be noted that Saaf keeps a high success ratio until 2014 and then quickly drops below 20% afterwards. This example shows that, even with identical source code and the same running platform, a tool's behavior can change over time because the structure of its input files evolves.

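For illustration, this date heuristic boils down to taking the earliest of the two observations. The sketch below is a minimal Python example; the parameter names `androzoo_first_upload` and `vt_first_analysis` are hypothetical placeholders for the actual metadata fields.

```python
from datetime import datetime

def apk_date(androzoo_first_upload: str, vt_first_analysis: str) -> datetime:
    """Estimate an APK's date as the earliest of its AndroZoo upload
    and its first VirusTotal analysis (both ISO 8601 strings)."""
    candidates = [
        datetime.fromisoformat(androzoo_first_upload),
        datetime.fromisoformat(vt_first_analysis),
    ]
    # The earliest observation is the most conservative estimate of age.
    return min(candidates)

# Example: an APK first seen by VirusTotal before being uploaded to AndroZoo.
print(apk_date("2016-05-12T10:00:00", "2015-11-03T08:30:00").year)  # 2015
```
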
An interesting comparison is the specific case of Ic3 and Ic3_fork. Until 2019, their success rates are very similar. After 2020, Ic3_fork continues to decrease whereas Ic3 keeps a success rate of around 60%.

/*
```
sqlite> SELECT apk1.first_seen_year, (COUNT(*) * 100) / (SELECT 20 * COUNT(*)
(x1...> FROM apk AS apk2 WHERE apk2.first_seen_year = apk1.first_seen_year
(x1...> )
   ...> FROM exec JOIN apk AS apk1 ON exec.sha256 = apk1.sha256
   ...> WHERE exec.tool_status = 'FINISHED' OR exec.tool_status = 'UNKNOWN'
   ...> GROUP BY apk1.first_seen_year ORDER BY apk1.first_seen_year;
2010|78
2011|78
2012|76
2013|70
2014|66
2015|61
2016|57
2017|54
2018|49
2019|47
2020|45
2021|42
2022|40
2023|39
```
*/

To compare the influence of the date, #SDK version and size of applications, we fixed one parameter while varying another.

#todo[Alt text for fig rasta-decorelation-size]

#figure(stack(dir: ltr,
  [#figure(
    image(
      "figs/decorelation/finishing-rate-of-java-based-tool-by-bytecode-size-of-apks-detected-in-2022.svg",
      width: 50%,
      alt: ""
    ),
    caption: [Java based tools],
    supplement: [Subfigure],
  ) <fig:rasta-rate-evolution-java-2022>],
  [#figure(
    image(
      "figs/decorelation/finishing-rate-of-non-java-based-tool-by-bytecode-size-of-apks-detected-in-2022.svg",
      width: 50%,
      alt: "",
    ),
    caption: [Non Java based tools],
    supplement: [Subfigure],
  ) <fig:rasta-rate-evolution-non-java-2022>]
), caption: [Finishing rate by bytecode size for APKs detected in 2022]
) <fig:rasta-decorelation-size>

#paragraph[Fixed application year. (#num(5000) APKs)][
We selected the year 2022, which has a good number of representatives in each size decile of our application dataset.
@fig:rasta-rate-evolution-java-2022 (resp. @fig:rasta-rate-evolution-non-java-2022) shows the finishing rate of the tools as a function of the bytecode size for Java based tools (resp. non Java based tools) analyzing applications of 2022.
We can observe that the finishing rate of all Java based tools decreases as the bytecode size grows.
Half of the non Java based tools show the same behavior.
]

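The decorrelation itself is a simple group-by: fix one variable, bucket the other, and average the exit status. The following is a minimal pandas sketch; the DataFrame columns `tool`, `year`, `dex_size` and `finished` are hypothetical stand-ins for our execution database.

```python
import pandas as pd

# Hypothetical extract of the execution database: one row per (tool, APK) run.
runs = pd.DataFrame({
    "tool":     ["flowdroid", "flowdroid", "androguard", "androguard"],
    "year":     [2022, 2022, 2022, 2022],
    "dex_size": [1.2e6, 9.8e6, 1.1e6, 9.5e6],   # bytecode size in bytes
    "finished": [True, False, True, True],
})

# Fix the year, then vary the size: bucket the 2022 runs into size quantiles
# and compute each tool's finishing rate per bucket.
sample = runs[runs["year"] == 2022].copy()
sample["size_decile"] = pd.qcut(sample["dex_size"], q=2, labels=False)  # q=10 on real data
rate = sample.groupby(["tool", "size_decile"])["finished"].mean()
print(rate)
```
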
#todo[Alt text for fig rasta-decorelation-year]

#figure(stack(dir: ltr,
  [#figure(
    image(
      "figs/decorelation/finishing-rate-of-java-based-tool-by-discovery-year-of-apks-with-a-bytecode-size-between-4-08-mb-and-5-2-mb.svg",
      width: 50%,
      alt: ""
    ),
    caption: [Java based tools],
    supplement: [Subfigure],
  ) <fig:rasta-rate-evolution-java-decile-year>],
  [#figure(
    image(
      "figs/decorelation/finishing-rate-of-non-java-based-tool-by-discovery-year-of-apks-with-a-bytecode-size-between-4-08-mb-and-5-2-mb.svg",
      width: 50%,
      alt: "",
    ),
    caption: [Non Java based tools],
    supplement: [Subfigure],
  ) <fig:rasta-rate-evolution-non-java-decile-year>]
), caption: [Finishing rate by discovery year with a bytecode size $in$ [4.08, 5.2] MB]
) <fig:rasta-decorelation-year>

#paragraph[Fixed application bytecode size. (#num(6252) APKs)][We selected the sixth decile (between 4.08 and 5.20 MB), which is well represented across a wide range of years.
@fig:rasta-rate-evolution-java-decile-year (resp. @fig:rasta-rate-evolution-non-java-decile-year) represents the finishing rate depending on the year at a fixed bytecode size.
We observe that 9 out of 12 Java based tools have a finishing rate dropping below 20%, which is not the case for non Java based tools.
]

#todo[Alt text for fig rasta-decorelation-min-sdk]

#figure(stack(dir: ltr,
  [#figure(
    image(
      "figs/decorelation/finishing-rate-of-java-based-tool-by-min-sdk-of-apks-with-a-bytecode-size-between-4-08-mb-and-5-2-mb.svg",
      width: 50%,
      alt: ""
    ),
    caption: [Java based tools],
    supplement: [Subfigure],
  ) <fig:rasta-rate-evolution-java-decile-min-sdk>],
  [#figure(
    image(
      "figs/decorelation/finishing-rate-of-non-java-based-tool-by-min-sdk-of-apks-with-a-bytecode-size-between-4-08-mb-and-5-2-mb.svg",
      width: 50%,
      alt: "",
    ),
    caption: [Non Java based tools],
    supplement: [Subfigure],
  ) <fig:rasta-rate-evolution-non-java-decile-min-sdk>]
), caption: [Finishing rate by min #SDK with a bytecode size $in$ [4.08, 5.2] MB]
) <fig:rasta-decorelation-min-sdk>

We performed similar experiments by varying the min #SDK and target #SDK versions, still with a fixed bytecode size between 4.08 and 5.2 MB, as shown in @fig:rasta-rate-evolution-java-decile-min-sdk and @fig:rasta-rate-evolution-non-java-decile-min-sdk.
We found that, contrary to the target #SDK, the min #SDK version has an impact on the finishing rate of Java based tools: 8 out of 12 tools are below 50% after #SDK 16.
This is not surprising, as the min #SDK is highly correlated with the year.

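As a side note, the min #SDK and target #SDK values come from the `<uses-sdk>` element of the Android manifest and can be read, for instance, with Androguard. This is a minimal sketch assuming Androguard 3.x (the import path differs in 4.x) and a local `app.apk` file.

```python
from androguard.core.bytecodes.apk import APK  # androguard 3.x import path

apk = APK("app.apk")
# Both values are declared in the <uses-sdk> element of the manifest.
print("min SDK:", apk.get_min_sdk_version())
print("target SDK:", apk.get_target_sdk_version())
```
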
#highlight(breakable: false)[
  *RQ2 answer:*
  For the #nbtoolsselected tools that can be at least partially used, a global decrease of the success rate of the analyses is observed over time.
  Starting at a 78% success rate, tools drop to 61% after five years and to 45% after ten years.
  The success rate varies based on the bytecode size and the #SDK version.
  The date is also correlated with the success rate, for Java based tools only.
]

=== RQ3: Malware vs Goodware <sec:rasta-mal-vs-good>

#figure({
  show table: set text(size: 0.80em)
  table(
    columns: 3, //4,
    inset: (x: 0% + 5pt, y: 0% + 2pt),
    stroke: none,
    align: center+horizon,
    table.hline(),
    table.header(
      table.cell(colspan: 3/*4*/, inset: 3pt)[],
      table.cell(rowspan:2)[*Rasta part*],
      table.vline(end: 3),
      table.vline(start: 4),
      table.cell(colspan:2)[*Average size* (MB)],
      //table.vline(end: 3),
      //table.vline(start: 4),
      //table.cell(rowspan:2)[*Average date*],
      [*APK*],
      [*DEX*],
    ),
    table.cell(colspan: 3/*4*/, inset: 3pt)[],
    table.hline(),
    table.cell(colspan: 3/*4*/, inset: 3pt)[],

    [*goodware*], num(calc.round(16.897989, digits: 1)), num(calc.round(6.598464, digits: 1)),// [2017],
    [*malware*], num(calc.round(17.236860, digits: 1)), num(calc.round(4.337376, digits: 1)),// [2017],
    [*total*], num(calc.round(16.918107, digits: 1)), num(calc.round(6.464228, digits: 1)),// [2017],

    table.cell(colspan: 3/*4*/, inset: 3pt)[],
    table.hline(),
  )},
  placement: none, // floating figure makes this table go in the previous section
  caption: [Average size of the goodware/malware parts of the Rasta dataset],
) <tab:rasta-sizes>

We sampled our dataset to obtain a variety of #APK sizes, but the size of an application is not entirely proportional to the size of its bytecode.
Looking at @tab:rasta-sizes, we can see that although malware #APKs are on average bigger, they contain less bytecode than goodware.
In the previous section, we saw that the size of the bytecode has the most significant impact on the finishing rate of analysis tools, and indeed, @fig:rasta-exit-goodmal reflects that.

/*
```
sqlite> SELECT vt_detection == 0, COUNT(exec.sha256) FROM exec INNER JOIN apk ON exec.sha256 = apk.sha256 WHERE tool_status = 'FINISHED' AND dex_size_decile = 6 GROUP BY vt_detection == 0;
0|2971 % malware
1|60455 % goodware
sqlite> SELECT vt_detection == 0, COUNT(DISTINCT sha256) FROM apk WHERE dex_size_decile = 6 GROUP BY vt_detection == 0;
0|243
1|6009
```
```
>>> 61.13168724279835
0.4969812257050699
>>> 60455/6009/20 * 100
50.30371110001665
```

            rate goodware  rate malware  avg size goodware (MB)  avg size malware (MB)
decile 1:   85.42          82.02         0.13                    0.11
decile 2:   74.46          72.34         0.54                    0.55
decile 3:   63.38          65.67         1.37                    1.25
decile 4:   57.21          62.31         2.41                    2.34
decile 5:   53.36          59.27         3.56                    3.55
decile 6:   50.3           61.13         4.61                    4.56
decile 7:   46.76          56.54         5.87                    5.91
decile 8:   42.57          56.23         7.64                    7.63
decile 9:   39.09          57.94         11.39                   11.26
decile 10:  33.34          45.86         24.24                   21.36
total:      54.28          64.82         6.29                    4.14
*/

#figure(
  image(
    "figs/exit-status-for-the-rasta-dataset-goodware-malware.svg",
    width: 100%,
    alt: "Bar chart showing the percentage of analyzed APKs on the y-axis and the tools on the x-axis.
Each tool has two bars, one for goodware and one for malware.
The goodware bars are the same as the ones in the figure Exit status for the Rasta dataset.
The timeout rate looks the same on both bars of each tool.
The finishing rate of the malware bar is a lot higher than the goodware bar for androguard_dad, blueseal, didfail, iccta, perfchecker and wognsen_et_al.
The finishing rate of the malware bar is higher than the goodware bar for ic3 and ic3_fork.
The only two tools where the finishing rate is better for goodware are apparecium (by around 15%) and redexer (by around 10%).
The other tools have similar finishing rates, slightly in favor of malware.
"
  ),
  caption: [Exit status comparing goodware (left bars) and malware (right bars) for the Rasta dataset],
) <fig:rasta-exit-goodmal>

/*
Average total dex size (all APKs): 6464228.10027989
goodware: 6598464.94224066
malware: 4337376.97252155

```
sqlite> SELECT AVG(apk_size) FROM apk;
16918107.6526989
sqlite> SELECT AVG(apk_size) FROM apk WHERE vt_detection = 0;
16897989.4472311
sqlite> SELECT AVG(apk_size) FROM apk WHERE vt_detection != 0;
17236860.8903556
```
*/

In @fig:rasta-exit-goodmal, we compare the finishing rate of malware and goodware applications for the evaluated tools.
We can see that malware and goodware seem to generate a similar number of timeouts.
However, with the exception of two tools, apparecium and redexer, we can see a trend of goodware being harder to analyze than malware.
For some tools, like DAD or perfchecker, the finishing rate is more than 20 points higher on malware.

#figure({
  show table: set text(size: 0.80em)
  table(
    columns: 7,
    inset: (x: 0% + 5pt, y: 0% + 2pt),
    stroke: none,
    align: center+horizon,
    table.hline(),
    table.header(
      table.cell(colspan: 7, inset: 3pt)[],
      table.cell(rowspan: 2)[*Decile*],
      table.vline(end: 3),
      table.vline(start: 4),
      table.cell(colspan:2)[*Average DEX size (MB)*],
      table.vline(end: 3),
      table.vline(start: 4),
      table.cell(colspan:2)[*Finishing Rate: #FR*],
      table.vline(end: 3),
      table.vline(start: 4),
      [*Ratio Size*],
      table.vline(end: 3),
      table.vline(start: 4),
      [*Ratio #FR*],
      [Good], [Mal],
      [Good], [Mal],
      [Good/Mal], [Good/Mal],
    ),
    table.cell(colspan: 7, inset: 3pt)[],
    table.hline(),
    table.cell(colspan: 7, inset: 3pt)[],

    num(1), num(0.13), num(0.11), num(0.85), num(0.82), num(1.17), num(1.04),
    num(2), num(0.54), num(0.55), num(0.74), num(0.72), num(0.97), num(1.03),
    num(3), num(1.37), num(1.25), num(0.63), num(0.66), num(1.09), num(0.97),
    num(4), num(2.41), num(2.34), num(0.57), num(0.62), num(1.03), num(0.92),
    num(5), num(3.56), num(3.55), num(0.53), num(0.59), num(1.00), num(0.90),
    num(6), num(4.61), num(4.56), num(0.50), num(0.61), num(1.01), num(0.82),
    num(7), num(5.87), num(5.91), num(0.47), num(0.57), num(0.99), num(0.83),
    num(8), num(7.64), num(7.63), num(0.43), num(0.56), num(1.00), num(0.76),
    num(9), num(11.39), num(11.26), num(0.39), num(0.58), num(1.01), num(0.67),
    num(10), num(24.24), num(21.36), num(0.33), num(0.46), num(1.13), num(0.73),

    table.cell(colspan: 7, inset: 3pt)[],
    table.hline(),
  )},
  caption: [#DEX size and Finishing Rate (#FR) per decile],
) <tab:rasta-sizes-decile>

We saw that the bytecode size may explain this difference.
To investigate further, @tab:rasta-sizes-decile reports the bytecode size and the finishing rate of goodware and malware in each decile of bytecode size.
We also computed the goodware/malware ratio of both the bytecode size and the finishing rate.
We observe that while the bytecode size ratio between goodware and malware stays close to one in each decile (excluding the two extremes), the goodware/malware finishing rate ratio decreases with each decile.
It goes from 1.03 in the 2#super[nd] decile to 0.67 in the 9#super[th] decile.
We conclude from this table that, at equal size, analyzing malware still triggers fewer errors than analyzing goodware, and that this difference increases with the bytecode size.

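The ratio columns of @tab:rasta-sizes-decile boil down to a per-decile division, as in the minimal Python sketch below; the two lists simply reproduce the goodware and malware finishing rates of the table.

```python
# Finishing rates per bytecode-size decile, copied from the table above.
fr_goodware = [0.85, 0.74, 0.63, 0.57, 0.53, 0.50, 0.47, 0.43, 0.39, 0.33]
fr_malware  = [0.82, 0.72, 0.66, 0.62, 0.59, 0.61, 0.57, 0.56, 0.58, 0.46]

# Ratio FR = goodware rate / malware rate; a value below 1 means goodware
# triggers more failures than malware in that decile.
for decile, (good, mal) in enumerate(zip(fr_goodware, fr_malware), start=1):
    print(f"decile {decile}: ratio FR = {good / mal:.2f}")
```
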
#highlight()[
  *RQ3 answer:*
  Analyzing malware applications triggers fewer errors for static analysis tools than analyzing goodware of comparable bytecode size.
]