rasta: wip

Jean-Marie Mineau 2025-08-13 00:44:25 +02:00
parent 5e512b585a
commit 01ce20ffda
Signed by: histausse
GPG key ID: B66AEEDA9B645AD2
7 changed files with 81 additions and 59 deletions

@ -1,4 +1,4 @@
#import "../lib.typ": todo, highlight, num, paragraph, SDK
#import "../lib.typ": todo, highlight, num, paragraph, SDK, APK, DEX, FR
#import "X_var.typ": *
#import "X_lib.typ": *
@ -10,12 +10,12 @@
#todo[alt text for figure rasta-exit / rasta-exit-drebin]
#figure(
image("figs/exit-status-for-the-drebin-dataset.svg", width: 80%),
image("figs/exit-status-for-the-drebin-dataset.svg", width: 100%),
caption: [Exit status for the Drebin dataset],
) <fig:rasta-exit-drebin>
#figure(
image("figs/exit-status-for-the-rasta-dataset.svg", width: 80%),
image("figs/exit-status-for-the-rasta-dataset.svg", width: 100%),
caption: [Exit status for the Rasta dataset],
) <fig:rasta-exit>
@ -24,7 +24,7 @@
They represent the success/failure rate (green/orange) of the tools.
We distinguished failure to compute a result from timeout (blue) and crashes of our evaluation framework (in grey, probably due to out of memory kills of the container itself).
Because they may be caused by a bug in our own analysis stack, exit statuses represented in grey (Other) are considered unknown errors and not failures of the tool.
#todo[We discuss further errors for which we have information in the logs in @sec:rasta-failure-analysis.]
We discuss further errors for which we have information in the logs in @sec:rasta-failure-analysis.
Results on the Drebin dataset show that 11 tools have a high success rate (greater than 85%).
The other tools have poor results.
@ -46,7 +46,8 @@ Concerning Flowdroid, our results show a very low timeout rate (#mypercent(37, N
In summary, the final ratio of successful analyses for the tools that we could run
// and applications of Rasta dataset
is #mypercent(54.9, 100). When including the two defective tools, this ratio drops to #mypercent(49.9, 100).
is #mypercent(54.9, 100).
When including the two defective tools, this ratio drops to #mypercent(49.9, 100).
#highlight()[
*RQ1 answer:*
@ -63,7 +64,7 @@ For the tools that we could run, #resultratio of analysis are finishing successf
[#figure(
image(
"figs/finishing-rate-by-year-of-java-based-tools.svg",
width: 48%,
width: 50%,
alt: ""
),
caption: [Java based tools],
@ -72,7 +73,7 @@ For the tools that we could run, #resultratio of analysis are finishing successf
[#figure(
image(
"figs/finishing-rate-by-year-of-non-java-based-tools.svg",
width: 48%,
width: 50%,
alt: "",
),
caption: [Non Java based tools],
@ -81,7 +82,7 @@ For the tools that we could run, #resultratio of analysis are finishing successf
), caption: [Exit status evolution for the Rasta dataset]
)
For investigating the effect of application dates on the tools, we computed the date of each APK based on the minimum date between the first upload in AndroZoo and the first analysis in VirusTotal.
For investigating the effect of application dates on the tools, we computed the date of each #APK based on the minimum date between the first upload in AndroZoo and the first analysis in VirusTotal.
Such a computation is more reliable than using the dex date, which is often obfuscated when packaging the application.
Then, for the sake of clarity of our results, we separated the tools that have mainly Java source code from those that use other languages.
Most of the Java based tools use the Soot framework, which may correlate the obtained results.
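A minimal sketch of the date computation described above, in Python. The function name, the parameter names, and the handling of a missing date are assumptions made for illustration; the source only states that the earlier of the two dates is kept.

```python
from datetime import date
from typing import Optional


def apk_date(
    androzoo_first_upload: Optional[date],
    vt_first_analysis: Optional[date],
) -> Optional[date]:
    """Return the earliest known date for an APK.

    Assumption: a missing date (None) is ignored, and None is returned
    only when neither source provides a date.
    """
    known = [d for d in (androzoo_first_upload, vt_first_analysis) if d is not None]
    return min(known) if known else None


# Hypothetical example values, not taken from the dataset.
example = apk_date(date(2019, 3, 1), date(2018, 11, 20))
```

With these hypothetical inputs, the VirusTotal date is kept because it is the earlier of the two.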
@ -126,7 +127,7 @@ To compare the influence of the date, #SDK version and size of applications, we
[#figure(
image(
"figs/decorelation/finishing-rate-of-java-based-tool-by-bytecode-size-of-apks-detected-in-2022.svg",
width: 48%,
width: 50%,
alt: ""
),
caption: [Java based tools],
@ -135,7 +136,7 @@ To compare the influence of the date, #SDK version and size of applications, we
[#figure(
image(
"figs/decorelation/finishing-rate-of-non-java-based-tool-by-bytecode-size-of-apks-detected-in-2022.svg",
width: 48%,
width: 50%,
alt: "",
),
caption: [Non Java based tools],
@ -144,10 +145,11 @@ To compare the influence of the date, #SDK version and size of applications, we
), caption: [Finishing rate by bytecode size for APKs detected in 2022]
) <fig:rasta-decorelation-size>
#paragraph([Fixed application year. (#num(5000) APKs)])[
#paragraph[Fixed application year. (#num(5000) APKs)][
We selected the year 2022, which has a good number of representatives for each size decile in our application dataset.
@fig:rasta-rate-evolution-java-2022} (resp. @fig:rasta-rate-evolution-non-java-2022) shows the finishing rate of the tools in function of the size of the bytecode for Java based tools (resp. non Java based tools) analyzing applications of 2022.
We can observe that all Java based tools have a finishing rate decreasing over years. 50% of non Java based tools have the same behavior.
@fig:rasta-rate-evolution-java-2022 (resp. @fig:rasta-rate-evolution-non-java-2022) shows the finishing rate of the tools in function of the size of the bytecode for Java based tools (resp. non Java based tools) analyzing applications of 2022.
We can observe that all Java based tools have a finishing rate decreasing over years.
50% of non Java based tools have the same behavior.
]
#todo[Alt text for fig rasta-decorelation-size]
@ -155,7 +157,7 @@ We can observe that all Java based tools have a finishing rate decreasing over y
[#figure(
image(
"figs/decorelation/finishing-rate-of-java-based-tool-by-discovery-year-of-apks-with-a-bytecode-size-between-4-08-mb-and-5-2-mb.svg",
width: 48%,
width: 50%,
alt: ""
),
caption: [Java based tools],
@ -164,7 +166,7 @@ We can observe that all Java based tools have a finishing rate decreasing over y
[#figure(
image(
"figs/decorelation/finishing-rate-of-non-java-based-tool-by-discovery-year-of-apks-with-a-bytecode-size-between-4-08-mb-and-5-2-mb.svg",
width: 48%,
width: 50%,
alt: "",
),
caption: [Non Java based tools],
@ -173,7 +175,7 @@ We can observe that all Java based tools have a finishing rate decreasing over y
), caption: [Finishing rate by discovery year with a bytecode size $in$ [4.08, 5.2] MB]
) <fig:rasta-decorelation-size>
#paragraph([Fixed application bytecode size. (#num(6252) APKs)])[We selected the sixth decile (between 4.08 and 5.20 MB), which is well represented in a wide number of years.
#paragraph[Fixed application bytecode size. (#num(6252) APKs)][We selected the sixth decile (between 4.08 and 5.20 MB), which is well represented in a wide number of years.
@fig:rasta-rate-evolution-java-decile-year (resp. @fig:rasta-rate-evolution-non-java-decile-year) represents the finishing rate depending on the year at a fixed bytecode size.
We observe that, for Java based tools, 9 tools out of 12 have a finishing rate that drops below 20%, which is not the case for non Java based tools.
]
@ -183,7 +185,7 @@ We observe that 9 tools over 12 have a finishing rate dropping below 20% for Jav
[#figure(
image(
"figs/decorelation/finishing-rate-of-java-based-tool-by-min-sdk-of-apks-with-a-bytecode-size-between-4-08-mb-and-5-2-mb.svg",
width: 48%,
width: 50%,
alt: ""
),
caption: [Java based tools],
@ -192,7 +194,7 @@ We observe that 9 tools over 12 have a finishing rate dropping below 20% for Jav
[#figure(
image(
"figs/decorelation/finishing-rate-of-non-java-based-tool-by-min-sdk-of-apks-with-a-bytecode-size-between-4-08-mb-and-5-2-mb.svg",
width: 48%,
width: 50%,
alt: "",
),
caption: [Non Java based tools],
@ -205,7 +207,7 @@ We performed similar experiments by variating the min #SDK and target #SDK versi
We found that, contrary to the target #SDK version, the min #SDK version has an impact on the finishing rate of Java based tools: 8 tools out of 12 are below 50% after #SDK 16.
This is not surprising, as the min #SDK is highly correlated with the year.
#highlight()[
#highlight(breakable: false)[
*RQ2 answer:*
For the #nbtoolsselected tools that can be used partially, the success rate of the tools' analyses globally decreases over time.
Starting from a success rate of 78%, tools reach 61% after five years and 45% after ten years.
@ -253,7 +255,7 @@ sqlite> SELECT vt_detection == 0, COUNT(DISTINCT sha256) FROM apk WHERE dex_size
#figure(
image(
"figs/exit-status-for-the-rasta-dataset-goodware-malware.svg",
width: 80%,
width: 100%,
alt: "",
),
caption: [Exit status comparing goodware and malware for the Rasta dataset],
@ -288,32 +290,32 @@ sqlite> SELECT AVG(apk_size) FROM apk WHERE vt_detection != 0;
#figure({
show table: set text(size: 0.80em)
table(
columns: 4,
columns: 3, //4,
inset: (x: 0% + 5pt, y: 0% + 2pt),
stroke: none,
align: center+horizon,
table.hline(),
table.header(
table.cell(colspan: 4, inset: 3pt)[],
table.cell(colspan: 3/*4*/, inset: 3pt)[],
table.cell(rowspan:2)[*Rasta part*],
table.vline(end: 3),
table.vline(start: 4),
table.cell(colspan:2)[*Average size*],
table.vline(end: 3),
table.vline(start: 4),
table.cell(rowspan:2)[*Average date*],
table.cell(colspan:2)[*Average size* (MB)],
//table.vline(end: 3),
//table.vline(start: 4),
//table.cell(rowspan:2)[*Average date*],
[*APK*],
[*DEX*],
),
table.cell(colspan: 4, inset: 3pt)[],
table.cell(colspan: 3/*4*/, inset: 3pt)[],
table.hline(),
table.cell(colspan: 4, inset: 3pt)[],
table.cell(colspan: 3/*4*/, inset: 3pt)[],
[*goodware*], num(16897989), num(6598464), [2017],
[*malware*], num(17236860), num(4337376), [2017],
[*total*], num(16918107), num(6464228), [2017],
[*goodware*], num(calc.round(16.897989, digits: 1)), num(calc.round(6.598464, digits: 1)),// [2017],
[*malware*], num(calc.round(17.236860, digits: 1)), num(calc.round(4.337376, digits: 1)),// [2017],
[*total*], num(calc.round(16.918107, digits: 1)), num(calc.round(6.464228, digits: 1)),// [2017],
table.cell(colspan: 4, inset: 3pt)[],
table.cell(colspan: 3/*4*/, inset: 3pt)[],
table.hline(),
)},
caption: [Average size and date of goodware/malware parts of the Rasta dataset],
@ -336,13 +338,13 @@ sqlite> SELECT AVG(apk_size) FROM apk WHERE vt_detection != 0;
table.cell(colspan:2)[*Average DEX size (MB)*],
table.vline(end: 3),
table.vline(start: 4),
table.cell(colspan:2)[* Finishing Rate: FR*],
table.cell(colspan:2)[* Finishing Rate: #FR*],
table.vline(end: 3),
table.vline(start: 4),
[*Ratio Size*],
table.vline(end: 3),
table.vline(start: 4),
[*Ratio FR*],
[*Ratio #FR*],
[Good], [Mal],
[Good], [Mal],
[Good/Mal], [Good/Mal],
@ -365,7 +367,7 @@ sqlite> SELECT AVG(apk_size) FROM apk WHERE vt_detection != 0;
table.cell(colspan: 7, inset: 3pt)[],
table.hline(),
)},
caption: [DEX size and Finishing Rate (FR) per decile],
caption: [#DEX size and Finishing Rate (#FR) per decile],
) <tab:rasta-sizes-decile>
We compared the finishing rate of malware and goodware applications for the evaluated tools.