
#import "../lib.typ": todo
#import "../lib.typ": etal
#import "X_var.typ": *
#import "X_lib.typ": *

== State-of-the-Art Comparison <sec:rasta-soa-comp>

In this section, we will compare our results with the contributions presented in @sec:bg-eval-tools.

Luo #etal released TaintBench~@luoTaintBenchAutomaticRealworld2022, a real-world benchmark, together with recommendations for building such a benchmark.
This benchmark confirmed that some tools, such as Amandroid and Flowdroid, are less efficient on real-world applications.
We confirm the hypothesis of Luo #etal that real-world applications lead to less efficient analysis than hand-crafted test applications or old datasets~@luoTaintBenchAutomaticRealworld2022.
In addition, even though Drebin is not hand-crafted, it is quite old and seems to present the same issues as hand-crafted datasets when used to evaluate a tool: we obtained much better results on Drebin than on the Rasta dataset, which is more representative of real-world applications.

Our findings are also consistent with the numerical results of Pauck #etal, who showed that #mypercent(106, 180) of DIALDroid-Bench~@bosuCollusiveDataLeak2017 real-world applications are analysed successfully with the 6 evaluated tools~@pauckAndroidTaintAnalysis2018.
Six years after the release of DIALDroid-Bench, we obtain a lower ratio of #mypercent(40.05, 100) for the same set of 6 tools but using the Rasta dataset of #NBTOTALSTRING applications.
We extended this result to a set of #nbtoolsvariationsrun tools and obtained a global success rate of #resultratio.
We confirmed that most tools require a significant amount of work to get them running~@reaves_droid_2016.
Our investigation of crashes also confirmed that dependencies on older versions of Apktool impact the performance of Anadroid, Saaf and Wognsen #etal, in addition to DroidSafe and IccTa, which were already identified by Pauck #etal.

/*
Pauck: 235 micro bench, 30 real*
Confirm: didfail failed for min_sdk >= 19; all successful runs (only 4%) indicated "Only phantom classes loaded, skipping analysis..."

SELECT tool_status,  COUNT(*), AVG(dex_size)  FROM exec INNER JOIN apk on exec.sha256 = apk.sha256 WHERE min_sdk >= 19 AND tool_name = 'didfail' GROUP BY tool_status;
FAILED|16651|13139071.2363221
FINISHED|694|6617861.33717579
TIMEOUT|98|6048999.2244898
SELECT msg, COUNT(*) FROM (SELECT DISTINCT exec.sha256, msg  FROM exec INNER JOIN apk on exec.sha256 = apk.sha256 INNER JOIN error ON exec.sha256 = error.sha256 AND exec.tool_name = error.tool_name  WHERE min_sdk >= 19 AND exec.tool_name = 'didfail' AND exec.tool_status = 'FINISHED') GROUP BY msg;
|77
Only phantom classes loaded, skipping analysis...|694

DroidSafe and IccTa failed for SDK > 19 because of old apktool

We observed: (nb success < 2000 for min_sdk >= 20)
   ['anadroid', 'blueseal', 'dialdroid', 'didfail', 'droidsafe', 'ic3_fork', 'iccta', 'perfchecker', 'saaf', 'wognsen_et_al']
anadroid|0
blueseal|521
dialdroid|812
didfail|343
droidsafe|35
ic3_fork|1393
iccta|612
perfchecker|1921
saaf|1588
wognsen_et_al|386
*/
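
Success ratios such as the ones compared in this section can be recomputed from raw execution records. The following is only a minimal sketch, not the exact procedure used for our measurements. It assumes an SQLite table `exec` with the columns `sha256`, `tool_name` and `tool_status`, and it assumes that a run counts as successful when its status is `FINISHED`. The function name `success_ratio` and the toy rows are hypothetical and serve only as illustration.

```python
import sqlite3


def success_ratio(conn, tools):
    """Fraction of (application, tool) runs with status FINISHED.

    Assumption: FINISHED means success; FAILED and TIMEOUT mean failure.
    Returns None when no run matches the given tools.
    """
    placeholders = ",".join("?" for _ in tools)
    query = (
        "SELECT AVG(tool_status = 'FINISHED') FROM exec "
        f"WHERE tool_name IN ({placeholders})"
    )
    return conn.execute(query, list(tools)).fetchone()[0]


# Toy data, made up for illustration (not results from the Rasta dataset).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE exec (sha256 TEXT, tool_name TEXT, tool_status TEXT)")
conn.executemany(
    "INSERT INTO exec VALUES (?, ?, ?)",
    [
        ("apk1", "tool_a", "FINISHED"),
        ("apk2", "tool_a", "FAILED"),
        ("apk1", "tool_b", "FINISHED"),
        ("apk2", "tool_b", "TIMEOUT"),
        ("apk3", "tool_b", "FINISHED"),
    ],
)

ratio_all = success_ratio(conn, ["tool_a", "tool_b"])  # 3 successes out of 5 runs
ratio_a = success_ratio(conn, ["tool_a"])              # 1 success out of 2 runs
```

Restricting the `tools` argument to the 6 tools evaluated by Pauck #etal, or extending it to every tool variation, would give the two kinds of ratio discussed above, under the same assumption about what counts as a success.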

Finally, we extended the work done by Reaves #etal on the usability of analysis tools to #nbtoolsselected different tools (4 tools are in common; we added 16 new tools and two variations).
We encountered similar issues with library and operating system incompatibilities, and noticed that, as time passes, dependency issues may impact the build process.
For instance, we encountered cases where the repositories hosting the dependencies had been closed, and cases where Maven failed to download dependencies because the OS version did not support SSL, which is now mandatory to access Maven Central.
//, and even one case were the could not find anywhere the compiled version of sbt used to build a tool.