start reworking discussion
Some checks failed
/ test_checkout (push) Failing after 1s

Jean-Marie Mineau 2025-08-14 00:50:26 +02:00
parent 02be146060
commit 973a302f1d
Signed by: histausse
GPG key ID: B66AEEDA9B645AD2
2 changed files with 19 additions and 31 deletions

@ -1,9 +1,12 @@
#import "../lib.typ": todo, etal, paragraph
#import "../lib.typ": todo, jfl-note
#import "../lib.typ": etal, paragraph
#import "X_var.typ": *
#import "X_lib.typ": *
== Discussion <sec:rasta-discussion>
#todo[split into: error analysis, soa comp, recommendations and limitations]
#figure({
show table: set text(size: 0.50em)
show table.cell.where(y: 0): it => if it.x == 0 { it } else { rotate(-90deg, reflow: true, it) }
@ -151,8 +154,9 @@ Regarding errors linked to the disk space, we observe few ratios for the excepti
Manual inspections revealed that those errors are often a consequence of a failed apktool execution.
Second, the black squares indicate frequent errors that need to be investigated separately.
In the rest of this section, we manually analyzed, when possible, the code that generates this high ratio of errors and we give feedback about the possible causes and difficulties to write a bug fix.
In the next subsection, we manually analyze, when possible, the code that generates this high ratio of errors, and we give feedback on the possible causes and on the difficulty of writing a bug fix.
=== Tool by Tool Failure Analysis <sec:rasta-tool-by-tool-failure-analysis>
/*
Dialdroid: TODO
com.google.common.util.concurrent.ExecutionError -> memory error: java.lang.StackOverflowError, java.lang.OutOfMemoryError: Java heap space, java.lang.OutOfMemoryError: GC overhead limit exceeded
@ -211,7 +215,7 @@ Anadroid: DONE
Surprisingly, while Androguard almost never fails to analyze an APK, the internal decompiler of Androguard (DAD) fails more than half of the time.
The analysis of the logs shows that the issue comes from the way the decompiled methods are stored: each method is stored in a file named after the method name and signature, and this file name can quickly exceed the size limit (255 characters on most file systems).
It should be noted that Androguard_dad rarely fails on the Drebin dataset.
This illustrate the importance to test tools on real and up-to-date APKs: even a bad handling of filenames can influence an analysis.
This illustrates the importance of testing tools on real and up-to-date APKs: even poor handling of filenames can affect an analysis.
]
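As an illustration only, the following Python sketch shows one way a decompiler could derive on-disk names that stay under the file name limit. It is not Androguard's code: the function `safe_filename`, the `.java` suffix, and the 255-byte budget are assumptions made for this example.

```python
import hashlib

# Assumed per-component limit, in bytes (255 on most file systems).
MAX_NAME_BYTES = 255


def safe_filename(method_descriptor: str, suffix: str = ".java") -> str:
    """Return a file name of at most MAX_NAME_BYTES bytes for a method.

    Hypothetical helper: short descriptors are only sanitized, long ones
    are truncated and suffixed with a digest of the full descriptor.
    """
    # Replace characters that are awkward in file names.
    cleaned = "".join(
        c if c.isalnum() or c in "._-" else "_" for c in method_descriptor
    )
    candidate = cleaned + suffix
    if len(candidate.encode("utf-8")) <= MAX_NAME_BYTES:
        return candidate

    # Too long: keep a readable prefix and append a digest of the original
    # descriptor so that distinct methods still get distinct names.
    digest = hashlib.sha256(method_descriptor.encode("utf-8")).hexdigest()[:16]
    budget = MAX_NAME_BYTES - len(suffix.encode("utf-8")) - len(digest) - 1
    prefix = cleaned.encode("utf-8")[:budget].decode("utf-8", errors="ignore")
    return f"{prefix}_{digest}{suffix}"
```

With this scheme, two methods whose descriptors share a long common prefix still map to different files, because the digest is computed over the full descriptor rather than over the truncated prefix.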
/*
@ -303,7 +307,7 @@ jasError
#paragraph([Flowdroid])[
Our exchanges with the authors of Flowdroid led us to expect more timeouts from overly long executions than failed runs.
#todo[Already said?: Surprisingly, we only got #mypercent(37,NBTOTAL) of timeouts, and a high number of failures.]
Surprisingly, we only got #mypercent(37,NBTOTAL) of timeouts, and a high number of failures.
We tried to detect recurring causes of failures, but the complexity of Flowdroid makes the investigation difficult.
Most exceptions seem to be related to concurrency. //or display a generic messages.
Other errors that came up regularly are `java.nio.channels.ClosedChannelException`, which is raised when Flowdroid fails to read from the APK (we did not find the reason for this failure), null pointer exceptions when checking whether a null value is in a `ConcurrentHashMap` (in `LazySummaryProvider.getClassFlows()`), and `StackOverflowError` from `StronglyConnectedComponentsFast.recurse()`.
@ -329,32 +333,14 @@ Pauck: Flowdroid avg 2m on DIALDroid-Bench (real worlds apks)
In conclusion, we observe that a lot of errors can be linked to bugs in dependencies.
Our attempts to upgrade those dependencies led to new errors appearing: we conclude that this is a non-trivial task that requires familiarity with the internal code of the tools.
=== State of the art comparison
=== State-of-the-art comparison
Luo #etal released TaintBench~@luoTaintBenchAutomaticRealworld2022 a real-world benchmark and the associated recommendations to build such a benchmark.
These benchmarks confirmed that some tools such as Amandroid and Flowdroid are less efficient on real-world applications.
// Pauck #etal@pauckAndroidTaintAnalysis2018
// Reaves #etal@reaves_droid_2016
We confirm the hypothesis of Luo #etal that real-world applications lead to less efficient analysis than using hand crafted test applications or old datasets~@luoTaintBenchAutomaticRealworld2022.
In addition, even if Drebin is not hand-crafted, it is quite old seams to present similar issue as hand-crafted dataset when used to evaluate a tool: we obtained really good results compared to the Rasta dataset -- which is more representative of realworld applications.
Finally, we compare our results to the conclusions and discussions of previous papers~@luoTaintBenchAutomaticRealworld2022 @pauckAndroidTaintAnalysis2018 @reaves_droid_2016.
First, we confirm the hypothesis of Luo #etal that real-world applications lead to less efficient analysis than hand-crafted test applications or old datasets~@luoTaintBenchAutomaticRealworld2022.
Even though Drebin is not hand-crafted, it is quite old, and we obtained much better results on it than on the Rasta dataset.
Real-world applications differ markedly in size from hand-crafted applications, which impacts the success rate.
We believe that this is explained by the fact that the complexity of the code increases with its size.
/*
30*6
180
21+20+27+2+18+18
106
106/180*100
58.88
*/
=== State-of-the-art comparison
Our finding are consistent with the numerical results of Pauck #etal that showed that #mypercent(106, 180) of DIALDroid-Bench~@bosuCollusiveDataLeak2017 real-world applications are analyzed successfully with the 6 evaluated tools~@pauckAndroidTaintAnalysis2018.
Our findings are also consistent with the numerical results of Pauck #etal, who showed that #mypercent(106, 180) of DIALDroid-Bench~@bosuCollusiveDataLeak2017 real-world applications are analyzed successfully by the 6 evaluated tools~@pauckAndroidTaintAnalysis2018.
Six years after the release of DIALDroid-Bench, we obtain a lower ratio of #mypercent(40.05, 100) for the same set of 6 tools but using the Rasta dataset of #NBTOTALSTRING applications.
We extended this result to a set of #nbtoolsvariationsrun tools and obtained a global success rate of #resultratio.
We confirmed that most tools require a significant amount of work to get them running~@reaves_droid_2016.
@ -390,14 +376,14 @@ wognsen_et_al|386
Third, we extended to #nbtoolsselected different tools the work done by Reaves #etal on the usability of analysis tools (4 tools are in common, we added 16 new tools and two variations).
We confirmed that most tools require a significant amount of work to get them running.
We encounter similar issues with libraries and operating system incompatibilities, and noticed that, with time, dependencies issues may impact the build process.
We encountered similar issues with library and operating system incompatibilities, and noticed that, as time passes, dependency issues may impact the build process.
For instance, we encountered cases where the repository hosting the dependencies was closed, and cases where Maven failed to download dependencies because the OS version did not support SSL, which is now mandatory to access Maven Central.
//, and even one case were the could not find anywhere the compiled version of sbt used to build a tool.
=== Recommendations
Finally, we summarize some takeaways that developers should follow to improve the success of reusing their developed software.
#jfl-note[Finally, we summarize some takeaways that developers should follow to make their software easier to reuse.][*developer*: say that, in light of these results, one can think that some problems could be avoided or else fixed by the user]
To improve the reliability of their software, developers should follow classical development best practices such as continuous integration, testing, and code review.
To improve reusability, developers should document the tool usage, provide a minimal working example, and describe the expected results.

@ -1,17 +1,19 @@
#import "@local/template-thesis-matisse:0.0.1": etal
#import "../lib.typ": todo
#import "../lib.typ": todo, jfl-note
#import "X_var.typ": *
#todo[Future work: new systematic literature review, maybe check https://ieeexplore.ieee.org/abstract/document/9118907 ?]
== Conclusion <sec:rasta-conclusion>
#todo[Answer pb1]
This paper has assessed the results suggested by the literature~@luoTaintBenchAutomaticRealworld2022 @pauckAndroidTaintAnalysis2018 @reaves_droid_2016 about the reliability of static analysis tools for Android applications.
With a dataset of #NBTOTALSTRING applications, we established that #resultunusable of #nbtoolsselectedvariations tools are not reusable, considering a tool unusable when it fails more than 50% of the time.
In total, the analysis success rate of the tools that we could run for the entire dataset is #resultratio.
The characteristics that have the most influence on the success rate are the bytecode size and the min SDK version. Finally, we showed that malware APKs have a better finishing rate than goodware.
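The usability criterion used above can be made concrete with a small Python sketch. The function name `is_unusable` and the counts in the example are hypothetical and are not taken from the experiments; only the 50% threshold comes from the text.

```python
def is_unusable(failed_runs: int, total_runs: int, threshold: float = 0.5) -> bool:
    """Return True when a tool fails on more than `threshold` of its runs."""
    if total_runs <= 0:
        raise ValueError("total_runs must be positive")
    if not 0 <= failed_runs <= total_runs:
        raise ValueError("failed_runs must be between 0 and total_runs")
    # Strict comparison: a tool failing exactly half of the time is not
    # counted as unusable under this reading of the criterion.
    return failed_runs > threshold * total_runs


# Hypothetical counts, for illustration only.
print(is_unusable(failed_runs=600, total_runs=1000))  # True: 60% of runs fail
print(is_unusable(failed_runs=500, total_runs=1000))  # False: exactly 50%
```

The strict inequality is a design choice of this sketch: the text says "more than 50%", so the boundary case is treated as usable.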
In future works, we plan to investigate deeper the reported errors of the tools in order to analyze the most common types of errors, in particular for Java based tools.
#jfl-note[In future work, we plan to investigate the errors reported by the tools in more depth, in order to analyze the most common types of errors, in particular for Java-based tools.
We also plan to extend this work with a selection of more recent tools performing static analysis.
Following Reaves #etal recommendations~@reaves_droid_2016, we publish the Docker and Singularity images we built to run our experiments alongside the Docker files. This will allow the research community to use directly the tools without the build and installation penalty.
Following the recommendations of Reaves #etal~@reaves_droid_2016, we publish the Docker and Singularity images we built to run our experiments, alongside the Docker files. This will allow the research community to use the tools directly, without the build and installation penalty.][*Developer*]