remove duplicate in figure/tab/section reference

Jean-Marie Mineau 2025-06-24 12:32:02 +02:00
parent d730d1f4a7
commit 6d9096e314
Signed by: histausse
GPG key ID: B66AEEDA9B645AD2
4 changed files with 26 additions and 25 deletions


@@ -5,7 +5,7 @@
 Android is the most used mobile operating system since 2014, and since 2017, it even surpasses Windows all platforms combined#footnote[https://gs.statcounter.com/os-market-share#monthly-200901-202304].
 The public adoption of Android is confirmed by application developers, with 1.3 millions apps available in the Google Play Store in 2014, and 3.5 millions apps available in 2017#footnote[https://www.statista.com/statistics/266210].
-Its popularity makes Android a prime target for malware developers. // For example, various applications have been shown to steal personal information~\cite{shanSelfhidingBehaviorAndroid2018}.
+Its popularity makes Android a prime target for malware developers. // For example, various applications have been shown to steal personal information@shanSelfhidingBehaviorAndroid2018.
 Consequently, Android has also been an important subject for security research.
 In the past fifteen years, the research community released many tools to detect or analyze malicious behaviors in applications. Two main approaches can be distinguished: static and dynamic analysis@Li2017.
 Dynamic analysis requires to run the application in a controlled environment to observe runtime values and/or interactions with the operating system.
@@ -46,10 +46,10 @@ As a summary, the contributions of this paper are the following:
 */
 The paper is structured as follows.
-Section@sec:rasta-soa presents a summary of previous works dedicated to Android static analysis tools.
-Section@sec:rasta-methodology presents the methodology employed to build our evaluation process and Section@sec:rasta-xp gives the associated experimental results.
-// Section@sec:rasta-discussion investigates the reasons behind the observed failures of some of the tools.
-Section@sec:rasta-discussion discusses the limitations of this work and gives some takeaways for future contributions.
-Section@sec:rasta-conclusion concludes the paper.
+@sec:rasta-soa presents a summary of previous works dedicated to Android static analysis tools.
+@sec:rasta-methodology presents the methodology employed to build our evaluation process and @sec:rasta-xp gives the associated experimental results.
+// @sec:rasta-discussion investigates the reasons behind the observed failures of some of the tools.
+@sec:rasta-discussion discusses the limitations of this work and gives some takeaways for future contributions.
+@sec:rasta-conclusion concludes the paper.


@@ -218,12 +218,12 @@ A campaign of tests consists in executing the #nbtoolsvariationsrun selected too
 The constraints applied on the clusters are:
 - No network connection is authorized in order to limit any execution of malicious software.
-- The allocated RAM for a task is \ramlimit.
+- The allocated RAM for a task is #ramlimit.
 - The allocated maximum time is 1 hour.
 - The allocated object space / stack space is 64 GB / 16 GB if the tool is a Java based program.
 For the disk files, we use a mount point that is stored on a SSD disk, with no particular limit of size.
-Note that, because the allocation of #ramlimit could be insufficient for some tool, we evaluated the results of the tools on 20% of our dataset (described later in Section@sec:rasta-dataset) with 128 GB of RAM and #ramlimit of RAM and checked that the results were similar.
+Note that, because the allocation of #ramlimit could be insufficient for some tool, we evaluated the results of the tools on 20% of our dataset (described later in @sec:rasta-dataset) with 128 GB of RAM and #ramlimit of RAM and checked that the results were similar.
 With this confirmation, we continued our evaluations with #ramlimit of RAM only.
@@ -233,25 +233,25 @@ With this confirmation, we continued our evaluations with #ramlimit of RAM only.
 DATASET
 first seen year: not in the official AndroZoo databases: minimum of the date added to AndroZoo and the VT analysis date
-%
+
 year: 2010 and 2023
 7% malware
-%
+
 0 detections in VT: good
 5+ => malware
 0-5 detections: excluded
-%
+
 The size brackets are AndroZoo deciles (minus the extreme 1%)
 for each year, for each size bracket, we randomly select 500 applications (with the right proportion of malware) = bucket.
-%
+
 Problem: this is not representative of the population: there is probably not 7% malware in each AndroZoo decile for each year
 Problem 2: for sampling, we use the APK size deciles, but for our plots we use the DEX file size deciles.
-%
+
 500*10*14=70000
-%
-%
+
+
 */
 // Two datasets are used in the experiments of this section.
@@ -271,4 +271,4 @@ Applications in between are dropped.
 For computing the release date of an application, we contacted the authors of Androzoo to compute the minimum date between the submission to Androzoo and the first upload to VirusTotal.
 Such a computation is more reliable than using the DEX date that is often obfuscated when packaging the application.
-// \todo[Transition] // no more room :-(
+// #todo[Transition] // no more room :-(
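
For reference, the labelling and dating rules stated in the dataset notes and in the context lines of the hunks above can be sketched in Python. This is a minimal illustration and not code from the repository: the function and constant names are hypothetical, and reading "0-5 detections: excluded" as "1 to 4 detections" is an assumption.

```python
from datetime import date
from typing import Optional

# Threshold taken from the notes: "5+ => malware".
MALWARE_THRESHOLD = 5


def label_from_detections(vt_detections: int) -> Optional[str]:
    """Return 'goodware', 'malware', or None when the sample is dropped."""
    if vt_detections == 0:
        return "goodware"
    if vt_detections >= MALWARE_THRESHOLD:
        return "malware"
    # Assumed reading of "0-5 detections: excluded": 1 to 4 detections.
    return None


def release_date(androzoo_added: date, vt_first_seen: date) -> date:
    """Earliest of the AndroZoo submission and the first VirusTotal date."""
    return min(androzoo_added, vt_first_seen)


def expected_dataset_size(per_bucket: int = 500,
                          size_deciles: int = 10,
                          years: int = 14) -> int:
    """One bucket per (year, size decile) pair, as in the notes."""
    return per_bucket * size_deciles * years
```

With the defaults, `expected_dataset_size()` reproduces the `500*10*14` product given in the notes.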


@@ -20,11 +20,11 @@
 ) <fig:rasta-exit>
-Figures@fig:rasta-exit-drebin and@fig:rasta-exit compare the Drebin and Rasta datasets.
+@fig:rasta-exit-drebin and @fig:rasta-exit compare the Drebin and Rasta datasets.
 They represent the success/failure rate (green/orange) of the tools.
 We distinguished failure to compute a result from timeout (blue) and crashes of our evaluation framework (in grey, probably due to out of memory kills of the container itself).
 Because it may be caused by a bug in our own analysis stack, exit status represented in grey (Other) are considered as unknown errors and not as failure of the tool.
-#todo[We discuss further errors for which we have information in the logs in Section/*@sec:rasta-failure-analysis*/.]
+#todo[We discuss further errors for which we have information in the logs in /*@*/sec:rasta-failure-analysis.]
 Results on the Drebin datasets shows that 11 tools have a high success rate (greater than 85%).
 The other tools have poor results.
@@ -37,7 +37,7 @@ Three tools (androguard_dad, blueseal, saaf) that were performing well (higher t
 Regarding IC3, the fork with a simpler build process and support for modern OS has a lower success rate than the original tool.
 Two tools should be discussed in particular.
-//Androguard and Flowdroid have a large community of users, as shown by the numbers of GitHub stars in Table~\ref{tab:sources}.
+//Androguard and Flowdroid have a large community of users, as shown by the numbers of GitHub stars in @tab:rasta-sources.
 Androguard has a high success rate which is not surprising: it used by a lot of tools, including for analyzing application uploaded to the Androzoo repository.
 //Because of that, it should be noted that our dataset is biased in favour of Androguard. // Already in discution
 Nevertheless, when using Androguard decompiler (DAD) to decompile an APK, it fails more than 50% of the time.
@@ -50,9 +50,9 @@ is #mypercent(54.9, 100). When including the two defective tools, this ratio dr
 #highlight()[
 *RQ1 answer:*
-On a recent dataset we consider that \resultunusable of the tools are unusable.
-For the tools that we could run, \resultratio of analysis are finishing successfully.
-//(those with less than 50\% of successful execution and including the two tools that we were unable to build).
+On a recent dataset we consider that #resultunusable of the tools are unusable.
+For the tools that we could run, #resultratio of analysis are finishing successfully.
+//(those with less than 50% of successful execution and including the two tools that we were unable to build).
 ]
 /*
@@ -85,7 +85,8 @@ For the tools that we could run, \resultratio of analysis are finishing successf
 For investigating the effect of application dates on the tools, we computed the date of each APK based on the minimum date between the first upload in AndroZoo and the first analysis in VirusTotal.
 Such a computation is more reliable than using the dex date that is often obfuscated when packaging the application.
 Then, for the sake of clarity of our results, we separated the tools that have mainly Java source code from those that use other languages.
-Among the ones that are Java based programs, most of them use the Soot framework which may correlate the obtained results. @fig:rasta-exit-evolution-java (resp. @fig:rasta-exit-evolution-not-java) compares the success rate of the tools between 2010 and 2023 for Java based tools (resp. non Java based tools).
+Among the ones that are Java based programs, most of them use the Soot framework which may correlate the obtained results.
+@fig:rasta-exit-evolution-java (resp. @fig:rasta-exit-evolution-not-java) compares the success rate of the tools between 2010 and 2023 for Java based tools (resp. non Java based tools).
 For Java based tools, a clear decrease of finishing rate can be observed globally for all tools.
 For non-Java based tools, 2 of them keep a high success rate (Androguard, Mallodroid).
 The result is expected for Androguard, because the analysis is relatively simple and the tool is largely adopted, as previously mentioned.


@@ -374,8 +374,8 @@ Our attempts to upgrade those dependencies led to new errors appearing: we concl
 Luo #etal released TaintBench@luoTaintBenchAutomaticRealworld2022 a real-world benchmark and the associated recommendations to build such a benchmark.
 These benchmarks confirmed that some tools such as Amandroid and Flowdroid are less efficient on real-world applications.
-// Pauck {\it et al.}~\cite{pauckAndroidTaintAnalysis2018}
-// Reaves {\it et al.}~\cite{reaves_droid_2016}
+// Pauck #etal@pauckAndroidTaintAnalysis2018
+// Reaves #etal@reaves_droid_2016
 We finally compare our results to the conclusions and discussions of previous papers@luoTaintBenchAutomaticRealworld2022 @pauckAndroidTaintAnalysis2018 @reaves_droid_2016.
 First we confirm the hypothesis of Luo #etal that real-world applications lead to less efficient analysis than using hand crafted test applications or old datasets@luoTaintBenchAutomaticRealworld2022.