From 6d9096e3141c465ba9b402857d23f2d1599e8a4c Mon Sep 17 00:00:00 2001
From: Jean-Marie Mineau
Date: Tue, 24 Jun 2025 12:32:02 +0200
Subject: [PATCH] remove duplicated supplement in figure/table/section references

In Typst, an `@label` reference already carries its supplement
("Section", "Figure", "Table"), so spelling the word out in the prose
rendered it twice, e.g. "Section Section 3". This patch drops the
redundant words and converts a few leftover LaTeX commands (`\cite`,
`\ref`, `\ramlimit`, `\todo`) to their Typst equivalents. Short
illustrations are appended after the diff.

---
 3_rasta/0_intro.typ       | 12 ++++++------
 3_rasta/2_methodology.typ | 20 ++++++++++----------
 3_rasta/3_experiments.typ | 15 ++++++++-------
 3_rasta/4_discussion.typ  |  4 ++--
 4 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/3_rasta/0_intro.typ b/3_rasta/0_intro.typ
index f1255d5..6e784a0 100644
--- a/3_rasta/0_intro.typ
+++ b/3_rasta/0_intro.typ
@@ -5,7 +5,7 @@
 Android is the most used mobile operating system since 2014, and since 2017, it even surpasses Windows all platforms combined#footnote[https://gs.statcounter.com/os-market-share#monthly-200901-202304].
 The public adoption of Android is confirmed by application developers, with 1.3 millions apps available in the Google Play Store in 2014, and 3.5 millions apps available in 2017#footnote[https://www.statista.com/statistics/266210].
-Its popularity makes Android a prime target for malware developers. // For example, various applications have been shown to steal personal information~\cite{shanSelfhidingBehaviorAndroid2018}.
+Its popularity makes Android a prime target for malware developers. // For example, various applications have been shown to steal personal information@shanSelfhidingBehaviorAndroid2018.
 Consequently, Android has also been an important subject for security research.
 In the past fifteen years, the research community released many tools to detect or analyze malicious behaviors in applications.
 Two main approaches can be distinguished: static and dynamic analysis@Li2017.
 Dynamic analysis requires to run the application in a controlled environment to observe runtime values and/or interactions with the operating system.
@@ -46,10 +46,10 @@ As a summary, the contributions of this paper are the following:
 */
 
 The paper is structured as follows.
-Section@sec:rasta-soa presents a summary of previous works dedicated to Android static analysis tools.
-Section@sec:rasta-methodology presents the methodology employed to build our evaluation process and Section@sec:rasta-xp gives the associated experimental results.
-// Section@sec:rasta-discussion investigates the reasons behind the observed failures of some of the tools.
-Section@sec:rasta-discussion discusses the limitations of this work and gives some takeaways for future contributions.
-Section@sec:rasta-conclusion concludes the paper.
+@sec:rasta-soa presents a summary of previous works dedicated to Android static analysis tools.
+@sec:rasta-methodology presents the methodology employed to build our evaluation process and @sec:rasta-xp gives the associated experimental results.
+// @sec:rasta-discussion investigates the reasons behind the observed failures of some of the tools.
+@sec:rasta-discussion discusses the limitations of this work and gives some takeaways for future contributions.
+@sec:rasta-conclusion concludes the paper.
 
 
diff --git a/3_rasta/2_methodology.typ b/3_rasta/2_methodology.typ
index 768ba79..e7d4fd6 100644
--- a/3_rasta/2_methodology.typ
+++ b/3_rasta/2_methodology.typ
@@ -218,12 +218,12 @@ A campaign of tests consists in executing the #nbtoolsvariationsrun selected too
 
 The constraints applied on the clusters are:
 
 - No network connection is authorized in order to limit any execution of malicious software.
-- The allocated RAM for a task is \ramlimit.
+- The allocated RAM for a task is #ramlimit.
 - The allocated maximum time is 1 hour.
 - The allocated object space / stack space is 64 GB / 16 GB if the tool is a Java based program.
 
 For the disk files, we use a mount point that is stored on a SSD disk, with no particular limit of size.
 
-Note that, because the allocation of #ramlimit could be insufficient for some tool, we evaluated the results of the tools on 20% of our dataset (described later in Section@sec:rasta-dataset) with 128 GB of RAM and #ramlimit of RAM and checked that the results were similar.
+Note that, because the allocation of #ramlimit could be insufficient for some tools, we evaluated the results of the tools on 20% of our dataset (described later in @sec:rasta-dataset) with 128 GB of RAM and #ramlimit of RAM and checked that the results were similar.
 With this confirmation, we continued our evaluations with #ramlimit of RAM only.
@@ -233,25 +233,25 @@ With this confirmation, we continued our evaluations with #ramlimit of RAM only
 
 /*
 
 DATASET
 first seen year: pas dans les BDD officielles d'Androzoo: min added dans AndroZoo et date de VT analysis
-%
+
 année: 2010 et 2023
 7% de malware
-%
+
 0 detection dans VT: good
 5+ => malware
 0-5 detection: exclu
-%
+
 Les tranches de taille sont des déciles de d'androzoo (- les 1% extreme)
 pour chaque année, pour chaque tranche de taille, on selectionne randomly 500 applications (avec bonne proporotion de malware) = bucket.
-%
+
 Probleme: Ce n'est pas représentatif de la population: il n'y a propablement pas 7% de malware and chaque décile d'androzoo pour chaque année
 Probleme 2: pour sampler, on utilise les deciles de taille d'apk, mais pour nos plot on utiliser les deciles de taille de dex file.
-%
+
 500*10*14=70000
-%
-%
+
+
 */
 
 // Two datasets are used in the experiments of this section.
@@ -271,4 +271,4 @@ Applications in between are dropped.
 For computing the release date of an application, we contacted the authors of Androzoo to compute the minimum date between the submission to Androzoo and the first upload to VirusTotal.
 Such a computation is more reliable than using the DEX date that is often obfuscated when packaging the application.
 
-// \todo[Transition] // plus de place :-(
+// #todo[Transition] // no space left :-(
diff --git a/3_rasta/3_experiments.typ b/3_rasta/3_experiments.typ
index 9203299..fa1cac7 100644
--- a/3_rasta/3_experiments.typ
+++ b/3_rasta/3_experiments.typ
@@ -20,11 +20,11 @@
 )
 
 
-Figures@fig:rasta-exit-drebin and@fig:rasta-exit compare the Drebin and Rasta datasets.
+@fig:rasta-exit-drebin and @fig:rasta-exit compare the Drebin and Rasta datasets.
 They represent the success/failure rate (green/orange) of the tools.
 We distinguished failure to compute a result from timeout (blue) and crashes of our evaluation framework (in grey, probably due to out of memory kills of the container itself).
 Because it may be caused by a bug in our own analysis stack, exit status represented in grey (Other) are considered as unknown errors and not as failure of the tool.
-#todo[We discuss further errors for which we have information in the logs in Section/*@sec:rasta-failure-analysis*/.]
+#todo[We discuss further errors for which we have information in the logs in /*@*/sec:rasta-failure-analysis.]
 
 Results on the Drebin datasets shows that 11 tools have a high success rate (greater than 85%).
 The other tools have poor results.
@@ -37,7 +37,7 @@ Three tools (androguard_dad, blueseal, saaf) that were performing well (higher t
 Regarding IC3, the fork with a simpler build process and support for modern OS has a lower success rate than the original tool.
 Two tools should be discussed in particular.
-//Androguard and Flowdroid have a large community of users, as shown by the numbers of GitHub stars in Table~\ref{tab:sources}.
+//Androguard and Flowdroid have a large community of users, as shown by the number of GitHub stars in @tab:rasta-sources.
 Androguard has a high success rate which is not surprising: it used by a lot of tools, including for analyzing application uploaded to the Androzoo repository.
 //Because of that, it should be noted that our dataset is biased in favour of Androguard. // Already in discution
 Nevertheless, when using Androguard decompiler (DAD) to decompile an APK, it fails more than 50% of the time.
 
@@ -50,9 +50,9 @@ is #mypercent(54.9, 100). When including the two defective tools, this ratio dr
 
 #highlight()[
 *RQ1 answer:*
-On a recent dataset we consider that \resultunusable of the tools are unusable.
-For the tools that we could run, \resultratio of analysis are finishing successfully.
-//(those with less than 50\% of successful execution and including the two tools that we were unable to build).
+On a recent dataset we consider that #resultunusable of the tools are unusable.
+For the tools that we could run, #resultratio of the analyses finish successfully.
+//(those with less than 50% of successful execution and including the two tools that we were unable to build).
 ]
 
 /*
@@ -85,7 +85,8 @@ For the tools that we could run, \resultratio of analysis are finishing successf
 For investigating the effect of application dates on the tools, we computed the date of each APK based on the minimum date between the first upload in AndroZoo and the first analysis in VirusTotal.
 Such a computation is more reliable than using the dex date that is often obfuscated when packaging the application.
 Then, for the sake of clarity of our results, we separated the tools that have mainly Java source code from those that use other languages.
-Among the ones that are Java based programs, most of them use the Soot framework which may correlate the obtained results. @fig:rasta-exit-evolution-java (resp. @fig:rasta-exit-evolution-not-java) compares the success rate of the tools between 2010 and 2023 for Java based tools (resp. non Java based tools).
+Among the Java based programs, most use the Soot framework, so their results may be correlated.
+@fig:rasta-exit-evolution-java (resp. @fig:rasta-exit-evolution-not-java) compares the success rate of the tools between 2010 and 2023 for Java based tools (resp. non Java based tools).
 For Java based tools, a clear decrease of finishing rate can be observed globally for all tools.
 For non-Java based tools, 2 of them keep a high success rate (Androguard, Mallodroid).
 The result is expected for Androguard, because the analysis is relatively simple and the tool is largely adopted, as previously mentioned.
diff --git a/3_rasta/4_discussion.typ b/3_rasta/4_discussion.typ
index 2b8f0bd..8fd81bc 100644
--- a/3_rasta/4_discussion.typ
+++ b/3_rasta/4_discussion.typ
@@ -374,8 +374,8 @@ Our attempts to upgrade those dependencies led to new errors appearing: we concl
 Luo #etal released TaintBench@luoTaintBenchAutomaticRealworld2022 a real-world benchmark and the associated recommendations to build such a benchmark.
 These benchmarks confirmed that some tools such as Amandroid and Flowdroid are less efficient on real-world applications.
-// Pauck {\it et al.}~\cite{pauckAndroidTaintAnalysis2018} -// Reaves {\it et al.}~\cite{reaves_droid_2016} +// Pauck #etal@pauckAndroidTaintAnalysis2018 +// Reaves #etal@reaves_droid_2016 We finally compare our results to the conclusions and discussions of previous papers@luoTaintBenchAutomaticRealworld2022 @pauckAndroidTaintAnalysis2018 @reaves_droid_2016. First we confirm the hypothesis of Luo #etal that real-world applications lead to less efficient analysis than using hand crafted test applications or old datasets@luoTaintBenchAutomaticRealworld2022.
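
Illustration 1: why the prefix words could be dropped. A Typst `@label` reference renders with the referenced element's supplement ("Section", "Figure", "Table") already attached, so keeping the word in the prose printed it twice. A minimal sketch, assuming numbered headings; the heading text is illustrative, the label is one the patch actually touches:

```typst
// Supplement duplication, before/after this patch.
// @-references to headings only resolve when headings are numbered.
#set heading(numbering: "1.")

= Related Work <sec:rasta-soa>

// Before: Typst prepends the supplement "Section" on its own,
// so this renders as "Section Section 1 presents ...".
Section @sec:rasta-soa presents a summary of previous works.

// After: renders as "Section 1 presents ...".
@sec:rasta-soa presents a summary of previous works.
```

The same reasoning applies to the `@fig:...` and `@tab:...` references, whose figure targets carry the "Figure" and "Table" supplements.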
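Illustration 2: the `\ramlimit` to `#ramlimit` and `\todo` to `#todo` changes assume those names are defined somewhere in the project, just as the LaTeX macros were. A sketch with hypothetical definitions; the names match the call sites in this patch, but the value and bodies are placeholders, not the project's real ones:

```typst
// Hypothetical definitions, for illustration only; the project
// defines the real ones (and the real RAM value) elsewhere.
#let ramlimit = [64 GB]                   // placeholder value
#let todo(body) = text(red)[TODO: #body]  // placeholder TODO marker

// Call sites as they appear after this patch:
- The allocated RAM for a task is #ramlimit.

#todo[Transition]
```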
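Illustration 3: `~\cite{key}` collapses to a bare `@key` because Typst uses the same `@` syntax for citations, resolving the key against the bibliography and handling the non-breaking spacing itself. A sketch assuming a hypothetical `refs.bib` that contains the patch's keys:

```typst
// Citation sketch; "refs.bib" is a hypothetical file name,
// the key is one actually cited in the patch.
Two main approaches can be distinguished: static and dynamic
analysis@Li2017.

#bibliography("refs.bib")
```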