This commit is contained in:
Jean-Marie 'Histausse' Mineau 2025-07-29 16:23:42 +02:00
parent 243b9df134
commit c060e88996
Signed by: histausse
GPG key ID: B66AEEDA9B645AD2
17 changed files with 264 additions and 96 deletions

@ -1,7 +1,9 @@
#import "../lib.typ": todo
/*
= Acknowledgements
#todo[Acknowledge people]
#text(fill: luma(75%), lorem(400))
*/

@ -10,7 +10,7 @@
Android has been the most used mobile operating system since 2014, and since 2017 it has even surpassed Windows, all platforms combined#footnote[https://gs.statcounter.com/os-market-share#monthly-200901-202304].
The public adoption of Android is confirmed by application developers, with 1.3 million apps available in the Google Play Store in 2014 and 3.5 million in 2017#footnote[https://www.statista.com/statistics/266210].
Its popularity makes Android a prime target for malware developers.
For example, various applications have been shown to steal personal information~@shanSelfhidingBehaviorAndroid2018.
Consequently, Android has also been an important subject for security research.
/*

@ -57,10 +57,10 @@ In addition to decompiling #DEX files, Jadx can also decode Android manifests a
=== Soot <sec:bg-soot>
Soot#footnote[https://github.com/soot-oss/soot]~@Arzt2013 is a Java optimization framework.
It can lift Java bytecode to other intermediate representations that can be used to perform optimizations before being converted back to bytecode.
Because Dalvik bytecode and Java bytecode are equivalent, support for Android was added to Soot, and Soot features are now leveraged to analyse Android applications.
One of the best known examples of Soot usage for Android analysis is Flowdroid~@Arzt2014a, a tool that computes data flows in an application.
A new version of Soot, SootUp#footnote[https://github.com/soot-oss/SootUp], is currently being worked on.
Compared to Soot, it has a modernized interface and architecture, but it is not yet feature complete and some tools like Flowdroid still use Soot.

@ -6,7 +6,7 @@
#todo[swap with tool section ?]
In the past fifteen years, the research community released many tools to detect or analyze malicious behaviors in applications.
Two main approaches can be distinguished: static and dynamic analysis~@Li2017.
Dynamic analysis requires running the application in a controlled environment to observe runtime values and/or interactions with the operating system.
For example, an Android emulator with a patched kernel can capture these interactions, but the modifications to apply are not a trivial task.
Such an approach is limited by the time required to execute a limited part of the application, with no guarantee on the obtained code coverage.
@ -18,7 +18,7 @@ For malware, dynamic analysis is also limited by evading techniques that may pre
Static analysis programs examine an #APK file without executing it to extract information from it.
Basic static analysis can include extracting information from the `AndroidManifest.xml` file or decompiling bytecode to Java code.
More advanced analyses consist in computing the control flow of an application and computing its data flow~@Li2017.
The most basic form of control-flow analysis is to build a call graph.
A call graph is a graph where the nodes represent the methods of the application, and the edges represent calls from one method to another.
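To make this concrete, here is a minimal sketch in plain Python (with hypothetical method names): a call graph stored as an adjacency list, together with a traversal computing the methods reachable from an entry point.

```python
# Toy call graph: nodes are method signatures (hypothetical names),
# edges map each caller to the methods it invokes.
call_graph = {
    "MainActivity.onCreate()": ["MainActivity.loadAds()", "Util.log()"],
    "MainActivity.loadAds()": ["Http.get()"],
    "Util.log()": [],
    "Http.get()": [],
}

def reachable(graph, start):
    """Return all methods transitively callable from `start`."""
    seen, stack = set(), [start]
    while stack:
        method = stack.pop()
        if method in seen:
            continue
        seen.add(method)
        stack.extend(graph.get(method, []))
    return seen

print(sorted(reachable(call_graph, "MainActivity.onCreate()")))
```

Real tools build such graphs from the bytecode and must over-approximate the edges of virtual and reflective calls, which this sketch ignores.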
@ -121,7 +121,7 @@ This can be used to identify the user and cannot be changed if compromised.
This makes `TelephonyManager.getImei()` a good candidate as a taint source.
On the other hand, `UrlRequest.start()` sends a request to an external server, making it a taint sink.
If a data flow is found linking `TelephonyManager.getImei()` to `UrlRequest.start()`, it means the application is potentially leaking critical information to an external entity, a behavior that is probably not wanted by the user.
Data-flow analysis is the subject of many contributions~@weiAmandroidPreciseGeneral2014 @titzeAppareciumRevealingData2015 @bosuCollusiveDataLeak2017 @klieberAndroidTaintFlow2014 @DBLPconfndssGordonKPGNR15 @octeauCompositeConstantPropagation2015 @liIccTADetectingInterComponent2015, the most notable source being Flowdroid~@Arzt2014a.
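As a drastically simplified illustration (this is not the algorithm of Flowdroid or any cited tool, and the flow edges are hypothetical), a taint analysis can be viewed as reachability from sources to sinks over data-flow edges:

```python
# Minimal taint propagation sketch: an edge (a, b) means data may flow
# from a to b. The edges are hypothetical, for illustration only.
flows = [
    ("TelephonyManager.getImei()", "StringBuilder.append()"),
    ("StringBuilder.append()", "UrlRequest.start()"),
    ("Settings.getLocale()", "TextView.setText()"),
]

def taint_reaches(flows, source, sink):
    """Fixed-point propagation: mark everything reachable from `source`."""
    tainted = {source}
    changed = True
    while changed:
        changed = False
        for a, b in flows:
            if a in tainted and b not in tainted:
                tainted.add(b)
                changed = True
    return sink in tainted

print(taint_reaches(flows, "TelephonyManager.getImei()", "UrlRequest.start()"))  # True
```

The hard part, which the contributions above tackle, is computing sound and precise flow edges in the first place (through fields, containers, callbacks and components).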
#todo[Describe the different contributions in relations to the issues they tackle]
@ -143,44 +143,31 @@ The most notable user of Soot is Flowdroid. #todo[formulation]
The alternative to static analysis is dynamic analysis.
With dynamic analysis, the application is actually executed.
The most simple strategies consist in just running the application and examining its behavior.
For instance, Bernardi #etal~@bernardi_dynamic_2019 use the logs generated by `strace` to list the system calls generated in response to an event and determine whether an application is malicious.
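As an illustration of this kind of processing (the log lines below are fabricated for the example, not a real capture), a histogram of system calls can be extracted from strace-style output:

```python
import re
from collections import Counter

# Fabricated strace-style lines, for illustration only (not a real capture).
log = """\
openat(AT_FDCWD, "/data/data/com.example/files/x", O_RDONLY) = 3
read(3, "...", 4096) = 42
connect(5, {sa_family=AF_INET, ...}, 16) = 0
sendto(5, "...", 128, 0, NULL, 0) = 128
read(3, "", 4096) = 0
"""

def syscall_histogram(trace):
    """Count the system call names at the start of each strace line."""
    return Counter(re.findall(r"^(\w+)\(", trace, flags=re.MULTILINE))

print(syscall_histogram(log))
```

Feature vectors derived from such histograms can then feed a classifier, in the spirit of the approach above.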
More advanced methods are more intrusive and require modifying either the #APK, the Android framework, the runtime, or the kernel.
TaintDroid~@Enck2010, for example, modifies the Dalvik Virtual Machine (the predecessor of the #ART) to track the data flows of an application at runtime, while AndroBlare~@Andriatsimandefitra2012 @andriatsimandefitra_detection_2015 computes taint flows by hooking system calls using a Linux Security Module.
DexHunter~@zhang2015dexhunter and AppSpear~@yang_appspear_2015 also patch the Dalvik Virtual Machine/#ART, this time to collect bytecode loaded dynamically.
Modifying the Android framework, runtime or kernel is possible thanks to the Android project being open source; however, this is a delicate operation.
Thus, a common issue faced by tools that take this approach is that they are stuck with a specific version of Android.
Some sandboxes limit this issue by using dynamic binary instrumentation, like DroidHook~@cui_droidhook_2023, based on the Xposed framework, or CamoDroid~@faghihi_camodroid_2022, based on Frida.
Another known challenge when analysing an application dynamically is code coverage: if some part of the application is not executed, it cannot be analysed.
Considering that Android applications are meant to interact with a user, this can become problematic for automatic analysis.
The Monkey tool developed by Google is one of the most used solution~@sutter_dynamic_2024.
It sends a random stream of events to the phone without tracking the state of the application.
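As a toy model of this stateless strategy (a sketch of the idea, not Google's actual implementation), one can picture a seeded generator emitting random events with no knowledge of the application state:

```python
import random

# Toy model of a Monkey-style stateless fuzzer (not the real tool):
# emit random events without ever inspecting the application state.
def random_event_stream(seed, n, width=1080, height=1920):
    rng = random.Random(seed)
    events = []
    for _ in range(n):
        kind = rng.choice(["tap", "swipe", "key"])
        if kind == "tap":
            events.append(("tap", rng.randrange(width), rng.randrange(height)))
        elif kind == "swipe":
            events.append(("swipe", rng.randrange(width), rng.randrange(height),
                           rng.randrange(width), rng.randrange(height)))
        else:
            events.append(("key", rng.choice(["BACK", "HOME", "MENU"])))
    return events

print(random_event_stream(seed=42, n=3))
```

In practice, Monkey is driven from the host, typically with `adb shell monkey -p <package> <event-count>`.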
More advanced tools statically analyse the application to build a model of it in order to improve the exploration.
Sapienz~@mao_sapienz_2016 and Stoat~@su_guided_2017 use this technique to improve application testing.
GroddDroid~@abraham_grodddroid_2015 takes the same approach, but statically detects suspicious sections of code and interacts with the application to trigger those sections.
Unfortunately, exploring the application entirely is not always possible, as some applications try to detect whether they are running in a sandbox environment (#eg in an emulator, or with Frida present in memory) and refuse to run some sections of code if this is the case.
Ruggia #etal~@ruggia_unmasking_2024 established a list of such evasion techniques.
They propose a new sandbox, DroidDungeon, which, contrary to other sandboxes like DroidScope~@droidscope180237 or CopperDroid~@Tam2015, strongly emphasizes resilience against evasion mechanisms.
#todo[RealDroid sandbox bases on modified ART?]
#todo[force execution?]
=== Hybrid Analysis <sec:bg-hybrid>
#todo[merge with other section?]
- #todo[DyDroid, audit of Dynamic Code Loading~@qu_dydroid_2017]

@ -4,19 +4,19 @@
Computing if an application contains a possible information flow is an example of a static analysis goal.
Some datasets have been built especially for evaluating tools that are computing information flows inside Android applications.
One of the first well-known datasets is DroidBench, which was released with the tool Flowdroid~@Arzt2014a.
Later, the dataset ICC-Bench was introduced with the tool Amandroid~@weiAmandroidPreciseGeneral2014 to complement DroidBench by introducing applications using Inter-Component data flows.
These datasets contain carefully crafted applications with flows that the tools should be able to detect.
These hand-crafted applications can also be used for testing purposes or to detect any regression when the software code evolves.
Contrary to real world applications, the behavior of these hand-crafted applications is known in advance, thus providing the ground truth that the tools try to compute.
However, these datasets are not representative of real-world applications~@Pendlebury2018 and the obtained results can be misleading.
Contrary to DroidBench and ICC-Bench, some approaches use real-world applications.
Bosu #etal~@bosuCollusiveDataLeak2017 use DIALDroid to perform a threat analysis of Inter-Application communication and published DIALDroid-Bench, an associated dataset.
Similarly, Luo #etal released TaintBench~@luoTaintBenchAutomaticRealworld2022, a real-world dataset, and the associated recommendations to build such a dataset.
These datasets are useful for carefully spotting missing taint flows, but contain only a few dozen applications.
In addition to those datasets, Androzoo~@allixAndroZooCollectingMillions2016 collects applications from several application marketplaces, including the Google Play Store (the official Google application store), Anzhi and AppChina (two Chinese stores), and F-Droid (a store dedicated to free and open source applications).
Currently, Androzoo contains more than 25 million applications, which researchers can download using the SHA256 hash of an application.
Androzoo provides additional information about the applications, like the date an application was first detected by Androzoo or the number of antivirus engines from VirusTotal that flagged it as malicious.
In addition to providing researchers with easy access to real-world applications, Androzoo makes it much easier to share datasets for reproducibility: instead of sharing hundreds of #APK files, the list of SHA256 hashes is enough.
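A download can be sketched as follows (the endpoint and parameters follow AndroZoo's public API; the key and hash below are placeholders, and a personal API key must be requested from AndroZoo):

```python
from urllib.parse import urlencode
from urllib.request import urlretrieve

ANDROZOO_URL = "https://androzoo.uni.lu/api/download"

def download_url(api_key, sha256):
    """Build the AndroZoo download URL for the APK identified by `sha256`."""
    return ANDROZOO_URL + "?" + urlencode({"apikey": api_key, "sha256": sha256})

def fetch_apk(api_key, sha256, dest):
    """Download one APK (requires network access and a valid API key)."""
    urlretrieve(download_url(api_key, sha256), dest)

# Placeholder key and hash, for illustration only:
print(download_url("MY_API_KEY", "0" * 64))
```

Sharing a dataset then amounts to sharing the list of SHA256 hashes fed to such a loop.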

@ -9,10 +9,10 @@ On the contrary, we take as hypothesis that the provided tools compute the inten
This chapter intends to show that sharing the software artifacts of a paper may not be sufficient to ensure that the provided software would be reusable.
Thus, our contributions are the following.
We carefully retrieved static analysis tools for Android applications that were selected by Li #etal~@Li2017 between 2011 and 2017.
We contacted the authors, whenever possible, to select the best candidate versions and to confirm the correct usage of the tools.
We rebuilt the tools in their original environment and we plan to share our Docker images with this paper.
We evaluated the reusability of the tools by measuring the number of successful analyses of applications taken from the Drebin dataset~@Arp2014 and from a custom dataset that contains more recent applications (#NBTOTALSTRING in total).
The observation of the success or failure of these analyses enables us to answer the following research questions:
/ RQ1: What Android static analysis tools that are more than 5 years old are still available and can be reused without crashing with a reasonable effort?

@ -12,27 +12,27 @@
We review in this section the past existing contributions related to static analysis tools reusability.
Several papers have reviewed Android analysis tools produced by researchers.
Li #etal~@Li2017 published a systematic literature review for Android static analysis before May 2015.
They analyzed 92 publications and classified them by goal, method used to solve the problem and underlying technical solution for handling the bytecode when performing the static analysis.
In particular, they listed 27 approaches with an open-source implementation available.
Nevertheless, experiments to evaluate the reusability of the software they pointed out were not performed.
We believe that the effort of reviewing the literature to make a comprehensive overview of available approaches should be pushed further: an existing published approach with software that cannot be used for technical reasons endangers both the reproducibility and the reusability of research.
As we saw in @sec:bg-datasets, the need for a ground truth to test analysis tools leads test datasets to often be handcrafted.
The few datasets composed of real-world application confirmed that some tools such as Amandroid~@weiAmandroidPreciseGeneral2014 and Flowdroid~@Arzt2014a are less efficient on real-world applications~@bosuCollusiveDataLeak2017 @luoTaintBenchAutomaticRealworld2022.
Unfortunately, those real-world application datasets are rather small, and a larger number of applications would be more suitable for our goal, #ie evaluating the reusability of a variety of static analysis tools.
Pauck #etal~@pauckAndroidTaintAnalysis2018 used DroidBench~@Arzt2014a, ICC-Bench~@weiAmandroidPreciseGeneral2014 and DIALDroid-Bench~@bosuCollusiveDataLeak2017 to compare Amandroid~@weiAmandroidPreciseGeneral2014, DIAL-Droid~@bosuCollusiveDataLeak2017, DidFail~@klieberAndroidTaintFlow2014, DroidSafe~@DBLPconfndssGordonKPGNR15, FlowDroid~@Arzt2014a and IccTA~@liIccTADetectingInterComponent2015 -- all these tools will also be compared in this chapter.
To perform their comparison, they introduced the AQL (Android App Analysis Query Language) format.
AQL can be used as a common language to describe the computed taint flow as well as the expected result for the datasets.
It is interesting to notice that all the tested tools timed out at least once on real-world applications, and that Amandroid~@weiAmandroidPreciseGeneral2014, DidFail~@klieberAndroidTaintFlow2014, DroidSafe~@DBLPconfndssGordonKPGNR15, IccTA~@liIccTADetectingInterComponent2015 and ApkCombiner~@liApkCombinerCombiningMultiple2015 (a tool used to combine applications) all failed to run on applications built for Android API 26.
These results suggest that a more thorough study of the link between application characteristics (#eg date, size) and analysis failures should be conducted.
Luo #etal~@luoTaintBenchAutomaticRealworld2022 used the framework introduced by Pauck #etal to compare Amandroid~@weiAmandroidPreciseGeneral2014 and Flowdroid~@Arzt2014a on DroidBench and their own dataset TaintBench, composed of real-world Android malware.
They found out that those tools have a low recall on real-world malware and are thus over-adapted to micro-datasets.
Unfortunately, because AQL is only focused on taint flows, we cannot use it to evaluate tools performing more generic analysis.
A first work about quantifying the reusability of static analysis tools was proposed by Reaves #etal~@reaves_droid_2016.
Seven Android analysis tools (Amandroid~@weiAmandroidPreciseGeneral2014, AppAudit~@xiaEffectiveRealTimeAndroid2015, DroidSafe~@DBLPconfndssGordonKPGNR15, Epicc~@octeau2013effective, FlowDroid~@Arzt2014a, MalloDroid~@fahlWhyEveMallory2012 and TaintDroid~@Enck2010) were selected to check if they were still readily usable.
For each tool, both the usability and the results of the tool were evaluated by asking auditors to install and use it on DroidBench and 16 real-world applications.
The auditors reported that most of the tools require a significant amount of time to set up, often due to dependency issues and operating system incompatibilities.
Reaves #etal propose to solve these issues by distributing a Virtual Machine with a functional build of the tool in addition to the source code.
@ -41,7 +41,7 @@ Reaves #etal also report that real world applications are more challenging to an
We will confirm and expand this result in this chapter with a larger dataset than only 16 real-world applications.
// Indeed, a more diverse dataset would assess the results and give more insight about the factors impacting the performances of the tools.
Finally, our approach is similar to the methodology employed by Mauthe #etal for decompilers~@mauthe_large-scale_2021.
To assess the robustness of Android decompilers, Mauthe #etal used 4 decompilers on a dataset of 40 000 applications.
The error messages of the decompilers were parsed to list the methods that failed to decompile, and this information was used to estimate the main causes of failure.
It was found that the failure rate is correlated with the size of the method, and that a substantial amount of failures come from third-party libraries rather than from the core code of the application.

@ -66,18 +66,18 @@
*documentation*: #okk: excellent, MWE, #ok: few inconsistencies, #bad: bad quality, #ko: not available\
*decision*: #ok: considered; #bad: considered but not built; #ko: out of scope of the study
]},
caption: [Considered tools~@Li2017: availability and usage reliability],
) <tab:rasta-tools>
We collected the static analysis tools from~@Li2017, plus one additional paper encountered during our review of the state-of-the-art (DidFail~@klieberAndroidTaintFlow2014).
They are listed in @tab:rasta-tools, with the original release date and associated paper.
We intentionally limited the collected tools to the ones selected by Li #etal~@Li2017 for several reasons.
First, not using recent tools ensures a gap of at least 5 years between the publication and the most recent APK files, which enables measuring the reusability of previous contributions with a reasonable gap of time.
Second, collecting new tools would require describing these tools in depth, similarly to what has been performed by Li #etal~@Li2017, which is not the primary goal of this paper.
Additionally, selection criteria such as the publication venue or number of citations would be necessary to select a subset of tools, which would require an additional methodology.
These possible contributions are left for future work.
Some tools use hybrid analysis (both static and dynamic): A3E~@DBLPconfoopslaAzimN13, A5~@vidasA5AutomatedAnalysis2014, Android-app-analysis~@geneiatakisPermissionVerificationApproach2015, StaDynA~@zhauniarovichStaDynAAddressingProblem2015.
They have been excluded from this paper.
We manually searched for the tool repository when the website mentioned in the paper is no longer available (#eg when the repository has been migrated from Google Code to GitHub), and for each tool we searched for:
@ -89,7 +89,7 @@ In @tab:rasta-tools we rated the quality of these artifacts with "#ok" when avai
Results show that documentation is often missing or very poor (#eg Lotrack), which makes the rebuild process and the first analysis of a MWE very complex.
We finally excluded Choi #etal~@CHOI2014620 as their tool works on the sources of Android applications, and Poeplau #etal~@DBLPconfndssPoeplauFBKV14 that focus on Android hardening.
As a summary, we end up with #nbtoolsselected tools to compare.
Some specificities should be noted.
The IC3 tool will be duplicated in our experiments because two versions are available: the original version of the authors and a fork used by other tools like IccTa.
@ -255,11 +255,11 @@ Problem 2: for sampling, we use the APK size deciles, but for our
*/
Two datasets are used in the experiments of this section.
The first one is *Drebin*~@Arp2014, from which we extracted the malware part (5479 samples that we could retrieve) for comparison purposes only.
It is a well known and very old dataset that should not be used anymore because it contains temporal and spatial biases~@Pendlebury2018.
We intend to compare the rate of success on this old dataset with a more recent one.
We built the second one, *Rasta*, to cover all dates between 2010 and 2023.
This dataset is a random extract of Androzoo~@allixAndroZooCollectingMillions2016, for which we balanced applications between years and size.
For each year and inter-decile range of size in Androzoo, 500 applications have been extracted with an arbitrary proportion of 7% of malware.
This ratio has been chosen because it is the ratio of goodware/malware that we observed when performing a raw extract of Androzoo.
For checking the maliciousness of an Android application we rely on the VirusTotal detection indicators.
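The stratified sampling described above can be sketched as follows (fully synthetic metadata and a hypothetical field layout, for illustration only; the real extraction queries the Androzoo index):

```python
import random
import statistics

# Synthetic APK metadata: (year, size_bytes, is_malware) tuples, for illustration.
rng = random.Random(0)
apks = [(rng.randint(2010, 2023), rng.randint(10_000, 50_000_000), rng.random() < 0.07)
        for _ in range(100_000)]

# Inter-decile size boundaries, computed over the whole corpus.
deciles = statistics.quantiles([size for _, size, _ in apks], n=10)

def decile_of(size):
    """Index (0..9) of the size decile that `size` falls into."""
    return sum(size > bound for bound in deciles)

def sample_stratum(year, decile, n=500, malware_ratio=0.07):
    """Draw up to `n` apps for one (year, size-decile) stratum, ~7% malware."""
    pool = [a for a in apks if a[0] == year and decile_of(a[1]) == decile]
    mal = [a for a in pool if a[2]]
    good = [a for a in pool if not a[2]]
    k_mal = min(len(mal), round(n * malware_ratio))
    return rng.sample(mal, k_mal) + rng.sample(good, min(len(good), n - k_mal))

stratum = sample_stratum(2015, 3)
print(len(stratum), sum(a[2] for a in stratum))
```

Repeating the draw for every (year, decile) pair yields the balanced dataset.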

@ -24,7 +24,7 @@
They represent the success/failure rate (green/orange) of the tools.
We distinguished failure to compute a result from timeout (blue) and crashes of our evaluation framework (in grey, probably due to out of memory kills of the container itself).
Because they may be caused by a bug in our own analysis stack, exit statuses represented in grey (Other) are considered unknown errors and not failures of the tool.
#todo[We discuss further errors for which we have information in the logs in @sec:rasta-failure-analysis.]
Results on the Drebin dataset show that 11 tools have a high success rate (greater than 85%).
The other tools have poor results.

@ -331,13 +331,13 @@ Our attempts to upgrade those dependencies led to new errors appearing: we concl
=== State of the art comparison
Luo #etal released TaintBench~@luoTaintBenchAutomaticRealworld2022, a real-world benchmark, and the associated recommendations to build such a benchmark.
These benchmarks confirmed that some tools such as Amandroid and Flowdroid are less efficient on real-world applications.
// Pauck #etal@pauckAndroidTaintAnalysis2018
// Reaves #etal@reaves_droid_2016
We finally compare our results to the conclusions and discussions of previous papers~@luoTaintBenchAutomaticRealworld2022 @pauckAndroidTaintAnalysis2018 @reaves_droid_2016.
First we confirm the hypothesis of Luo #etal that real-world applications lead to less efficient analysis than using hand crafted test applications or old datasets~@luoTaintBenchAutomaticRealworld2022.
Even if Drebin is not hand-crafted, it is quite old and we obtained really good results compared to the Rasta dataset.
When considering real-world applications, the size is rather different from hand-crafted applications, which impacts the success rate.
We believe that it is explained by the fact that the complexity of the code increases with its size.
@ -354,10 +354,10 @@ We believe that it is explained by the fact that the complexity of the code incr
=== State-of-the-art comparison
Our findings are consistent with the numerical results of Pauck #etal, who showed that #mypercent(106, 180) of DIALDroid-Bench~@bosuCollusiveDataLeak2017 real-world applications are analyzed successfully by the 6 evaluated tools~@pauckAndroidTaintAnalysis2018.
Six years after the release of DIALDroid-Bench, we obtain a lower ratio of #mypercent(40.05, 100) for the same set of 6 tools but using the Rasta dataset of #NBTOTALSTRING applications.
We extended this result to a set of #nbtoolsvariationsrun tools and obtained a global success rate of #resultratio.
We confirmed that most tools require a significant amount of work to get them running~@reaves_droid_2016.
Our investigation of crashes also confirmed that dependencies on older versions of Apktool impact the performance of Anadroid, Saaf, and Wognsen #etal, in addition to DroidSafe and IccTa, as already identified by Pauck #etal.
/*


== Conclusion <sec:rasta-conclusion>
This paper has assessed the suggested results of the literature~@luoTaintBenchAutomaticRealworld2022 @pauckAndroidTaintAnalysis2018 @reaves_droid_2016 about the reliability of static analysis tools for Android applications.
With a dataset of #NBTOTALSTRING applications, we established that #resultunusable of the #nbtoolsselectedvariations tools are not reusable, considering a tool unusable when it fails more than 50% of the time.
In total, the analysis success rate of the tools that we could run for the entire dataset is #resultratio.
The characteristics that have the most influence on the success rate are the bytecode size and the min SDK version. Finally, we showed that malware APKs have a better finishing rate than goodware.
In future works, we plan to investigate the reported errors of the tools more deeply in order to analyze the most common types of errors, in particular for Java-based tools.
We also plan to extend this work with a selection of more recent tools performing static analysis.
Following Reaves #etal recommendations~@reaves_droid_2016, we publish the Docker and Singularity images we built to run our experiments, alongside the Dockerfiles. This will allow the research community to use the tools directly, without the build and installation penalty.


A reverser is in charge of studying the application: they usually perform a static analysis and a dynamic analysis.
In the first phase, the reverser uses static analysis tools in order to access and review the code of the application.
If this first phase is not carried out accurately, for example if they fail to access a critical class, they may decide that a malicious application is safe.
Additionally, as stated by Li #etal~@Li2017 in their conclusions, such a task is complexified by dynamic code loading, reflective calls, native code, and multi-threading which cannot be easily handled statically.
Nevertheless, even if we do not consider these aspects, determining statically how the regular class loading system of Android works is a difficult task.
Class loading occurs at runtime and is handled by the components of #ART, even when the application is partially or fully compiled ahead of time.
At runtime each smartphone runs a unique version of Android, but, as the application is deployed on multiple versions of Android, it is difficult to predict which classes will be loaded from the #Asdkc or from the APK file itself.
This complexity increases with the multi-#DEX format of recent #APK files that can contain several bytecode files.
Going back to the problem of a reverser studying a suspicious application statically, the reverser uses tools to disassemble the application~@mauthe_large-scale_2021 and track the flows of data in the bytecode.
As an example, for a spyware potentially leaking personal information, the reverser can unpack the application with Apktool and, after manually locating a method that they suspect to read sensitive data (by reading the unpacked bytecode), they can compute with FlowDroid~@Arzt2014a if there is a flow from this method to methods performing HTTP requests.
During these steps, the reverser faces the problem of statically resolving which class is loaded from the APK file and the #Asdkc.
If they, or the tools they use, choose the wrong version of the class, they may obtain wrong conclusions about the code.
Thus, the possibility of shadowing classes could be exploited by an attacker in order to obfuscate the code.
In this paper, we study how Android handles the loading of classes in the case of multiple versions of the same class.
Such collisions can exist inside the APK file or between the APK file and the #Asdkc.
We intend to understand if a reverser would be impacted during a static analysis when dealing with such obfuscated code.
Because this problem is already complex enough with the current operations performed by Android, we exclude the case where a developer recodes a specific class loader or replaces a class loader with another one, as is often the case, for example, in packed applications~@Duan2018.
We present a new technique that "shadows" a class #ie embeds a class in the APK file and "presents" it to the reverser instead of the legitimate version.
The goal of such an attack is to confuse them during the reversing process: at runtime the real class will be loaded from another location of the APK file or from the #Asdk, instead of the shadow version.
This attack can be applied to regular classes of the #Asdk or to hidden classes of Android~@he_systematic_2023 @li_accessing_2016.
We show how these attacks can confuse the tools of the reverser when they perform a static analysis.
In order to evaluate if such attacks are already used in the wild, we analyzed #nbapk applications from 2023 that we extracted randomly from AndroZoo~@allixAndroZooCollectingMillions2016.
Our main result is that #shadowsdk of these applications contain shadow collisions against the #SDK and #shadowhidden against hidden classes.
Our investigations conclude that most of these collisions are not voluntary attacks, but we highlight one specific malware sample performing strong obfuscation, revealed by our detection of a shadow attack.


#paragraph([Class loading])[
Class loading mechanisms have been studied in the general context of the Java language.
Gong~@gong_secure_1998 describes the JDK 1.2 class loading architecture and capabilities.
One of the main advantages of class loading is the type safety property that prevents type spoofing.
As explained by Liang and Bracha~@liang_dynamic_1998, by capturing events at runtime (new loaders, new classes) and maintaining constraints on the multiple loaders and their delegation hierarchy, confusion when loading a spoofed class can be avoided.
This behavior is now implemented in modern Java virtual machines.
Later, Tozawa and Hagiya~@tozawa_formalization_2002 proposed a formalization of the Java Virtual Machine supporting dynamic class loading in order to ensure type safety.
Those works ensure strong safety for the Java Virtual Machine, in particular when linking new classes at runtime.
Although Android has a similar mechanism, the implementation is not shared with the JVM of Oracle.
Additionally, in this paper, we do not focus on spoofing classes at runtime, but on the confusion that arises when a reverser uses a static analyzer to understand the code loading process offline.
Contributions about Android class loading focus on using the capabilities of class loading to extend Android features or to prevent reverse engineering of Android applications.
For instance, Zhou #etal~@zhou_dynamic_2022 extend the class loading mechanism of Android to support regular Java bytecode and Kritz and Maly~@kriz_provisioning_2015 propose a new class loader to automatically load modules of an application without user interactions.
Regarding reverse engineering, class loading mechanisms are frequently used by packers for hiding all or parts of the code of an application~@Duan2018.
The problem to be solved consists in locating secondary #dexfiles that can be deciphered just before being loaded.
Dynamic hook mechanisms should be used to intercept the bytecode at load time.
These techniques can be of some help for the reverser, but they require instrumenting the source code of AOSP or the application itself.
The engineering cost is high and anti-debugging techniques can slow down the process.
Thus, a reverser always starts by studying statically an application using static analysis tools~@Li2017, and will eventually go to dynamic analysis~@Egele2012 if further costly extra analysis is needed (for example, if they spot the use of a custom class loader).
Performing a static analysis of an application can be time-consuming if the programmer uses obfuscation techniques such as native code, packing techniques, value encryption, or reflection.
Such techniques can partially hide the Java bytecode from a static analysis investigation as they modify it at runtime.
For example, packers exploit the class loading capability of Android to load new code.
They also combine the loading with code generation from ciphered assets or code modification from native code calls~@liao2016automated to increase the difficulty of recovery of the code.
Because parts of the original code will be only available at runtime, deobfuscation approaches propose techniques that track #DEX structures when manipulated by the application~@zhang2015dexhunter @xue2017adaptive @wong2018tackling. All those contributions are directly related to the class loading mechanism of Android.
Deobfuscating an application is the first problem the reverse engineer has to solve. Nevertheless, even if all classes of the code are recovered by the reverse engineer, understanding which classes are really loaded by Android brings an additional problem.
The reverse engineer can have the feeling that what they see in the bytecode is what is loaded at runtime, whereas the system can choose alternative implementations of a class.
Our goal is to show that tools mentioned in the literature~@Li2017 can suffer from attacks exploiting confusion inside regular class loading mechanisms of Android.
]
#paragraph([Hidden APIs])[
Li #etal did an empirical study of the usage and evolution of hidden APIs~@li_accessing_2016.
They found that hidden APIs are added and removed in every release of Android, and that they are used both by benign and malicious applications.
More recently, He #etal~@he_systematic_2023 did a systematic study of hidden service APIs related to security.
They studied how the hidden API can be used to bypass Android security restrictions and found that, although Google countermeasures are effective, they need to be implemented inside the system services and not in the hidden API, due to the lack of in-app privilege isolation: the framework code runs in the same process as the user code, meaning any restriction in the framework can be bypassed by the user.
]


The order in which classes are loaded at runtime requires special attention.
All the specific Android class loaders (`DexClassLoader`, `InMemoryClassLoader`, etc.) have the same behavior (except `DelegateLastClassLoader`), but each handles the specificities of its input format.
Each class loader has a delegate class loader, represented in the right part of @fig:cl-class_loading_classes by black plain arrows for an instance of `PathClassLoader` and an instance of `DelegateLastClassLoader` (the other class loaders also have this delegate).
This delegate is a concept specific to class loaders and has nothing to do with class inheritance.
By default, class loaders will delegate to the singleton class `BootClassLoader`, except if a specific class loader is provided when instantiating the new class loader.
When a class loader needs to load a class, except for `DelegateLastClassLoader`, it will first ask the delegate, i.e. `BootClassLoader`, and if the delegate does not find the class, the class loader will try to load the class on its own.
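This delegation order can be sketched as follows. This is a simplified model with hypothetical names, not the actual ART implementation: `BOOT` stands for the classes known to `BootClassLoader`, and `apkClasses` for the classes found in the APK's #dexfiles.

```java
import java.util.List;
import java.util.Optional;

// Simplified model of the two lookup orders used by Android class loaders.
public class DelegationModel {
    static final List<String> BOOT = List.of("android.app.Activity");

    static Optional<String> resolve(String name, List<String> apkClasses,
                                    boolean delegateLast) {
        boolean inBoot = BOOT.contains(name);
        boolean inApk = apkClasses.contains(name);
        if (delegateLast) {
            // DelegateLastClassLoader: own dex files first, delegate last.
            if (inApk) return Optional.of("apk");
            if (inBoot) return Optional.of("boot");
        } else {
            // PathClassLoader and the others: ask the delegate first.
            if (inBoot) return Optional.of("boot");
            if (inApk) return Optional.of("apk");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> apk = List.of("android.app.Activity");
        // With the default order, the APK copy of an SDK class is shadowed
        // by the boot version; DelegateLastClassLoader inverts this.
        System.out.println(resolve("android.app.Activity", apk, false).get()); // boot
        System.out.println(resolve("android.app.Activity", apk, true).get());  // apk
    }
}
```

The sketch makes visible why the same APK content can resolve to different implementations depending only on which class loader the developer instantiated.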
In addition, it is important to distinguish the two types of #platc handled by `BootClassLoader`, which both have priority over classes from the application at runtime:
- the ones available in the *#Asdk* (normally visible in the documentation);
- the ones that are internal and that should not be used by the developer. We call them *#hidec*~@he_systematic_2023 @li_accessing_2016 (not documented).
As a preliminary conclusion, we observe that a priority exists in the class loading mechanism and that an attacker could use it to prioritize an implementation over another one.
This could mislead the reverser if they use the one that has the lowest priority.
In the development environment, Android Studio uses `android.jar` and the specific classes written by the developer.
After compilation, only the classes of the developer, and sometimes extra classes computed by Android Studio are zipped in the APK file, using the multi-dex format.
At runtime, the application uses `BootClassLoader` to load the #platc from Android.
Before our work, previous works~@he_systematic_2023 @li_accessing_2016 considered both #Asdk and #hidec to be in the file `/system/framework/framework.jar` found in the phone itself, but we found that the classes loaded by `BootClassLoader` are not all present in `framework.jar`.
For example, He #etal~@he_systematic_2023 counted 495 thousand APIs (fields and methods) in Android 12, based on Google documentation on restriction for non SDK interfaces#footnote[https://developer.android.com/guide/app-compatibility/restrictions-non-sdk-interfaces].
However, when looking at the content of `framework.jar`, we only found #num(333) thousand APIs.
Indeed, classes such as `com.android.okhttp.OkHttpClient` are loaded by `BootClassLoader` and listed by Google, but are not in `framework.jar`.


We call this collision "*class shadowing*", because the attacker's version of the class shadows the one that will be used at runtime.
To evaluate if such shadow attacks are working, we handcrafted three applications implementing shadowing techniques to test their impact on static analysis tools.
Then, we manually inspected the output of the tools in order to check its consistency with what Android is really doing at runtime.
For example, for Apktool, we look at the output disassembled code, and for Flowdroid~@Arzt2014a, we check that a flow between `Taint.source()` and `Taint.sink()` is correctly computed.
/*
)<lst:cl-testapp>
We selected tools that are commonly used to unpack and reverse Android applications: Jadx#footnote[https://github.com/skylot/jadx], a decompiler for Android applications, Apktool#footnote[https://apktool.org/], a disassembler/repackager of applications, Androguard#footnote[https://github.com/androguard/androguard], one of the oldest Python package for manipulating Android applications, and Flowdroid~@Arzt2014a that performs taint flow analysis.
For evaluating the tools, we designed a single application that we can customize for different tests.
@lst:cl-testapp shows the main body implementing:
==== Flowdroid
Flowdroid~@Arzt2014a is used to detect if an application can leak sensitive information.
To do so, the analyst provides a list of source and sink methods.
The return value of a method marked as source is considered sensitive and the argument of a method marked as sink is considered to be leaked.
By analyzing the bytecode of an application, Flowdroid can detect if data emitted by source methods can be exfiltrated by a sink method.
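The kind of leak Flowdroid is asked to detect can be illustrated with a minimal example built around the `Taint.source()` and `Taint.sink()` markers of our test application (the bodies below are stand-ins for illustration: `source()` models reading sensitive data, `sink(...)` models an exfiltration point):

```java
// Minimal source-to-sink flow, the pattern Flowdroid is configured to report.
public class Taint {
    static String source() { return "secret"; }

    static String sink(String s) { return s; } // models a leak of s

    static String leak() {
        String secret = source();          // tainted value
        String copy = "prefix-" + secret;  // taint propagates through the copy
        return sink(copy);                 // Flowdroid reports source -> sink
    }

    public static void main(String[] args) {
        System.out.println(leak()); // prints "prefix-secret"
    }
}
```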
Flowdroid is built on top of the Soot~@Arzt2013 framework that handles, among other things, the class selection process.
We found that, when selecting the class implementation in a multi-dex APK, Soot uses an algorithm close to what ART performs:
Soot sorts the `.dex` bytecode files with a specified `prioritizer` (a comparison function that defines an order for #dexfiles) and selects the first implementation found when iterating over the sorted files.
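This selection step can be sketched as follows (a simplified model, not Soot's actual code; the `Dex` record and the name-based prioritizer are illustrative):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Sort the dex files with a prioritizer, then keep the first file
// that defines the requested class.
public class DexSelection {
    record Dex(String name, Set<String> classes) {}

    static Optional<String> pickImplementation(List<Dex> dexes, String cls,
                                               Comparator<Dex> prioritizer) {
        return dexes.stream()
                    .sorted(prioritizer)
                    .filter(d -> d.classes().contains(cls))
                    .map(Dex::name)
                    .findFirst();
    }

    public static void main(String[] args) {
        List<Dex> apk = List.of(
            new Dex("classes2.dex", Set.of("com.example.Shadowed")),
            new Dex("classes.dex", Set.of("com.example.Shadowed")));
        // With a lexicographic prioritizer, classes.dex wins: the copy in
        // classes2.dex is the shadowed one.
        System.out.println(pickImplementation(apk, "com.example.Shadowed",
                Comparator.comparing(Dex::name)).get()); // classes.dex
    }
}
```

The choice of `prioritizer` is exactly where a static analyzer can diverge from ART: a different comparison function selects a different implementation of the duplicated class.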
Advanced obfuscation techniques relying on packers have a higher impact on the difficulty of performing a static analysis compared to shadow attacks.
Most of the time, the reverse engineer cannot deobfuscate the application without performing a dynamic analysis.
For these reasons, approaches have been designed to assist the capture of the bytecode that is loaded dynamically, after the precise moment when the deobfuscation methods have been executed~@zhang2015dexhunter @xue2017adaptive @wong2018tackling.
On the contrary, a shadow attack can be easily defeated by implementing our algorithm in the static analysis tool, as discussed earlier in @sec:cl-countermeasures.
Nevertheless, shadow attacks are stealthier than packers or native code.
Packers can be easily spotted by artifacts left behind in the application or by detecting classes implementing a custom class loading mechanism.
At runtime, this code would be triggered, unpacking new code.
Second, the attacker could use a packer to unpack code at runtime in a first phase.
The reverse engineer would have to perform a dynamic analysis, for example using a tool such as Dexhunter~@zhang2015dexhunter, to recover new DEX files that are loaded by a custom class loader.
Then, the reverse engineer would go back to a new static analysis and could have the problem of solving shadow attacks, for example, if a class is defined multiple times in the loaded DEX files.
Because the interaction between shadow attacks and other obfuscation techniques often relies on a loading mechanism implemented by the developer, investigating these cases requires analyzing the Java bytecode that handles the loading.


In this section, we evaluate in the wild if applications that can be found in the Play store or other markets use one of the shadow techniques.
Our goal is to explore the usage of shadow techniques in real applications.
Because we want to include malicious applications (in case such techniques would be used to hide malicious code), we selected #num(50000) applications randomly from AndroZoo~@allixAndroZooCollectingMillions2016 that appeared in 2023.
Malicious applications are identified in our dataset using a threshold of 3 on the number of antiviruses reporting an application as malware.
A few applications could not be retrieved or parsed, leading to a final dataset of #nbapk applications.
We automatically disassembled the applications to obtain the list of included classes.
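From these class lists, intra-APK collision candidates can be flagged with a simple count of the classes defined in more than one #dexfile. The snippet below is a sketch of that check on hypothetical data; the real pipeline parses the DEX files of each APK.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Flag the classes defined in more than one dex file of the same APK:
// these are the candidates for intra-APK shadowing.
public class CollisionCheck {
    static Set<String> duplicated(Map<String, List<String>> classesPerDex) {
        Map<String, Integer> count = new HashMap<>();
        for (List<String> classes : classesPerDex.values())
            for (String c : classes)
                count.merge(c, 1, Integer::sum);
        Set<String> dup = new TreeSet<>();
        count.forEach((c, n) -> { if (n > 1) dup.add(c); });
        return dup;
    }

    public static void main(String[] args) {
        System.out.println(duplicated(Map.of(
            "classes.dex", List.of("com.example.A", "com.example.B"),
            "classes2.dex", List.of("com.example.B")))); // [com.example.B]
    }
}
```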


}
@article{bernardi_dynamic_2019,
title = {Dynamic malware detection and phylogeny analysis using process mining},
volume = {18},
issn = {1615-5270},
url = {https://doi.org/10.1007/s10207-018-0415-3},
doi = {10.1007/s10207-018-0415-3},
abstract = {In the last years, mobile phones have become essential communication and productivity tools used daily to access business services and exchange sensitive data. Consequently, they also have become one of the biggest targets of malware attacks. New malware is created everyday, most of which is generated as variants of existing malware by reusing its malicious code. This paper proposes an approach for malware detection and phylogeny studying based on dynamic analysis using process mining. The approach exploits process mining techniques to identify relationships and recurring execution patterns in the system call traces gathered from a mobile application in order to characterize its behavior. The recovered characterization is expressed in terms of a set of declarative constraints between system calls and represents a sort of run-time fingerprint of the application. The comparison between the so defined fingerprint of a given application with those of known malware is used to verify: (1) if the application is malware or trusted, (2) in case of malware, which family it belongs to, and (3) how it differs from other known variants of the same malware family. An empirical study conducted on a dataset of 1200 trusted and malicious applications across ten malware families has shown that the approach exhibits a very good discrimination ability that can be exploited for malware detection and malware evolution studying. Moreover, the study has also shown that the approach is robust to code obfuscation techniques increasingly being used by nowadays malware.},
language = {en},
number = {3},
urldate = {2025-07-28},
journal = {International Journal of Information Security},
author = {Bernardi, Mario Luca and Cimitile, Marta and Distante, Damiano and Martinelli, Fabio and Mercaldo, Francesco},
month = jun,
year = {2019},
keywords = {Biometrics, Computational Anthropology, Data Mining, Declare, Lineage tracking, Linear temporal logic, Malware detection, Malware evolution, Malware phylogeny, Paleogenetics, Process mining, Security, Sequence Annotation},
pages = {257--284},
}
@inproceedings{Andriatsimandefitra2012,
address = {Ottawa, Canada},
title = {Designing information flow policies for {Android}'s operating system},
isbn = {978-1-4577-2053-6},
url = {http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=6364161},
doi = {10.1109/ICC.2012.6364161},
booktitle = {{IEEE} {International} {Conference} on {Communications}},
publisher = {IEEE Computer Society},
author = {Andriatsimandefitra, Radoniaina and Geller, Stéphane and Viet Triem Tong, Valérie},
month = jun,
year = {2012},
pages = {976--981},
}
@inproceedings{andriatsimandefitra_detection_2015,
title = {Detection and {Identification} of {Android} {Malware} {Based} on {Information} {Flow} {Monitoring}},
url = {https://ieeexplore.ieee.org/abstract/document/7371481},
doi = {10.1109/CSCloud.2015.27},
abstract = {Information flow monitoring has been mostly used to detect privacy leaks. In a previous work, we showed that they can also be used to characterize Android malware behaviours and in the current one we show that these flows can also be used to detect and identify Android malware. The characterization consists in computing automatically System Flow Graphs that describe how a malware disseminates its data in the system. In the current work, we propose a method that uses these SFG-based malware profile to detect the execution of Android malware by monitoring the information flows they cause in the system. We evaluated our method by monitoring the execution of 39 malware samples and 70 non malicious applications. Our results show that our approach detected the execution of all the malware samples and did not raise any false alerts for the 70 non malicious applications.},
urldate = {2025-07-28},
booktitle = {2015 {IEEE} 2nd {International} {Conference} on {Cyber} {Security} and {Cloud} {Computing}},
author = {Andriatsimandefitra, Radoniaina and Tong, Valérie Viet Triem},
month = nov,
year = {2015},
keywords = {android security, Androids, Containers, Humanoid robots, information flow, Java, Kernel, Malware, malware classification, malware detection, Monitoring},
pages = {200--203},
}
@inproceedings{yang_appspear_2015,
address = {Cham},
series = {Lecture {Notes} in {Computer} {Science}},
title = {{AppSpear}: {Bytecode} {Decrypting} and {DEX} {Reassembling} for {Packed} {Android} {Malware}},
isbn = {978-3-319-26362-5},
shorttitle = {{AppSpear}},
doi = {10.1007/978-3-319-26362-5_17},
abstract = {As the techniques for Android malware detection are progressing, malware also fights back through deploying advanced code encryption with the help of Android packers. An effective Android malware detection therefore must take the unpacking issue into consideration to prove the accuracy. Unfortunately, this issue is not easily addressed. Android packers often adopt multiple complex anti-analysis defenses and are evolving frequently. Current unpacking approaches are either based on manual efforts, which are slow and tedious, or based on coarse-grained memory dumping, which are susceptible to a variety of anti-monitoring defenses.},
language = {en},
booktitle = {Research in {Attacks}, {Intrusions}, and {Defenses}},
publisher = {Springer International Publishing},
author = {Yang, Wenbo and Zhang, Yuanyuan and Li, Juanru and Shu, Junliang and Li, Bodong and Hu, Wenjun and Gu, Dawu},
editor = {Bos, Herbert and Monrose, Fabian and Blanc, Gregory},
year = {2015},
keywords = {Android malware, Code protection, DEX reassembling},
pages = {359--381},
}
@article{cui_droidhook_2023,
title = {{DroidHook}: a novel {API}-hook based {Android} malware dynamic analysis sandbox},
volume = {30},
issn = {1573-7535},
shorttitle = {{DroidHook}},
url = {https://doi.org/10.1007/s10515-023-00378-w},
doi = {10.1007/s10515-023-00378-w},
abstract = {With the popularity of Android devices, mobile apps are prevalent in our daily life, making them a target for attackers to steal private data and push advertisements. Dynamic analysis is an effective approach to detect runtime behavior of Android malware and can reduce the impact of code obfuscation. However, some dynamic sandboxes commonly used by researchers are usually based on emulators with older versions of Android, for example, the state-of-the-art sandbox, DroidBox. These sandboxes are vulnerable to evasion attacks and may not work with the latest apps. In this paper, we propose a prototype framework, DroidHook, as a novel automated sandbox for Android malware dynamic analysis. Unlike most existing tools, DroidHook has two obvious advantages. Firstly, the set of APIs to be monitored by DroidHook can be easily modified, so that DroidHook is ideally suitable for diverse situations, including the detection of a specific family of malware and unknown malware. Secondly, DroidHook does not depend on a specific Android OS but only on Xposed, so it can work with multiple Android versions and can perform normally on both emulators and real devices. Experiments show that DroidHook can provide more fine-grained and precise results than DroidBox. Moreover, with the support for real devices and new versions of Android, DroidHook can run most samples properly and acquire stronger detection results, compared to emulator-based tools.},
language = {en},
number = {1},
urldate = {2023-03-17},
journal = {Automated Software Engineering},
author = {Cui, Yuning and Sun, Yi and Lin, Zhaowen},
month = feb,
year = {2023},
keywords = {Android, Dynamic analysis, Mobile malware, Sandbox},
pages = {10},
}
@article{faghihi_camodroid_2022,
title = {{CamoDroid}: {An} {Android} application analysis environment resilient against sandbox evasion},
volume = {125},
issn = {1383-7621},
shorttitle = {{CamoDroid}},
url = {https://www.sciencedirect.com/science/article/pii/S1383762122000467},
doi = {10.1016/j.sysarc.2022.102452},
abstract = {In the past few years, numerous attempts have been made to mitigate evasive Android malware. However, it remains one of the challenges in smartphone security. Evasive malware can dodge dynamic analysis by detecting execution in sandboxes and hiding its malicious behaviors during the investigation. In this work, we present CamoDroid, an open-source and extendable dynamic analysis environment resilient against detection by state-of-the-art evasive Android malware. Our technique mimics data, sensors, user input, static and network features of actual devices and cloaks the existence of the analysis environment. It further improves dynamic analysis and provides a broad view of an application's behavior by monitoring and logging the dangerous Application Programming Interface (API) calls executed by applications. We implement CamoDroid and assess its resiliency to sandbox detection. We first demonstrate that our sandbox cannot be detected using modern existing academic and commercial applications that can distinguish analysis environments from real devices. We also assess the dependability of CamoDroid against real-world evasive malware and show that it can successfully cloak the existence of the analysis environment to more than 96 percent of evasive Android malware. Moreover, we investigate other popular Android sandboxes and show that they are vulnerable to at least one type of sandbox detection heuristic.},
urldate = {2025-07-28},
journal = {Journal of Systems Architecture},
author = {Faghihi, Farnood and Zulkernine, Mohammad and Ding, Steven},
month = apr,
year = {2022},
keywords = {Android, Dynamic analysis, Malware detection},
pages = {102452},
}
@article{sutter_dynamic_2024,
title = {Dynamic {Security} {Analysis} on {Android}: {A} {Systematic} {Literature} {Review}},
volume = {12},
issn = {2169-3536},
shorttitle = {Dynamic {Security} {Analysis} on {Android}},
url = {https://ieeexplore.ieee.org/abstract/document/10504267},
doi = {10.1109/ACCESS.2024.3390612},
abstract = {Dynamic analysis is a technique that is used to fully understand the internals of a system at runtime. On Android, dynamic security analysis involves real-time assessment and active adaptation of an app's behaviour, and is used for various tasks, including network monitoring, system-call tracing, and taint analysis. The research on dynamic analysis has made significant progress in the past years. However, to the best of our knowledge, there is a lack in secondary studies that analyse the novel ideas and common limitations of current security research. The main aim of this work is to understand dynamic security analysis research on Android to present the current state of knowledge, highlight research gaps, and provide insights into the existing body of work in a structured and systematic manner. We conduct a systematic literature review (SLR) on dynamic security analysis for Android. The systematic review establishes a taxonomy, defines a classification scheme, and explores the impact of advanced Android app testing tools on security solutions in software engineering and security research. The study's key findings centre on tool usage, research objectives, constraints, and trends. Instrumentation and network monitoring tools play a crucial role, with research goals focused on app security, privacy, malware detection, and software testing automation. Identified limitations include code coverage constraints, security-related analysis obstacles, app selection adequacy, and non-deterministic behaviour. Our study results deepen the understanding of dynamic analysis in Android security research by an in-depth review of 43 publications. The study highlights recurring limitations with automated testing tools and concerns about detecting or obstructing dynamic analysis.},
urldate = {2025-07-28},
journal = {IEEE Access},
author = {Sutter, Thomas and Kehrer, Timo and Rennhard, Marc and Tellenbach, Bernhard and Klein, Jacques},
year = {2024},
keywords = {Android, Androids, Codes, dynamic analysis, fuzzing, Fuzzing, instrumentation, Instrumentation and measurement, machine learning, Machine learning, monitoring, Monitoring, Operating systems, security, Security, software testing, Software testing, Systematics, Taxonomy, tracing, vulnerabilities},
pages = {57261--57287},
}
@inproceedings{mao_sapienz_2016,
address = {New York, NY, USA},
series = {{ISSTA} 2016},
title = {Sapienz: multi-objective automated testing for {Android} applications},
isbn = {978-1-4503-4390-9},
shorttitle = {Sapienz},
url = {https://doi.org/10.1145/2931037.2931054},
doi = {10.1145/2931037.2931054},
abstract = {We introduce Sapienz, an approach to Android testing that uses multi-objective search-based testing to automatically explore and optimise test sequences, minimising length, while simultaneously maximising coverage and fault revelation. Sapienz combines random fuzzing, systematic and search-based exploration, exploiting seeding and multi-level instrumentation. Sapienz significantly outperforms (with large effect size) both the state-of-the-art technique Dynodroid and the widely-used tool, Android Monkey, in 7/10 experiments for coverage, 7/10 for fault detection and 10/10 for fault-revealing sequence length. When applied to the top 1,000 Google Play apps, Sapienz found 558 unique, previously unknown crashes. So far we have managed to make contact with the developers of 27 crashing apps. Of these, 14 have confirmed that the crashes are caused by real faults. Of those 14, six already have developer-confirmed fixes.},
urldate = {2025-07-29},
booktitle = {Proceedings of the 25th {International} {Symposium} on {Software} {Testing} and {Analysis}},
publisher = {Association for Computing Machinery},
author = {Mao, Ke and Harman, Mark and Jia, Yue},
month = jul,
year = {2016},
pages = {94--105},
}
@inproceedings{su_guided_2017,
address = {New York, NY, USA},
series = {{ESEC}/{FSE} 2017},
title = {Guided, stochastic model-based {GUI} testing of {Android} apps},
isbn = {978-1-4503-5105-8},
url = {https://doi.org/10.1145/3106237.3106298},
doi = {10.1145/3106237.3106298},
abstract = {Mobile apps are ubiquitous, operate in complex environments and are developed under the time-to-market pressure. Ensuring their correctness and reliability thus becomes an important challenge. This paper introduces Stoat, a novel guided approach to perform stochastic model-based testing on Android apps. Stoat operates in two phases: (1) Given an app as input, it uses dynamic analysis enhanced by a weighted UI exploration strategy and static analysis to reverse engineer a stochastic model of the app's GUI interactions; and (2) it adapts Gibbs sampling to iteratively mutate/refine the stochastic model and guides test generation from the mutated models toward achieving high code and model coverage and exhibiting diverse sequences. During testing, system-level events are randomly injected to further enhance the testing effectiveness. Stoat was evaluated on 93 open-source apps. The results show (1) the models produced by Stoat cover 17{\textasciitilde}31\% more code than those by existing modeling tools; (2) Stoat detects 3X more unique crashes than two state-of-the-art testing tools, Monkey and Sapienz. Furthermore, Stoat tested 1661 most popular Google Play apps, and detected 2110 previously unknown and unique crashes. So far, 43 developers have responded that they are investigating our reports. 20 of reported crashes have been confirmed, and 8 already fixed.},
urldate = {2025-07-29},
booktitle = {Proceedings of the 2017 11th {Joint} {Meeting} on {Foundations} of {Software} {Engineering}},
publisher = {Association for Computing Machinery},
author = {Su, Ting and Meng, Guozhu and Chen, Yuting and Wu, Ke and Yang, Weiming and Yao, Yao and Pu, Geguang and Liu, Yang and Su, Zhendong},
month = aug,
year = {2017},
pages = {245--256},
}
@inproceedings{abraham_grodddroid_2015,
title = {{GroddDroid}: a gorilla for triggering malicious behaviors},
shorttitle = {{GroddDroid}},
url = {https://ieeexplore.ieee.org/abstract/document/7413692},
doi = {10.1109/MALWARE.2015.7413692},
abstract = {Android malware authors use sophisticated techniques to hide the malicious intent of their applications. They use cryptography or obfuscation techniques to avoid detection during static analysis. They can also avoid detection during a dynamic analysis. Frequently, the malicious execution is postponed as long as the malware is not convinced that it is running in a real smartphone of a real user. However, we believe that dynamic analysis methods give good results when they really monitor the malware execution. In this article, we propose a method to enhance the execution of the malicious code of unknown malware. We especially target malware that have triggering protections, for example branching conditions that wait for an event or expect a specific value for a variable before triggering malicious execution. In these cases, solely executing the malware is far from being sufficient. We propose to force the triggering of the malicious code by combining two contributions. First, we define an algorithm that automatically identifies potentially malicious code. Second, we propose an enhanced monkey called GroddDroid, that stimulates the GUI of an application and forces the execution of some branching conditions if needed. The forcing is used by GroddDroid to push the execution flow towards the previously identified malicious parts of the malware and execute it. The source code for our experiments with GroddDroid is released as free software. We have verified on a malware dataset that we investigated manually that the malicious code is accurately executed by GroddDroid. Additionally, on a large dataset of 100 malware we precisely identify the nature of the suspicious code and we succeed to execute it at 28\%.},
urldate = {2025-07-29},
booktitle = {2015 10th {International} {Conference} on {Malicious} and {Unwanted} {Software} ({MALWARE})},
author = {Abraham, A. and Andriatsimandefitra, R. and Brunelat, A. and Lalande, J.-F. and Viet Triem Tong, V.},
month = oct,
year = {2015},
keywords = {Androids, Force, Graphical user interfaces, Humanoid robots, Java, Malware, Monitoring},
pages = {119--127},
}