diff --git a/1_introduction/main.typ b/1_introduction/main.typ
index d7396a4..d22dcb3 100644
--- a/1_introduction/main.typ
+++ b/1_introduction/main.typ
@@ -21,7 +21,7 @@ Android keeps improving its security from version to version by improving the sa
 // Android Bouncer, ca marche pas tres bien quand même ect ect (stralker ware?)
 In the spirit of _defence in depth_, Google developed a _Bouncer_ service that scans applications in the store for malicious software#footnote[https://googlemobile.blogspot.com/2012/02/android-and-security.html].
-Although its #jm-note[operation][I would have said "operating" but grammarly disagrees] is kept secret, it seems that the Bouncer is both comparing the applications with known malware code and running the applications in Google's cloud infrastructure to detect hidden behavior.
+Although its #jm-note[operation][I would have said "operating" but grammarly disagrees] is kept secret, it seems that the Bouncer is both comparing the applications with known malware code and running the applications in Google's cloud infrastructure to detect hidden behaviour.
 Despite Google's efforts, malicious applications are still found in the Play Store~@adjibi_devil_2022.
 Also, it is not uncommon for people in abusive situations #jfl-note[to have their abuser install][jfl says "install#strong[ing]", jm says no, grammarly is on the side of jm] on their phone a stalkerware (spying application) found outside of the Play Store~@stateofstalkerware.
@@ -33,8 +33,8 @@ An Android reverse engineer will need tools that can read those Android-specific
 A first test in the process of reverse engineering an application would be to simply read the content of the application and the code in it.
 Tools like Apktool can be used to convert the binary files of an application into a human-readable format.
 Other tools like Jadx can go further and try to generate Java code from the bytecode in the application.
-Because Android applications tend to be quite large, it can be quite tedious to understand what it does just from reading its bytecode.
-To address this issue, many tools/approaches have been developed~@Li2017 @sutter_dynamic_2024 to extract higher-level information about the behavior of the application without having to manually analyse the application.
+Because Android applications tend to be quite large, it can be quite tedious to understand what it does just from reading their bytecode.
+To address this issue, many tools/approaches have been developed~@Li2017 @sutter_dynamic_2024 to extract higher-level information about the behaviour of the application without having to manually analyse the application.
 For example, Flowdroid~@Arzt2014a aims to detect information leaks: given a set of methods that can generate private information, and a set of methods that send information to the outside, Flowdroid will detect if private information is sent to the outside.
 Once again, those kinds of tools need to target Android specifically.
 Android runs its applications code differently than a computer would run software.
@@ -50,18 +50,18 @@ This leads us to our first problem statement:
 Many tools have been published to analyse Android applications, but the Android ecosystem is evolving rapidly.
 Tools developed 5 years ago might not be usable anymore.
- We will endeavor to identify which tools are still usable today, and for the others, what causes them to no longer be an option.
+ We will endeavour to identify which tools are still usable today, and for the others, what causes them to no longer be an option.
 ]
 Another issue is that Android application developers sometimes use various techniques to slow down reverse engineering.
 This process is called obfuscation.
-Malware developers do that to hide malicious behavior and avoid detection, but the use of obfuscation is not proof that an application is malicious.
+Malware developers do that to hide malicious behaviour and avoid detection, but the use of obfuscation is no proof that an application is malicious.
 Indeed, legitimate application developers can also use obfuscation to protect their intellectual property.
 // burrkkk
-Thus, developers and reverse engineers are playing a game of cat and mouse, constantly inventing new techniques to hide or reveal the behavior of an application.
+Thus, developers and reverse engineers are playing a game of cat and mouse, constantly inventing new techniques to hide or reveal the behaviour of an application.
 There are two types of reverse engineering techniques: static and dynamic.
 Static analysis consists #jfl-note[of][jfl asks "in"?\ grammarly says "of"] examining the application without running it, while dynamic analysis studies the action of the application while it is running.
-Both methods have their drawbacks, and techniques will often capitalyse on the drawbacks of one of those methods.
+Both methods have their drawbacks, and techniques will often capitalise on the drawbacks of one of those methods.
 For instance, an application can try to detect if it is running in a sandbox environment and not act maliciously if it is the case.
 Similarly, an application can dynamically load bytecode at runtime, and this bytecode will not be available during a static analysis.
 Dynamic code loading relies on Java classes called `ClassLoader` that are central components of the Android runtime environment.
@@ -74,7 +74,7 @@ This blind spot in static analysis tools raises our second problem statement:
 *Pb2*: #pb2-text
 Class loading is an operation often ignored by static analysis tools.
- The exact algorithm used is not well known and might not be accurately modeled by static analysis tools.
+ The exact algorithm used is not well known and might not be accurately modelled by static analysis tools.
 If it is the case, discrepancies between the model of the tools and the one used by Android could be used as a base for new obfuscation techniques.
 ]
@@ -94,7 +94,7 @@ This is doable for a manual analysis; unfortunately, the more complex tools that
 TODO: trouver un example simple a formuler
 ]
-Some contribution made the results they computed available to other tools by modifying the application (intrumenting) in a way that reflect those results.
+Some contributions made the results they computed available to other tools by modifying the application (instrumenting) in a way that reflects those results.
 This led us to our last problem statement:
 #highlight-block(breakable: false)[
 *Pb3*: #pb3-text
@@ -113,29 +113,31 @@ This led us to our last problem statement:
 The contributions of this thesis are the following:
 + We evaluate the reusability of Android static analysis tools published by the community:
- we rebuild the tools in their original environment as container images.
+ we rebuild static analysis tools in their original environment as container images.
 With those containers, those tools are now readily available on any environment capable of running either Docker or Singularity.
 We tested those tools on a dataset of real-life applications balanced in order to have a significant number of applications with different characteristics to assess which characteristics impact the success of a tool.
- This work was presented at the ICSR 2024 conference~@rasta.
-+ We model the default class loading behavior of Android.
+ This work was presented at the 21#super[st] International Conference on Software and Systems Reuse (ICSR 2024) conference~@rasta.
++ We model the default class loading behaviour of Android.
 Based on this model, we define a class of obfuscation techniques that we call _shadow attacks_ where a class definition in an #APK shadows the actual class definition.
 We show that common state-of-the-art tools like Jadx or Flowdroid do not implement this model correctly and thus can fall for those shadow attacks.
 We analysed a large number of recent Android applications and found that applications with class shadowing do exist, though they are the result of quirks in the #APK compilation process and not deliberate obfuscation attempts.
 This work was published in the Digital Threats journal~@classloaderinthemiddle. #todo[update ref when not 'just published' anymore]
 + We propose an approach to allow static analysis tools to analyse applications that perform dynamic code loading:
 We collect at runtime the bytecode dynamically loaded and the reflection calls information, and patch the #APK file to perform those operations statically.
- Finally, we evaluate the impact this transformation has on the tools we containerized previously.
+ Finally, we evaluate the impact this transformation has on the tools we containerised previously.#jfl-note[Dire 2 mots sur la méthode de patch qui a été reimplémentée pour être robuste? \ jm: j'ai pas eu le temps de comparer avec soot/droidRA, je trouve que sans xp ca fait trop trust me bro #emoji.cat.face.cry]
+
+#jfl-note[We release a buch of open source sofware to help the research community: rasta, androscalpel, theseus \ jm: rasta ok, androscalpel/theseus peut être mais j'attend tj le ok de l'inria]
 == Outline
 This dissertation is composed of 6 chapters.
 This introduction is the first chapter.
-It is followed by @sec:bg which gives background information about Android and the different analysis techniques targeting Android applications.
+It is followed by @sec:bg, which gives background information about Android and the various analysis techniques targeting Android applications.
-The next 3 chapters are dedicated to the contributions of this thesis.
+The next three chapters are dedicated to the contributions of this thesis.
 First @sec:rasta studies the reusability of static analysis tools.
-Next in @sec:cl, we model the default class loading algorithm used by Android and show the consequences for reverse engineering tools that implement a wrong model.
+Next, in @sec:cl, we model the default class loading algorithm used by Android and show the consequences for reverse engineering tools that implement the wrong model.
 Then @sec:th presents an approach that allows for static analysis tools to analyse applications that load bytecode at runtime.
-Finally, @sec:conclusion summarizes the contributions of this thesis and opens perspectives for future work.
+Finally, @sec:conclusion summarises the contributions of this thesis and opens perspectives for future work.
 ]