#import "../lib.typ": epigraph, eg, APK, API, highlight-block, pb1-text, pb2-text, pb3-text, ie
#import "../lib.typ": todo, jfl-note, jm-note
= Introduction <sec:intro>
// https://youtu.be/si9iqF5uTFk?t=1512
#epigraph("Rear Admiral Grace Hopper")[If during the next 12 months any one of you says "but we have always done it that way", I will instantly materialize beside you and I will haunt you for 24 hours.]
// Since the dawn of time, mankind has made Android apps ...
Android has been the most used mobile operating system since 2014, and since 2017, it has even surpassed Windows across all platforms combined#footnote[https://gs.statcounter.com/os-market-share#monthly-200901-202304].
The public adoption of Android is confirmed by application developers, with 1.3 million apps available in the Google Play Store in 2014, and 3.5 million apps available in 2017#footnote[https://www.statista.com/statistics/266210].
Its popularity makes Android a prime target for malware developers.
Indeed, various applications have been shown to behave maliciously, from stealing personal information~@shanSelfhidingBehaviorAndroid2018 to hijacking the smartphone's computing resources to mine cryptocurrency~@adjibi_devil_2022.
Considering the importance of Android in the everyday life of so many people, Google, the company that develops Android, defined a very strong security model that addresses an extensive threat model~@mayrhofer_android_2021.
This threat model goes as far as to consider that an adversary can have physical access to an unlocked device (#eg an abusive partner or a border control agent). // Americaaaaa
On the device, this security model includes the sandboxing of each application, combined with a permission system that controls which potentially unwanted actions an application is allowed to perform.
For example, an application cannot access the contact list without requesting permission from the user first.
Android keeps improving its security from version to version by improving the sandboxing (#eg starting with Android 10, applications can no longer access the clipboard if they are not focused) or by using safer defaults (#eg since Android 9, by default, all network connections must use TLS).
// Android Bouncer, it still does not work very well, etc. (stalkerware?)
In the spirit of _defence in depth_, Google developed a _Bouncer_ service that scans applications in the store for malicious software#footnote[https://googlemobile.blogspot.com/2012/02/android-and-security.html].
Although its #jm-note[operation][I would have said "operating" but grammarly disagrees] is kept secret, the Bouncer appears both to compare applications with known malware code and to run them in Google's cloud infrastructure to detect hidden behavior.
Despite Google's efforts, malicious applications are still found in the Play Store~@adjibi_devil_2022.
Also, it is not uncommon for people in abusive situations #jfl-note[to have their abuser install][jfl says "install#strong[ing]", jm says no, grammarly is on the side of jm] stalkerware (a spying application) found outside of the Play Store on their phone~@stateofstalkerware.
For these reasons, it is important to be able to analyse an application and understand what it does.
This process is called reverse engineering.
A lot of work has been done to reverse engineer computer software, but Android applications come with specific challenges that need to be addressed.
For instance, Android applications are distributed in a specific file format, the #APK format, and the code of the application is mainly compiled into an Android-specific bytecode: Dalvik.
An Android reverse engineer will need tools that can read those Android-specific formats.
A first step in the process of reverse engineering an application would be to simply read the content of the application and the code in it.
Tools like Apktool can be used to convert the binary files of an application into a human-readable format.
Other tools like Jadx can go further and try to generate Java code from the bytecode in the application.
Because Android applications tend to be quite large, it can be tedious to understand what an application does just from reading its bytecode.
To address this issue, many tools and approaches have been developed~@Li2017 @sutter_dynamic_2024 to extract higher-level information about the behavior of the application without having to analyse it manually.
For example, Flowdroid~@Arzt2014a aims to detect information leaks: given a set of methods that can generate private information, and a set of methods that send information to the outside, Flowdroid will detect if private information is sent to the outside.
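As a minimal sketch of the kind of flow such a tool looks for, the following Java snippet moves a private value from a source to a sink through an intermediate string. The names `readContactName` and `sendToServer` are hypothetical stand-ins chosen for this illustration; they are neither Android nor Flowdroid methods.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of a source-to-sink flow. All names are hypothetical
// stand-ins and do not belong to Android or to any analysis tool.
class LeakSketch {
    // Records everything that "leaves" the application.
    static final List<String> sentOutside = new ArrayList<>();

    // Source: a method that produces private information.
    static String readContactName() {
        return "Alice";
    }

    // Sink: a method that sends information to the outside.
    static void sendToServer(String payload) {
        sentOutside.add(payload);
    }

    // The private value goes through a local variable and a string
    // concatenation before reaching the sink. A leak detector has to
    // follow each of these steps to report the flow.
    static void leak() {
        String name = readContactName();
        String message = "contact=" + name;
        sendToServer(message);
    }

    public static void main(String[] args) {
        leak();
        System.out.println(sentOutside);
    }
}
```

In this toy case the flow is visible at a glance, whereas in a real application the source and the sink may be separated by many intermediate calls.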
Once again, those kinds of tools need to target Android specifically.
Android runs application code differently from the way a computer would run software.
One example would be the handling of entry points: computer software usually has one entry point, whereas Android applications have many, and Android will choose depending on context.
Unfortunately, those tools are hard to use, and even when they work on small example applications, it is not uncommon for them to fail to run on real-life applications~@reaves_droid_2016.
This is worrying.
Android applications are becoming more complex every year, and tools that cannot handle this complexity will fail more often.
This leads us to our first problem statement:
// Quantify the contributions with experiments that ignore the apps that crash the tools?
#highlight-block(breakable: false)[
*Pb1*: #pb1-text
Many tools have been published to analyse Android applications, but the Android ecosystem is evolving rapidly.
Tools developed 5 years ago might not be usable anymore.
We will endeavor to identify which tools are still usable today, and for the others, what causes them to no longer be an option.
] <pb-1>
Another issue is that Android application developers sometimes use various techniques to slow down reverse engineering.
This process is called obfuscation.
Malware developers do that to hide malicious behavior and avoid detection, but the use of obfuscation is not proof that an application is malicious.
Indeed, legitimate application developers can also use obfuscation to protect their intellectual property. // burrkkk
Thus, developers and reverse engineers are playing a game of cat and mouse, constantly inventing new techniques to hide or reveal the behavior of an application.
There are two types of reverse engineering techniques: static and dynamic.
Static analysis consists #jfl-note[of][jfl asks "in"?\ grammarly says "of"] examining the application without running it, while dynamic analysis studies the behavior of the application while it is running.
Both methods have their drawbacks, and obfuscation techniques often capitalize on the drawbacks of one of them.
For instance, an application can try to detect whether it is running in a sandbox environment and refrain from acting maliciously if that is the case.
Similarly, an application can dynamically load bytecode at runtime, and this bytecode will not be available during a static analysis.
Dynamic code loading relies on Java classes called `ClassLoader` that are central components of the Android runtime environment.
Because dynamic code loading is such a difficult problem for static analysis, class loading is often ignored altogether when doing static analysis.
However, class loading is not limited to dynamic code loading.
As a matter of fact, the Android Runtime is constantly performing class loading to load classes from the application or from the Android platform itself.
This blind spot in static analysis tools raises our second problem statement:
#highlight-block(breakable: false)[
*Pb2*: #pb2-text
Class loading is an operation often ignored by static analysis tools.
The exact algorithm used is not well known and might not be accurately modeled by static analysis tools.
If this is the case, discrepancies between the model of the tools and the one used by Android could be used as a basis for new obfuscation techniques.
] <pb-2>
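To illustrate how such a discrepancy could arise, the following Java sketch resolves a class name against an ordered list of definitions using two different policies. The lookup order and the policies `resolveFirstMatch` and `resolveLastMatch` are assumptions made for this toy example; they do not describe the actual algorithm of the Android Runtime or of any specific tool.

```java
import java.util.List;
import java.util.Map;

// Toy model of class resolution. The lookup order (platform classes first,
// then the bytecode files of the application in order) is an assumption made
// for illustration, not a description of the Android Runtime.
class LookupSketch {
    // Policy A: the first definition found in the ordered sources wins.
    static String resolveFirstMatch(List<Map<String, String>> sources, String name) {
        for (Map<String, String> source : sources) {
            String definition = source.get(name);
            if (definition != null) {
                return definition;
            }
        }
        return null; // class not found in any source
    }

    // Policy B: later definitions overwrite earlier ones.
    static String resolveLastMatch(List<Map<String, String>> sources, String name) {
        String result = null;
        for (Map<String, String> source : sources) {
            String definition = source.get(name);
            if (definition != null) {
                result = definition;
            }
        }
        return result;
    }

    // Hypothetical sources: each map associates a class name with a label
    // that identifies where the definition comes from.
    static List<Map<String, String>> exampleSources() {
        Map<String, String> platform = Map.of("android.util.Log", "platform");
        Map<String, String> firstFile = Map.of("com.example.Main", "first file");
        Map<String, String> secondFile = Map.of(
            "com.example.Main", "second file",
            "android.util.Log", "second file");
        return List.of(platform, firstFile, secondFile);
    }

    public static void main(String[] args) {
        List<Map<String, String>> sources = exampleSources();
        for (String name : List.of("android.util.Log", "com.example.Main")) {
            System.out.println(name
                + ": first match = " + resolveFirstMatch(sources, name)
                + ", last match = " + resolveLastMatch(sources, name));
        }
    }
}
```

Whenever the two policies disagree on a class, an analysis built on one of them inspects a definition that the other never uses.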
#jfl-note[
Reflection is another common obfuscation technique against static analysis.
Instead of directly invoking methods, the generic `Method.invoke()` #API is used, and the method is retrieved from its name in the form of a character string.
The value of this string can be quite difficult to determine statically, so it is once again an issue more suitable for dynamic analysis.
When encountering a complex case of reflection (#eg using ciphered strings) or code loading, a reverse engineer will switch to dynamic analysis to collect the relevant data (the name of the methods called or the code that was loaded), then switch back to static analysis.
This is doable for a manual analysis; unfortunately, the more automated tools that would require that runtime information to perform an accurate analysis may not have a way to access this new data.
This leads us to our last problem statement:
][
Underdeveloped.
Explain that a reverse engineer who finds reflection or dynamic loading can possibly capture the data with dynamic analysis.
But this data then becomes useless if they go back to static analysis.
Indeed, they often alternate between the two.
They would need the data from the dynamic analysis to be taken into account by the static analysis, for example...
TODO: find an example that is simple to state
]
#highlight-block(breakable: false)[
*Pb3*: #pb3-text
Dynamic code loading and reflection are problems most suited for dynamic analysis.
However, static analysis tools do not have access to the data collected at runtime.
Encoding this information inside valid applications could be a way to make it universally available to any static analysis tool.
Ideally, this encoding should not degrade the quality of the static analysis compared to the original application.
] <pb-3>
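The following Java sketch illustrates the idea on a toy case: the same call is first made through reflection with an encoded method name, then as the direct call that could replace it once the name has been observed at runtime. The class `ReflectionSketch`, its `greet` method and the Base64 encoding are hypothetical choices for this example, not the transformation developed in this thesis.

```java
import java.lang.reflect.Method;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Toy example: all names and the encoding are hypothetical.
class ReflectionSketch {
    // Target of the call.
    public static String greet(String who) {
        return "hello " + who;
    }

    // Reflective version: the method name only exists as an encoded string,
    // so the name has to be decoded before the target of the call is known.
    static String reflectiveCall(String encodedName, String argument) {
        String name = new String(
            Base64.getDecoder().decode(encodedName), StandardCharsets.UTF_8);
        try {
            Method method = ReflectionSketch.class.getMethod(name, String.class);
            return (String) method.invoke(null, argument);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflective call failed", e);
        }
    }

    // Direct version: once the name has been observed at runtime, the call
    // can be written explicitly and becomes visible without running the code.
    static String directCall(String argument) {
        return greet(argument);
    }

    public static void main(String[] args) {
        // "Z3JlZXQ=" is the Base64 encoding of "greet".
        System.out.println(reflectiveCall("Z3JlZXQ=", "world"));
        System.out.println(directCall("world"));
    }
}
```

Both versions compute the same result, but only the direct one exposes the called method in the code itself.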
#[
#set heading(numbering: none, outlined: false, bookmarked: false)
== Contributions
The contributions of this thesis are the following:
+ We evaluate the reusability of Android static analysis tools published by the community:
we rebuild the tools in their original environment as container images.
With those containers, the tools are now readily available in any environment capable of running either Docker or Singularity.
We tested those tools on a dataset of real-life applications, balanced so that it contains a significant number of applications with different characteristics, in order to assess which characteristics impact the success of a tool.
This work was presented at the ICSR 2024 conference~@rasta.
+ We model the default class loading behavior of Android.
Based on this model, we define a class of obfuscation techniques that we call _shadow attacks_ where a class definition in an #APK shadows the actual class definition.
We show that common state-of-the-art tools like Jadx or Flowdroid do not implement this model correctly and thus can fall for those shadow attacks.
We analysed a large number of recent Android applications and found that applications with class shadowing do exist, though they are the result of quirks in the #APK compilation process and not deliberate obfuscation attempts.
This work was published in the Digital Threats journal~@classloaderinthemiddle. #todo[update ref when not 'just published' anymore]
+ We propose an approach to allow static analysis tools to analyse applications that perform dynamic code loading:
We collect at runtime the bytecode dynamically loaded and the reflection calls information, and patch the #APK file to perform those operations statically.
Finally, we evaluate the impact this transformation has on the tools we containerized previously.
== Outline
This dissertation is composed of 6 chapters.
This introduction is the first chapter.
It is followed by @sec:bg which gives background information about Android and the different analysis techniques targeting Android applications.
The next 3 chapters are dedicated to the contributions of this thesis.
First, @sec:rasta studies the reusability of static analysis tools.
Next, in @sec:cl, we model the default class loading algorithm used by Android and show the consequences for reverse engineering tools that implement a wrong model.
Then, @sec:th presents an approach that allows static analysis tools to analyse applications that load bytecode at runtime.
Finally, @sec:conclusion summarizes the contributions of this thesis and opens perspectives for future work.
]