thesis/1_introduction/main.typ
Jean-Marie Mineau f5145237ce
intro
2025-08-04 23:45:09 +02:00


#import "../lib.typ": todo, epigraph, eg, APK, API, highlight, jm-note,
= Introduction <sec:intro>
// https://youtu.be/si9iqF5uTFk?t=1512
#epigraph("Rear Admiral Grace Hopper")[If during the next 12 months any one of you says "but we have always done it that way", I will instantly materialize beside you and I will haunt you for 24 hours.]
// Since the dawn of time, mankind has made Android apps ...
Android has been the most used mobile operating system since 2014, and since 2017 it has even surpassed Windows across all platforms combined#footnote[https://gs.statcounter.com/os-market-share#monthly-200901-202304].
The public adoption of Android is confirmed by application developers, with 1.3 million apps available in the Google Play Store in 2014, and 3.5 million apps available in 2017#footnote[https://www.statista.com/statistics/266210].
Its popularity makes Android a prime target for malware developers.
Various applications have been shown to behave maliciously, from stealing personal information~@shanSelfhidingBehaviorAndroid2018 to hijacking the phone's computing resources to mine cryptocurrency~@adjibi_devil_2022.
Considering the importance of Android in the everyday life of so many people, Google, the company that develops Android, defined a very strong security model that addresses an extensive threat model~@mayrhofer_android_2021.
This threat model goes as far as to consider that an adversary can have physical access to an unlocked device (#eg an abusive partner, or a border control). // Americaaaaa
On the device, this security model implies the sandboxing of each application, with a system of permissions that controls which potentially unwanted actions an application may perform.
For example, an application cannot access the contact list without first requesting permission from the user.
Android keeps improving its security from version to version, be it by improving the sandboxing (#eg starting with Android 10, applications can no longer access the clipboard if they are not focused) or by adopting safer defaults (#eg since Android 9, all network connections must use TLS by default).
// Android Bouncer, it does not work all that well, etc. etc. (stalkerware?)
In the spirit of _defence in depth_, Google developed a _Bouncer_ service that scans applications in the store for malicious software#footnote[https://googlemobile.blogspot.com/2012/02/android-and-security.html].
Although its inner workings are kept secret, it seems that the Bouncer both compares applications with known malware code and runs them in Google's cloud infrastructure to detect hidden behavior.
Despite Google's efforts, malicious applications are still found in the Play Store~@adjibi_devil_2022.
Also, it is not uncommon for people in abusive situations to have their abuser install stalkerware (a spying application) found outside of the Play Store on their phone~@stateofstalkerware.
For these reasons, it is important to be able to analyse an application and understand what it does.
This process is called reverse engineering.
A lot of work has been done on reverse engineering computer software, but Android applications come with specific challenges that need to be addressed.
For instance, Android applications are distributed in a specific file format, the #APK format, and the code of the application is mainly compiled into an Android-specific bytecode: Dalvik.
An Android reverse engineer will need tools that can read those Android-specific formats.
A first step in the process of reverse engineering an application would be to simply read the content of the application and the code in it.
Tools like apktool can be used to convert the binary files of an application into a human-readable format.
Other tools like Jadx can go further and try to generate Java code from the bytecode in the application.
Because Android applications tend to be massive, it can be quite tedious to understand what they do just from reading their bytecode.
To help, many tools and approaches have been developed~@Li2017 @sutter_dynamic_2024.
For example, Flowdroid~@Arzt2014a aims to detect information leaks: given a set of methods that can generate private information, and a set of methods that send information to the outside, Flowdroid will detect if private information is sent to the outside.
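The idea of tracking data from information-generating methods to information-sending methods can be illustrated with a deliberately tiny sketch. It is not Flowdroid's algorithm: it only handles a straight-line list of calls, and the class `ToyTaint` as well as the method names `getDeviceId`, `encode` and `sendHttp` are hypothetical names chosen for the illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ToyTaint {
    /** One statement of a straight-line program: target = callee(argument). */
    public static final class Stmt {
        final String target;   // variable assigned, or null if the result is dropped
        final String callee;   // name of the method being called
        final String argument; // variable passed in, or null if there is none

        public Stmt(String target, String callee, String argument) {
            this.target = target;
            this.callee = callee;
            this.argument = argument;
        }
    }

    /** Returns true if a value produced by a source method reaches a sink method. */
    public static boolean leaks(List<Stmt> program, Set<String> sources, Set<String> sinks) {
        Set<String> tainted = new HashSet<>();
        for (Stmt s : program) {
            boolean argTainted = s.argument != null && tainted.contains(s.argument);
            if (argTainted && sinks.contains(s.callee)) {
                return true; // private data is handed to a sink
            }
            if (s.target != null) {
                if (argTainted || sources.contains(s.callee)) {
                    tainted.add(s.target);    // the result carries private data
                } else {
                    tainted.remove(s.target); // overwritten with a clean value
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical program: id = getDeviceId(); msg = encode(id); sendHttp(msg);
        List<Stmt> program = List.of(
            new Stmt("id", "getDeviceId", null),
            new Stmt("msg", "encode", "id"),
            new Stmt(null, "sendHttp", "msg"));
        System.out.println(leaks(program, Set.of("getDeviceId"), Set.of("sendHttp")));
    }
}
```

A real analysis has to cope with branches, method calls, fields and the Android lifecycle, which is where most of the difficulty lies.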
Once again, these kinds of tools need to target Android specifically.
Android runs its applications' code very differently from the way a computer would run software.
One example is entry points: computer software usually has a single starting point, whereas Android applications have many, among which Android chooses depending on the situation.
Unfortunately, those tools are hard to use, and even when they work on small example applications, it is not uncommon for them to fail to run on real-life applications~@reaves_droid_2016.
This is worrying.
Android applications are becoming more complex every year, and tools that cannot handle this complexity will only fail more often.
This leads us to our first problem statement:
// Quantify the contributions with experiments that ignore the apps that crash the tools?
#highlight(breakable: false)[
*Pb 1*: _To what extent are previously published Android analysis tools still usable today, and what factors impact their reusability?_
Many tools have been published to analyse Android applications, but the Android ecosystem is fast evolving.
Tools developed 5 years ago might not be usable anymore.
We will endeavor to identify which tools are still usable today, and for the others, what causes them to no longer be an option.
]
Another issue is that Android application developers sometimes use various techniques to slow down reverse engineering.
This process is called obfuscation.
Malware developers do that to hide malicious behavior and avoid detection, but the use of obfuscation is not proof that an application is malicious.
Indeed, legitimate application developers can also use obfuscation to protect their intellectual property. // burrkkk
Thus, developers and reverse engineers are playing a game of cat and mouse, constantly inventing new techniques to hide or reveal the behavior of an application.
There are two types of reverse engineering: static and dynamic.
Static analysis consists of examining the application without running it, while dynamic analysis studies the actions of the application during its execution.
Both methods have their drawbacks, and obfuscation techniques often capitalise on the drawbacks of one of them.
For instance, an application can try to detect whether it is running in a sandbox environment and refrain from acting maliciously if that is the case.
Similarly, an application can dynamically load bytecode at runtime, and this bytecode will not be available during a static analysis.
Dynamic code loading relies on Java classes called `ClassLoader` that are central components of the Android runtime environment.
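To make the role of a class loader concrete, here is a minimal sketch that runs on a standard desktop JVM, not on the Android runtime. `LoggingLoader` is a hypothetical class written for this illustration: it records every class name it is asked for, then falls back on the stock `java.lang.ClassLoader` behaviour, which asks the parent loader first. Whether Android's own loaders behave in exactly the same way is not something this sketch shows.

```java
import java.util.ArrayList;
import java.util.List;

public class LoggingLoader extends ClassLoader {
    /** Names of the classes this loader was asked to load, in order. */
    public final List<String> requested = new ArrayList<>();

    public LoggingLoader(ClassLoader parent) {
        super(parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        requested.add(name);
        // Stock behaviour: already-loaded check, then the parent, then findClass().
        return super.loadClass(name, resolve);
    }

    /** Convenience wrapper that turns the checked exception into an unchecked one. */
    public Class<?> find(String name) {
        try {
            return loadClass(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("class not found: " + name, e);
        }
    }

    public static void main(String[] args) {
        LoggingLoader loader = new LoggingLoader(ClassLoader.getSystemClassLoader());
        Class<?> list = loader.find("java.util.ArrayList");
        // The request went through our loader, but a parent defined the class.
        System.out.println("requested: " + loader.requested);
        System.out.println("same class as the platform one: " + (list == ArrayList.class));
    }
}
```

On a JVM, the class returned is the platform's own `ArrayList`, because the parent is consulted before the child gets a chance to define anything.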
Because dynamic code loading is such a difficult problem for static analysis, class loading is often ignored altogether when doing static analysis.
However, class loading is not limited to dynamic code loading.
In fact, the Android Runtime is constantly performing class loading to load classes from the application or from the Android platform itself.
This blind spot in static analysis tools raises our second problem statement:
#highlight(breakable: false)[
*Pb 2*: _What is the default Android class loading algorithm, and does it impact static analysis?_
Class loading is an operation often ignored in static analysis.
The exact algorithm used is not well known and might not be accurately modeled by static analysis tools.
If this is the case, discrepancies between the model of the tools and the one used by Android could be used as a basis for new obfuscation techniques.
]
Reflection is another common obfuscation technique against static analysis.
Instead of directly invoking methods, the generic `Method.invoke()` #API is used, and the method is retrieved from its name in the form of a character string.
The value of this string can be quite difficult to determine statically, so it is once again an issue more suitable for dynamic analysis.
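As a minimal illustration in plain Java, the sketch below calls a method whose name never appears as a literal in the code. `ReflectionDemo`, `hiddenName` and `callByName` are hypothetical names introduced for this example, and reversing a string is only a stand-in for whatever computation a real application would use to build the name.

```java
import java.lang.reflect.Method;

public class ReflectionDemo {
    /** Rebuilds the method name at runtime, so no literal "toUpperCase" is present. */
    public static String hiddenName() {
        return new StringBuilder("esaCreppUot").reverse().toString();
    }

    /** Invokes the public no-argument String method called methodName on target. */
    public static Object callByName(String target, String methodName) {
        try {
            Method m = String.class.getMethod(methodName);
            return m.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflective call failed: " + methodName, e);
        }
    }

    public static void main(String[] args) {
        // A tool that only sees Method.invoke() must first recover the string.
        System.out.println(callByName("hello", hiddenName()));
    }
}
```

Here the name is easy to recover by hand, but the same pattern works with a name that is decrypted, downloaded, or derived from user input, none of which is visible without running the code.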
A reverse engineer can obtain the relevant information with dynamic analysis, but there is no standard way to make static analysis tools aware of it.
This leads us to our last problem statement:
#highlight(breakable: false)[
*Pb 3*: _Can we provide dynamic code loading and reflection data collected dynamically to any static analysis tool to improve its results?_
Dynamic code loading and reflection are problems most suited for dynamic analysis.
However, static analysis tools do not have access to the data collected dynamically.
Encoding this information inside valid applications could be a way to make it universally available to any static analysis tool.
#todo[say something about the impact that can have on tools?]
]
#[
#set heading(numbering: none, outlined: false, bookmarked: false)
== Contributions
The contributions of this thesis are the following:
+ We evaluate the reusability of Android static analysis tools published by the community:
We rebuild the tools in their original environment as container images.
With those containers, those tools are now readily available on any system capable of running either Docker or Singularity.
We also tested those tools on a dataset of real-life applications, balanced so that it contains a significant number of applications with different characteristics, in order to assess which characteristics impact the success of a tool.
This work was presented at the ICSR 2024 conference~@rasta.
+ We model the default class loading behavior of Android.
Based on this model, we define a class of obfuscation techniques that we call _shadow attacks_, where a class definition in an #APK shadows the actual class definition.
We show that common state-of-the-art tools like Jadx or Flowdroid do not implement this model correctly and thus can fall for those shadow attacks.
We surveyed a large number of recent Android applications and found that applications with classes shadowing the actual definition do exist, but that those are the result of quirks in the #APK compilation process and not deliberate obfuscation attempts.
This work was published in the Digital Threats journal~@classloaderinthemiddle. #todo[update ref when not 'just published' anymore]
+ We propose an approach to allow static analysis tools to analyse applications that perform dynamic code loading:
We collect at runtime the dynamically loaded bytecode and the reflection call information, and patch the #APK file to perform those operations statically.
Finally, we evaluate the impact this transformation has on the #jm-note[resilience][wrong word?] of the tools we containerized previously.
== Outline
This dissertation is composed of 6 chapters.
This introduction is the first chapter.
It is followed by @sec:bg that gives background information about Android and the different analysis techniques targeting Android applications.
The next 3 chapters are dedicated to the contributions of this thesis.
First, @sec:rasta studies the reusability of static analysis tools.
Next, in @sec:cl, we model the default class loading algorithm used by Android and show the consequences for reverse engineering tools that implement a wrong model.
Then @sec:th presents an approach that allows for static analysis tools to analyse applications that load bytecode at runtime.
Finally, @sec:conclusion summarizes the contributions of this thesis and opens perspectives for future work.
]