wip
Jean-Marie Mineau 2025-08-17 23:35:07 +02:00
parent 25c79da4f9
commit 021ac36e73
Signed by: histausse
GPG key ID: B66AEEDA9B645AD2
15 changed files with 110 additions and 75 deletions

@ -1,4 +1,5 @@
#import "../lib.typ": todo, epigraph, eg, APK, API, highlight-block, jm-note, pb1-text, pb2-text, pb3-text
#import "../lib.typ": epigraph, eg, APK, API, highlight-block, pb1-text, pb2-text, pb3-text
#import "../lib.typ": todo, jfl-note, jm-note
= Introduction <sec:intro>
@ -8,88 +9,99 @@
// Since the dawn of time, humankind has made Android apps ...
Android has been the most used mobile operating system since 2014, and since 2017, it has even surpassed Windows across all platforms combined#footnote[https://gs.statcounter.com/os-market-share#monthly-200901-202304].
The public adoption of Android is confirmed by application developers, with 1.3 millions apps available in the Google Play Store in 2014, and 3.5 millions apps available in 2017#footnote[https://www.statista.com/statistics/266210].
The public adoption of Android is confirmed by application developers, with 1.3 million apps available in the Google Play Store in 2014, and 3.5 million apps available in 2017#footnote[https://www.statista.com/statistics/266210].
Its popularity makes Android a prime target for malware developers.
Various applications have been shown to behave maliciously, from stealing personal informations~@shanSelfhidingBehaviorAndroid2018 to hijacking the phone computing ressources to mine cryptocurrency~@adjibi_devil_2022.
Indeed, various applications have been shown to behave maliciously, from stealing personal information~@shanSelfhidingBehaviorAndroid2018 to hijacking the smartphone's computing resources to mine cryptocurrency~@adjibi_devil_2022.
Considering the importance of Android in the everyday live of so many people, Google, the company that develops Android, defined a very strong security model that addresses an extensive threat model~@mayrhofer_android_2021.
This threat model goes as far as to consider that an adversarie can have physical access to an unlocked device (#eg an abusive partner, or a border control). // Americaaaaa
On the device, this security model imply the sandboxing of each applications, with a system of permissions to allow the applications to perform potentially unwanted actions.
For example, an applications cannot access the contact list without requesting the permission to the user first.
Android keep improving its security version from version, be it by improving the sandboxing (#eg starting with Android 10, application can no longer access the clipboard if they are not focused) or safer default (#eg since Android 9, by default, all network connection must use TLS).
Considering the importance of Android in the everyday life of so many people, Google, the company that develops Android, defined a very strong security model that addresses an extensive threat model~@mayrhofer_android_2021.
This threat model goes as far as to consider that an adversary can have physical access to an unlocked device (#eg an abusive partner, or a border control). // Americaaaaa
On the device, this security model includes the sandboxing of each application, controlled using a system of permissions to allow the applications to perform potentially unwanted actions.
For example, an application cannot access the contact list without requesting permission from the user first.
Android keeps improving its security from version to version by improving the sandboxing (#eg starting with Android 10, applications can no longer access the clipboard if they are not focused) or by using safer defaults (#eg since Android 9, by default, all network connections must use TLS).
// Android Bouncer, still does not work very well, etc. (stalkerware?)
In the spirit of _defence in depth_, Google develloped a _Bouncer_ service that scan applications in the store for malicious software#footnote[https://googlemobile.blogspot.com/2012/02/android-and-security.html].
Although its operating is kept secret, it seems that the Bouncer is both comparing the applications with known malware code and running the applications in Google's cloud infrastructure to detect hidden behavior.
In the spirit of _defence in depth_, Google developed a _Bouncer_ service that scans applications in the store for malicious software#footnote[https://googlemobile.blogspot.com/2012/02/android-and-security.html].
Although its #jm-note[operation][I would have said "operating" but grammarly disagrees] is kept secret, it seems that the Bouncer is both comparing the applications with known malware code and running the applications in Google's cloud infrastructure to detect hidden behavior.
Despite Google's efforts, malicious applications are still found in the Play Store~@adjibi_devil_2022.
Also, it is not uncommmon for people in abusive situation to have their abuser install on their phone a stalkerware (spying application) found outside of the Play Store~@stateofstalkerware.
Also, it is not uncommon for people in abusive situations #jfl-note[to have their abuser install][jfl says "install#strong[ing]", jm says no, grammarly is on the side of jm] on their phone a stalkerware (spying application) found outside of the Play Store~@stateofstalkerware.
For this reasons, it is important to be able to analyse an application and understand was it does.
For these reasons, it is important to be able to analyse an application and understand what it does.
This process is called reverse engineering.
A lot of work has been done to reverse engineering computer software, but Android applications come with specific challenges that need to be address.
For instance, Android application have a distributed in a specific file format, the #APK format, and the code of the application is mainly compile into an Android specific bytecode: Dalvik.
An Android reverse engineer will need tools that can read those Android specific formats.
A lot of work has been done to reverse engineer computer software, but Android applications come with specific challenges that need to be addressed.
For instance, Android applications are distributed in a specific file format, the #APK format, and the code of the application is mainly compiled into an Android-specific bytecode: Dalvik.
An Android reverse engineer will need tools that can read those Android-specific formats.
A first test in the process of reverse engineering an application would be to simply read the content of the application and the code in it.
Tools like apktool can be used to convert the binary files of an application in a human readable format.
Other tools like Jadx can go farther and try to generate Java code from the bytecode in the application.
Because Android applications tend to be massive, it can be quite tedious to understand what it doest juste from reading its bytecode.
To help, many tools/approches have been developed~@Li2017 @sutter_dynamic_2024.
For example, Flowdroid~@Arzt2014a aims to detect information leak: given a set of methods that can generate private information, and a set of methods that send information to the outside, Flowdroid will detect if private information is send to the outside.
Once again, those kind of tools need to target Android specifically.
Android run its applications code differently very than a computer would run software.
One example would be entry points: computer software usually have one starting point, when Android applications have many, that Android will chose depending on the situation.
Unfortunately, those tools are hard to use, and even when the work on small example application, it is not uncommon for them to fail to run on real-live applications~@reaves_droid_2016.
Tools like Apktool can be used to convert the binary files of an application into a human-readable format.
Other tools like Jadx can go further and try to generate Java code from the bytecode in the application.
Because Android applications tend to be quite large, it can be tedious to understand what they do just from reading their bytecode.
To address this issue, many tools/approaches have been developed~@Li2017 @sutter_dynamic_2024 to extract higher-level information about the behavior of the application without having to manually analyse the application.
For example, Flowdroid~@Arzt2014a aims to detect information leaks: given a set of methods that can generate private information, and a set of methods that send information to the outside, Flowdroid will detect if private information is sent to the outside.
Once again, those kinds of tools need to target Android specifically.
Android runs application code differently from the way a computer would run software.
One example would be the handling of entry points: computer software usually has one entry point, whereas Android applications have many, and Android will choose depending on context.
Unfortunately, those tools are hard to use, and even when they work on small example applications, it is not uncommon for them to fail to run on real-life applications~@reaves_droid_2016.
This is worrying.
Android applications are becoming more complexe every years and tools that cannot handle this complexity only fail more often.
Android applications are becoming more complex every year, and tools that cannot handle this complexity will fail more often.
This leads us to our first problem statement:
// Quantify the contributions with experiments that ignore the apps that crash the tools?
#highlight-block(breakable: false)[
*Pb1*: #pb1-text
Many tools have been published to analyse Android applications, but the Android ecosystem is fast evolving.
Many tools have been published to analyse Android applications, but the Android ecosystem is evolving rapidly.
Tools developed 5 years ago might not be usable anymore.
We will endeavor to identify which tools are still usable today, and for the others, what causes them to no longer be an option.
] <pb-1>
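The source and sink formulation described above for Flowdroid can be illustrated with a toy reachability check.
This is only a minimal sketch under strong assumptions: the flow graph is written by hand, the class `TaintSketch` and its method names are hypothetical, and the node labels are illustrative strings.
It is not the algorithm implemented by Flowdroid, which has to build such a graph from the bytecode of the application.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TaintSketch {
    // Returns true if at least one sink is reachable from at least one source
    // by following the edges of the (hand-written) data-flow graph.
    static boolean leaks(Map<String, List<String>> flows,
                         Set<String> sources, Set<String> sinks) {
        Deque<String> work = new ArrayDeque<>(sources);
        Set<String> tainted = new HashSet<>(sources);
        while (!work.isEmpty()) {
            String node = work.pop();
            if (sinks.contains(node)) {
                return true;
            }
            for (String next : flows.getOrDefault(node, List.of())) {
                // Only visit each node once.
                if (tainted.add(next)) {
                    work.push(next);
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Illustrative labels: data flows from a source, through a helper, to a sink.
        Map<String, List<String>> flows = Map.of(
            "getDeviceId", List.of("formatMessage"),
            "formatMessage", List.of("sendTextMessage"));
        boolean leak = leaks(flows, Set.of("getDeviceId"), Set.of("sendTextMessage"));
        System.out.println(leak); // prints "true"
    }
}
```

The difficulty discussed in this chapter lies in everything this sketch assumes away: building the graph requires parsing Android-specific formats and modelling the many entry points of an application.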
Another issue is that Android application developpers sometime use various techniques to slow down reverse engineering.
This process called obfuscation.
Malware developpers do that to hide malicious behavior and avoid detection, but the use of obfuscation is not a proof that and application is malicious.
Indeed, legitimate applications developpers can also use obfuscation to protect their intellectual property. // burrkkk
Thus, developpers and reverse engineers are playing a game of cats and mouse, constantly inventing new technique to hide or reveal the behavior of an application.
Another issue is that Android application developers sometimes use various techniques to slow down reverse engineering.
This process is called obfuscation.
Malware developers do that to hide malicious behavior and avoid detection, but the use of obfuscation is not proof that an application is malicious.
Indeed, legitimate application developers can also use obfuscation to protect their intellectual property. // burrkkk
Thus, developers and reverse engineers are playing a game of cat and mouse, constantly inventing new techniques to hide or reveal the behavior of an application.
They are two types of reverse engineering: static and dynamic.
Static analysis consists of examining the application without running it, while dynamic analysis studdy the action of the application durring its run.
There are two types of reverse engineering techniques: static and dynamic.
Static analysis consists #jfl-note[of][jfl asks "in"?\ grammarly says "of"] examining the application without running it, while dynamic analysis studies the action of the application while it is running.
Both methods have their drawbacks, and obfuscation techniques will often capitalize on the drawbacks of one of those methods.
For instance, an application can try to detect if it is running in a sandbox environment and not act maliciously if it is the case.
Similarly, an application can dynamicaly load bytecode at runtime, and this bytecode will not be available during a static analysis.
Dynamic code loading rely on Java classes called `ClassLoader` that are central components of the Android runtime environment.
Because dynamic code loading is such a difficult probleme for static analysis, dynamic class loading is often ignore when doing static analysis.
Similarly, an application can dynamically load bytecode at runtime, and this bytecode will not be available during a static analysis.
Dynamic code loading relies on Java classes called `ClassLoader` that are central components of the Android runtime environment.
Because dynamic code loading is such a difficult problem for static analysis, dynamic class loading is often ignored when doing static analysis.
However, class loading is not limited to dynamic code loading.
In fact, the Android Runtime is constantly performing class loading to load classes from the application of from the Android platform itself.
As a matter of fact, the Android Runtime is constantly performing class loading to load classes from the application or from the Android platform itself.
This blind spot in static analysis tools raises our second problem statement:
#highlight-block(breakable: false)[
*Pb2*: #pb2-text
Class loading is an operation often ignored in static analysis.
Class loading is an operation often ignored by static analysis tools.
The exact algorithm used is not well known and might not be accurately modeled by static analysis tools.
If it is the case, discrepancies between the model of the tools and the one used by Android could be used as a base for new obfuscation techniques.
] <pb-2>
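To make the notion of class loading more concrete, the sketch below shows the parent-first delegation of the standard Java `ClassLoader` on a desktop JVM.
It is an illustration only: the class `DelegationSketch` and its helpers are hypothetical names, and the code does not claim to reproduce the algorithm used by the Android Runtime, which is precisely what this problem statement questions.

```java
public class DelegationSketch {
    // A loader that records whether its own findClass() was ever reached.
    static class RecordingLoader extends ClassLoader {
        boolean askedToFind = false;

        RecordingLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            askedToFind = true;
            // This sketch never defines classes itself.
            throw new ClassNotFoundException(name);
        }
    }

    // Returns true if the child loader had to look for the class itself,
    // i.e. if the parent loaders could not provide it.
    static boolean childConsulted(String className) {
        RecordingLoader loader =
            new RecordingLoader(DelegationSketch.class.getClassLoader());
        try {
            loader.loadClass(className);
        } catch (ClassNotFoundException e) {
            // Expected for classes that no loader in the chain can provide.
        }
        return loader.askedToFind;
    }

    public static void main(String[] args) {
        // The parent chain already provides java.lang.String.
        System.out.println(childConsulted("java.lang.String"));        // prints "false"
        // Nobody provides this class, so the child is asked last.
        System.out.println(childConsulted("com.example.DoesNotExist")); // prints "true"
    }
}
```

With this delegation order, a class provided by a parent loader takes precedence over a class with the same name known only to the child, so a tool that assumes a different order may analyse the wrong definition.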
#jfl-note[
Reflection is another common obfuscation technique against static analysis.
Instead of directly invoking methods, the generic `Method.invoke()` #API is used, and the method is retrieved from its name in the form of a character string.
The value of this string can be quite difficult to determine statically, so it is once again an issue more suitable for dynamic analysis.
A reverse engineer can obtain the relevant information with dynamic analysing, but there is no standard way to make static analysis tools aware of it.
This lead us to our last problem statement:
When encountering a complex case of reflection (#ie using ciphered strings) or code loading, a reverse engineer will switch to dynamic analysis to collect the relevant data (the name of the methods called or the code that was loaded), then switch back to static analysis.
This is doable for a manual analysis; unfortunately, the more automated tools that would require that runtime information to perform an accurate analysis may not have a way to access this new data.
This leads us to our last problem statement:
][
Underdeveloped.
Explain that a reverse engineer who finds reflection or dynamic loading can possibly capture the data with dynamic analysis.
But this data then becomes useless if they go back to static analysis.
Indeed, they often alternate between the two.
They need the data from the dynamic analysis to be taken into account by the static analysis, for example...
TODO: find an example that is simple to state
]
#highlight-block(breakable: false)[
*Pb3*: #pb3-text
Dynamic code loading and reflection are problems most suited for dynamic analysis.
However, static analysis tools do not have access to the data collected at runtime.
Encoding this information inside valid applications could be a way to make it universally available to any static analysis tool.
#todo[say something about the impact that can have on tools?]
Ideally, this encoding should not degrade the quality of the static analysis compared to the original application.
] <pb-3>
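The reflection pattern discussed above can be sketched in plain Java as follows.
This is a minimal illustration, not code taken from a real application: the class `ReflectionSketch` and its helpers are hypothetical, and reversing the string is only a stand-in for an actual cipher.

```java
import java.lang.reflect.Method;

public class ReflectionSketch {
    // Hypothetical "deciphering" step: a reversed string stands in for a real cipher.
    static String decode(String obfuscated) {
        return new StringBuilder(obfuscated).reverse().toString();
    }

    // Invokes a public no-argument method on `target`, looked up by name at runtime.
    static Object callByName(Object target, String methodName) {
        try {
            Method method = target.getClass().getMethod(methodName);
            return method.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflective call failed", e);
        }
    }

    public static void main(String[] args) {
        // The name "toUpperCase" never appears as a literal in the code.
        String name = decode("esaCreppUot");
        System.out.println(callByName("secret", name)); // prints "SECRET"
    }
}
```

A static analysis tool only sees a call to `Method.invoke()` with a string computed at runtime, whereas a dynamic analysis can simply record the resolved method name.
Replacing the reflective call with a direct call to `toUpperCase()` is the kind of rewriting that would make this information visible to static analysis tools.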
#[
@ -100,29 +112,29 @@ This lead us to our last problem statement:
The contributions of this thesis are the following:
+ We evaluate the reusability of Android static analysis tools published by the community:
We rebuild the tools in their original environment as container images.
With those containers, those tools are now readilly available capable of running either Docker of Singularity.
We also tested those tools on a dataset of real-life applications balanced in order to have a significant number of applications with different caracteristics to assess which caracteristic impact the success of a tools.
we rebuild the tools in their original environment as container images.
With those containers, those tools are now readily available in any environment capable of running either Docker or Singularity.
We tested those tools on a dataset of real-life applications balanced in order to have a significant number of applications with different characteristics to assess which characteristics impact the success of a tool.
This work was presented at the ICSR 2024 conference~@rasta.
+ We model the default class loading behavior of Android.
Based on this model, we defined a class of obfuscation technique that we called _shadow attacks_ where an class definition in an #APK shadows the actual class definition.
We show that common state of the arts tools like Jadx or Flowdroid do not implement this model correctly and thus can fall for those shadow attacks.
We surveilled a large number of rescent Android applications and found that applications with classes shadowing the actual definition do exists, those are the result of quirks in the #APK compilation process and not deliberate obfuscation attempts.
This work was publish in the Digital Threats journal~@classloaderinthemiddle. #todo[update ref when not 'just published' anymore]
+ We propose an approach to allow static analysis tools to analyse application that perform dynamic code loading:
We collect at runtime the bytecode dynamically loaded and the reflection calls informations, an patch the #APK file to perform those operation statically.
Finally, we evaluate the impact this transformation has on the #jm-note[resiliance][wrong word?] of the tools we containerized previously.
Based on this model, we define a class of obfuscation techniques that we call _shadow attacks_ where a class definition in an #APK shadows the actual class definition.
We show that common state-of-the-art tools like Jadx or Flowdroid do not implement this model correctly and thus can fall for those shadow attacks.
We analysed a large number of recent Android applications and found that applications with class shadowing do exist, though they are the result of quirks in the #APK compilation process and not deliberate obfuscation attempts.
This work was published in the Digital Threats journal~@classloaderinthemiddle. #todo[update ref when not 'just published' anymore]
+ We propose an approach to allow static analysis tools to analyse applications that perform dynamic code loading:
We collect at runtime the dynamically loaded bytecode and the reflection call information, and patch the #APK file to perform those operations statically.
Finally, we evaluate the impact this transformation has on the tools we containerized previously.
== Outline
This dissertation is composed of 6 chapters.
This introduction is the first chapter.
It is followed by @sec:bg that gives background information about Android and the different analysis techniques targetting Android applications.
It is followed by @sec:bg, which gives background information about Android and the different analysis techniques targeting Android applications.
The next 3 chapters are dedicated to the contributions of this thesis.
First @sec:rasta studdies the reusability of static analysis tools.
Next in @sec:cl, we model the default class loading algorithm used by Android and the show the consequences for reverse engineering tools that implement a wrong model.
First, @sec:rasta studies the reusability of static analysis tools.
Next, in @sec:cl, we model the default class loading algorithm used by Android and show the consequences for reverse engineering tools that implement a wrong model.
Then, @sec:th presents an approach that allows static analysis tools to analyse applications that load bytecode at runtime.
Finally, @sec:conclusion summarizes the contributions of this thesis and opens perspectives for futur work.
Finally, @sec:conclusion summarizes the contributions of this thesis and opens perspectives for future work.
]