correction background
Jean-Marie Mineau 2025-08-06 00:25:42 +02:00
parent f5145237ce
commit 2e52599a7c
Signed by: histausse
GPG key ID: B66AEEDA9B645AD2
7 changed files with 92 additions and 71 deletions


@ -11,6 +11,7 @@
#let NDK = link(<acr-ndk>)[NDK]
#let SDK = link(<acr-sdk>)[SDK]
#let XML = link(<acr-xml>)[XML]
#let ZIP = link(<acr-zip>)[ZIP]
#let notation_table = align(center, table(
columns: 2,
@ -19,7 +20,7 @@
[Acronyms], [Meanings],
),
ADB, [Android Debug Bridge, a tool to connect to an Android emulator or smartphone to run commands, start applications, send events and perform other operations for testing and debugging purposes <acr-adb>],
API, [Application Programming Interface, in the Android echosystem, it is a set of classes with known method signatures that can be called by an application to interact with the Android framework <acr-api>],
API, [Application Programming Interface, in the Android ecosystem, it is a set of classes with known method signatures that can be called by an application to interact with the Android framework <acr-api>],
APK, [Android Package, the file format used to install applications on Android. The APK format is an extension of the #JAR format <acr-apk>],
ART, [Android RunTime, the runtime environment that executes an Android application. The ART is the successor of the older Dalvik Virtual Machine <acr-art>],
AXML, [Android #XML. The specific flavor of #XML used by Android. The main specificity of AXML is that it can be compiled into a binary version inside an APK <acr-axml>],
@ -31,4 +32,5 @@
NDK, [Native Development Kit, the set of tools used to build C and C++ code for Android <acr-ndk>],
SDK, [Software Development Kit, a set of tools for developing software targeting a specific platform. In the context of Android, the version of the SDK can be associated with a version of Android, and application compatibility is defined in terms of compatible SDK versions <acr-sdk>],
XML, [eXtensible Markup Language, a language to store data <acr-xml>],
ZIP, [ZIP is an archive format. A ZIP file contains other files, which may be compressed <acr-zip>],
))

3
2_background/0_intro.typ Normal file

@ -0,0 +1,3 @@
#import "../lib.typ": todo
#todo[Intro: we talk about reverse engineering apps, so we talk about the apps, reverse engineering techniques, tools, datasets, and justify why we talk about them]


@ -1,20 +1,21 @@
#import "../lib.typ": todo, APK, JAR, AXML, ART, SDK, JNI, NDK, DEX, XML, API
#import "../lib.typ": todo, num, APK, JAR, AXML, ART, SDK, JNI, NDK, DEX, XML, API, ZIP, jfl-note
== Android <sec:bg-android>
Android is the smartphone operating system developed by Google.
It is based on a Long Term Support Linux Kernel, to which patches are added patches develloped by the Android community.
It is based on a Long Term Support Linux kernel, to which patches developed by the Android community are added.
On top of the kernel, Android redeveloped many of the usual components used by Linux-based operating systems, and added new ones.
Those changes make Android a very unique operating system.
#jfl-note[][Figures to illustrate?]
=== Android Applications <sec:bg-android>
Applications in the Android ecosystem are distributed in the #APK format.
#APK files are #JAR files with additionnal features, which are themself ZIP files with additionnal features.
#APK files are #JAR files with additional features, which are themselves #ZIP files with additional features.
A minimal #APK file is a ZIP archive containing a file `AndroidManifest.xml`, the `META-INF/` folder containing the #JAR manifest and signature files, and an #APK Signing Block at the end of the ZIP file.
Other files are then added.
Dalvik bytecode is stored in the `classes.dex`, `classes2.dex`, `classes3.dex`, ... and native code is stored in `lib/<arch>/*.so`.
A minimal #APK file contains a file `AndroidManifest.xml`, the `META-INF/` folder containing the #JAR manifest and signature files, and an #APK Signing Block near the end of the #ZIP file.
The code of the application is then stored in a custom format, the Dalvik bytecode, or in the binary ELF format, called native code in the Android ecosystem, or both.
Dalvik bytecode is stored in the `classes.dex`, `classes2.dex`, `classes3.dex`, ... files, while native code is stored in `lib/<arch>/*.so`.
The `res/` folder contains the resources required for the user interface.
When resources are present in `res/`, the file `resources.arsc` is also present at the root of the archive.
The `assets/` folder contains the files that are used directly by the application code.
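The layout described above can be checked with a few lines of Python. This is only a minimal sketch: the function name `summarize_apk` and the keys of the returned dictionary are illustrative choices, not part of any Android tool, and the pattern used for #DEX names is deliberately loose.

```python
import re
import zipfile

# Loose patterns for the entry names mentioned in the text.
DEX_NAME = re.compile(r"classes\d*\.dex")
NATIVE_LIB = re.compile(r"lib/([^/]+)/[^/]+\.so")


def summarize_apk(apk):
    """Summarize the layout of an APK given as a path or a file-like object."""
    with zipfile.ZipFile(apk) as zf:
        names = zf.namelist()
    archs = set()
    for name in names:
        match = NATIVE_LIB.fullmatch(name)
        if match:
            # The sub-folder of lib/ names the target architecture.
            archs.add(match.group(1))
    return {
        "has_manifest": "AndroidManifest.xml" in names,
        "dex_files": sorted(n for n in names if DEX_NAME.fullmatch(n)),
        "native_archs": sorted(archs),
        "has_resources_table": "resources.arsc" in names,
    }
```

Calling `summarize_apk("app.apk")` on a hypothetical file returns the list of bytecode files and the architectures for which native code is shipped, which is often the first information a reverse engineer looks for.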
@ -23,15 +24,15 @@ Depending on the application and compilation process, any kind of other files an
==== Signature
Android applications are cryptographically signed to prove their authorship.
Applicatations signed with same key are considered develloped by the same entity.
This allow to securelly update applications, and application can declare security permission to restrict access to some feature to only application with the same author.
Applications signed with the same key are considered developed by the same entity.
This allows applications to be updated securely, and applications can declare security permissions to restrict access to some features to applications from the same author.
Android has several signature schemes coexisting:
- The v1 signature scheme is the #JAR signing scheme, where the signature data is stored in the `META-INF/` folder.
- The v2, v3 and v3.1 signature schemes are stored in the '#APK Signing Block' of the #APK.
The v2 signature scheme was introduce in Android 7.0, and to keep retrocompatibility with older version, the v1 scheme is still used in addition to the #APK Signing Block.
The Signing block is an unindexed binary section added to the ZIP file, between the ZIP entries and the Central Directory.
The signature was added in an unindexed section of the ZIP to avoid interferring with the v1 signature scheme that sign the files inside the archive, and not the archive itself.
The v2 signature scheme was introduced in Android 7.0, and to keep backward compatibility with older versions, the v1 scheme is still used in addition to the #APK Signing Block.
The Signing Block is an unindexed binary section added to the #ZIP file, between the #ZIP entries and the Central Directory.
The signature was added in an unindexed section of the #ZIP to avoid interfering with the v1 signature scheme, which signs the files inside the archive and not the archive itself.
- The v4 signature scheme is complementary to the v2/v3 signature schemes.
Signature data are stored in an external, `.apk.idsig` file.
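To illustrate where the #APK Signing Block sits, the following sketch checks for its presence in the raw bytes of an archive. It assumes the layout of the v2 scheme documentation, where the block ends with the 16-byte magic `APK Sig Block 42` right before the Central Directory, and it locates the Central Directory through the End of Central Directory record of the #ZIP format. The function name is hypothetical, and the naive search for the record can be fooled by an archive comment containing the same bytes.

```python
import struct

EOCD_SIGNATURE = b"PK\x05\x06"       # End of Central Directory record
SIGNING_BLOCK_MAGIC = b"APK Sig Block 42"
MIN_BLOCK_SIZE = 32                  # two 8-byte size fields + 16-byte magic


def has_apk_signing_block(data: bytes) -> bool:
    """Return True if the magic is found right before the Central Directory."""
    eocd = data.rfind(EOCD_SIGNATURE)
    if eocd < 0 or eocd + 22 > len(data):
        return False
    # The offset of the Central Directory is stored 16 bytes into the record.
    (cd_offset,) = struct.unpack_from("<I", data, eocd + 16)
    if cd_offset < MIN_BLOCK_SIZE or cd_offset > eocd:
        return False
    return data[cd_offset - 16:cd_offset] == SIGNING_BLOCK_MAGIC
```

The check only detects the block, it does not parse or verify the signatures it contains.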
@ -39,22 +40,23 @@ Android has several signature schemes coexisting:
The Android Manifest is stored in the `AndroidManifest.xml`, encoded in the binary #AXML format.
The manifest declares important information about the application:
- generic informations like the application name, id, icon
- The Android compatibility of the applications, in the form of 3 values: the Android `min-sdk`, `target-sdk` and `max-sdk`. Those are the minimum, targeted and maximum version of the Android SDK supported by the application
- The application componants (Activity, Service, Receiver and Provider) of the application and the classes they are associated to
- Intent filters to list the itents that can start or be sent to the application componants
- Security permissions required by the application
- Generic information like the application name, id, icon.
- The Android compatibility of the application, in the form of 3 values: the Android `min-sdk`, `target-sdk` and `max-sdk`. Those are the minimum, targeted and maximum versions of the Android SDK supported by the application.
- The application components (Activity, Service, Receiver and Provider) and their associated classes.
- Intent filters to list the intents that can start or be sent to the application components.
- Security permissions required by the application.
==== Code
==== Code <sec:bg-android-code-format>
An application usually contains at least a `classes.dex` file containing Dalvik bytecode.
This is the format executed by the Android #ART.
It is common for an application to have more thant one #DEX file, when application need to reference more methods than the format allows in one file.
It is common for an application to have more than one #DEX file, when the application needs to reference more methods than the format allows in one file
(each method referenced inside a #DEX is associated with a 16-bit number, limiting their number to #num(65536)).
Support for multiple #DEX files was added in the #SDK 21 version of Android, and applications that have multiple #DEX files are sometimes referred to as 'multi-dex'.
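The limit can be made concrete with a short sketch. It assumes the standard little-endian #DEX header of `0x70` bytes, in which the number of method references (`method_ids_size`) is stored at offset `0x58`. The two function names are illustrative, and `min_dex_files` only gives a lower bound, since a method referenced from several #DEX files is counted in each of them.

```python
import struct

DEX_HEADER_SIZE = 0x70
METHOD_IDS_SIZE_OFFSET = 0x58
MAX_METHOD_REFS = 1 << 16  # 16-bit method index: 65536 references per file


def method_ref_count(dex_bytes: bytes) -> int:
    """Read the number of method references from a DEX header."""
    if len(dex_bytes) < DEX_HEADER_SIZE or dex_bytes[:4] != b"dex\n":
        raise ValueError("not a DEX file")
    (count,) = struct.unpack_from("<I", dex_bytes, METHOD_IDS_SIZE_OFFSET)
    return count


def min_dex_files(total_method_refs: int) -> int:
    """Lower bound on the number of DEX files needed (ceiling division)."""
    return max(1, -(-total_method_refs // MAX_METHOD_REFS))
```

For instance, an application referencing one method more than the limit already needs a second `classes2.dex` file.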
In addition to #DEX files, and sometimes instead of #DEX files, applications can contain `.so` ELF (Executable and Linkable Format) files in the `lib/` folder.
In the Android echosystem, binary code is called native code.
Because native code is compile for a specific architecture, `.so` files are present in different versions, stored in different subfolders, depending on the targetted architecture.
In the Android ecosystem, binary code is called native code.
Because native code is compiled for a specific architecture, `.so` files are present in different versions, stored in different subfolders, depending on the targeted architecture.
For example `lib/arm64-v8a/libexample.so` is the version of the `example` library compiled for an ARM 64 architecture.
Because smartphones mostly use ARM processors, it is not rare to see applications that only have the ARM version of their native code.
@ -77,28 +79,28 @@ The source code is then compile.
The most common programming languages used for Android applications are Java and Kotlin.
Both are first compiled to Java bytecode in `.class` files using the language compiler.
To allow access to the Android #API, the `.class` files are linked during the compilation to an `android.jar` file that contains classes with the same signatures as those in the Android #API for the targeted SDK.
The `.class` files are the converted to #DEX files using `d8`.
During those steeps, both the original langage compiler and `d8` can perform optimizations on the classes.
The `.class` files are then converted into the #DEX format using `d8`.
During those steps, both the original language compiler and `d8` can perform optimizations on the classes, like code shrinking, inlining, etc.
If the application contains native code, the original C or C++ code is compile using tools Android #NDK to target the different architecture target.
If the application contains native code, the original C or C++ code is compiled using tools from the Android #NDK to target the different possible architectures.
`aapt` is then used once again to package all the generated #AXML, #DEX, `.so` files, as well as the other ressources files, assets, `resources.arsc`, and any additionnal files deemed necessary in ZIP file.
`aapt` ensures that the generated ZIP is compatible with the requirement from Android.
For instance, the `resources.arsc` will be mapped directly in memory at runtime, so it must not be compressed inside the ZIP file.
`aapt` is then used once again to package all the generated #AXML, #DEX, `.so` files, as well as the other resource files, assets, `resources.arsc`, and any additional files deemed necessary to form the final #ZIP file.
`aapt` ensures that the generated #ZIP is compatible with the requirements from Android.
For instance, the `resources.arsc` will be mapped directly in memory at runtime, so it must not be compressed inside the #ZIP file.
If necessary, the ZIP file is then aligned using `zipalign`.
Again, this is to ensure compatibility with android optimizations: files like `resources.arsc` need to be 4 bits alligned to be mapped in memory.
If necessary, the #ZIP file is then aligned using `zipalign`.
Again, this is to ensure compatibility with Android optimizations: some files like `resources.arsc` need to be aligned on 4-byte boundaries to be mapped in memory.
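The alignment constraint can be illustrated by computing where the data of each uncompressed entry starts. The sketch below is not how `zipalign` is implemented: it is a simplified check, with a hypothetical function name, that assumes the usual 30-byte local file header of the #ZIP format followed by the file name and the extra field.

```python
import io
import struct
import zipfile

LOCAL_HEADER_SIZE = 30  # fixed part of a ZIP local file header


def misaligned_stored_entries(data: bytes, alignment: int = 4) -> list:
    """List the uncompressed entries whose data does not start on a boundary."""
    misaligned = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.compress_type != zipfile.ZIP_STORED:
                continue  # compressed entries cannot be mapped directly anyway
            # Name and extra field lengths are stored at offsets 26 and 28.
            name_len, extra_len = struct.unpack_from(
                "<HH", data, info.header_offset + 26
            )
            data_offset = (
                info.header_offset + LOCAL_HEADER_SIZE + name_len + extra_len
            )
            if data_offset % alignment != 0:
                misaligned.append(info.filename)
    return misaligned
```

An empty list means every stored entry starts at an offset that is a multiple of `alignment`. Alignment tools typically fix a misaligned entry by padding the extra field of its local header.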
The last step is to sign the application using the `apksigner` utility.
Since 2021, Google require that new applications in the Google Play app store to be uploaded in a new format called Android App Bundles.
Since 2021, Google requires new applications in the Google Play app store to be uploaded in a new format called Android App Bundles.
The main difference is that Google will perform the last packaging steps and generate (and sign) the application itself.
This allows Google to generate different applications for different targets, and avoid including unnecessary files in the application like native code targeting the wrong architecture.
=== Android Runtime <sec:bg-art>
The Android runtime environment has many specificities that set it apart from other platforms.
An heavy heavy empasis is put on isolating the applications from one another as well from the systems critical capabilities.
A heavy emphasis is put on isolating the applications from one another as well as from the system's critical capabilities.
The code execution itself can be confusing at first.
Instead of the usual linear model with a single entry point, applications have many entry points that are called by the Android framework in response to external events.
@ -106,23 +108,22 @@ Instead of the usual linear model with a single entry point, applications have m
#todo[Subsection name?]
Android application expose their componants to the Android Runtime (#ART) via classes inheriting specific classes from the Android SDK.
They are four type of application commponents, that serves as entry points for application.
Each has a class associated to it, and serves a different role.
Android applications expose their components to the Android Runtime (#ART) via classes inheriting specific classes from the Android #SDK.
Four classes represent application components that can be used as entry points:
/ Activities: An activity represent a single screen with a user interface. This is the componant used to interact with a user.
/ Activities: An activity represents a single screen with a user interface. This is the component used to interact with a user.
/ Services: A service serves as an entry point to run the application in the background.
/ Broadcast receivers: A broadcast receiver is an entry point used when a matching event is broadcasted by the system. They allow to application responce to event event when not started.
/ Content providers: A content provider is a componant that manage data accessible by other app through the content provider.
/ Broadcast receivers: A broadcast receiver is an entry point used when a matching event is broadcast by the system.
/ Content providers: A content provider is a component that manages data accessible by other apps through the content provider.
Componant must be listed in the `AndroidManifest.xml` of the application so that the system nows of them.
In the course of a componant live cicle, the system will call specifics methods defined by the classes associated to each componant type.
Those methods are to be overrident by the classes defined in the application if they are specific action to be perfomed.
For instance, an activitymight compute some values in `onCreate()`, called when the activity is created, save the value of those variable to the file system in `onStop()`, called when the acitivity stop being visible to the user, and recover the saved values in `onRestart()`, called when the user navigate back to the activity.
Components must be listed in the `AndroidManifest.xml` of the application so that the system knows of them.
In the life cycle of a component, the system will call specific methods defined by the classes associated with each component type.
Those methods are to be overridden by the classes defined in the application if there are specific actions to be performed.
For instance, an activity might compute some values in `onCreate()`, called when the activity is created, save the value of those variables to the file system in `onStop()`, called when the activity stops being visible to the user, and recover the saved values in `onRestart()`, called when the user navigates back to the activity.
In addition to the components declared in the manifest that act as entry points, the Android #API heavily relies on callbacks.
The most obvious cases are for the user interface, for example a button will call a callback method defined by the application when clicked.
Other part of the #API also rely on non-linear execution, for example when an application send an itent (see @sec:bg-sandbox), the intent sent in responce is transmitted to back to the application by calling another method.
Other parts of the #API also rely on non-linear execution, for example when an application sends an intent (see @sec:bg-sandbox), the intent sent in response is transmitted back to the application by calling another method.
==== Application Isolation and Interprocess Communication <sec:bg-sandbox>


@ -1,9 +1,11 @@
#import "../lib.typ": todo, APK, IDE, SDK, DEX, ADB, ART, eg, XML, AXML, API
#import "../lib.typ": todo, APK, IDE, SDK, DEX, ADB, ART, eg, XML, AXML, API, jfl-note
== Android Reverse Engineering Tools <sec:bg-tools>
== Reverse Engineering Tools <sec:bg-tools>
Due to the specificities of Android, the usual tools for reverse engineering are not enough.
#todo[blabla intro in @sec:bg-tools]
Due to the specificities of Android, reverse engineers need tools adapted to Android.
The development tools provided by Google can be used for basic operations.
Apktool and Jadx are common tools used to read the content of an application, while Androguard and Soot can be used as libraries to automate analysis.
For a more dynamic approach, Frida is a toolkit that can be used to intercept method calls and execute custom code while an application is running.
=== Android Studio <sec:bg-android-studio>
@ -24,28 +26,29 @@ Among the notable tools in the #SDK, they are:
- #ADB: a tool to send commands to an Android smartphone or emulator.
It can be used to install applications, send instructions, events, and generally perform debugging operations.
- Platform Packages: Those packages contain data associated with a version of Android needed to compile an application.
Especially, they contains the so call `android.jar` files.
In particular, they contain the so-called `android.jar` files, which list the #API of a version of Android.
- `d8`: The main use of `d8` is to convert Java bytecode files (`.class`) to the Android #DEX format.
It can also be used to perform different levels of optimization of the generated bytecode.
- `aapt`/`aapt2` (Android Asset Packaging Tool): This tool is used to build the #APK file.
It is commonly used by other tools that repackage applications like Apktool.
Behind the scenes, it will convert #XML to binary #AXML and ensure the right files have the right compression and alignment (#eg some resource files are mapped in memory by the #ART, and thus need to be aligned and not compressed).
- `apksigner`: the tool used to sign an #APK file.
When repackaging an application, for example with Apktool, the new application needs to be signed.
=== Apktool <sec:bg-apktool>
Apktool#footnote[https://apktool.org/] is a _reengineering tool_ for Android #APK files.
It can be used to disassemble an application: it will extract the files from the #APK file, convert the binary #AXML to text #XML, and use smali/baksmali#footnote[https://github.com/JesusFreke/smali] to convert the #DEX files to smali, an assembler-like language that matches the Dalvik bytecode instructions.
The main strenght of Apktool is that after having disassemble an application, the content of the application can be edited and reassemble into a new #APK.
The main strength of Apktool is that after having disassembled an application, the content of the application can be edited and reassembled into a new #APK. #jfl-note[limits? does it always work?]
=== Androguard <sec:bg-androguard>
#todo[ref to androguard paper]
Androguard#footnote[https://github.com/androguard/androguard] is a python library for parsing and analysing #APK files.
Its main feature is disassembling #APK files.
Androguard#footnote[https://github.com/androguard/androguard]~@desnos:adnroguard:2011 is a Python library for parsing and analysing #APK files.
#jfl-note[Its main feature is disassembling #APK files.][backend #sym.eq.not apktool?]
It can be used to automatically read Android manifests, resources, and bytecode.
Contrary to Apktool, it can be used programmatically, without parsing text files, to analyse the application, but it cannot repackage a modified application.
In addition, it can perform additional analysis, like computing a call graph or control flow graph.
=== Jadx <sec:bg-jadx>
@ -71,9 +74,16 @@ Fidra#footnote[https://frida.re/] is a dynamic intrumentation toolkit.
It allows the reverse engineer to inject and run JavaScript code inside a running application.
To instrument an application, the Frida server must be running as root on the phone, or the Frida library must be injected inside the #APK file before installing it.
Frida defines a javascript wrapper arround the Java Native Interface (JNI) used by native code to interact with Java classes and the Android i#API.
In addition to allowing interaction with Java objects from the application and the Android API, this wrapper provide the option to replace a method implementation by a javascript function (that itself can call the original method implementation if needed).
This make Frida a powerfull tool capable of collecting runtime informations or modifying the behavior of an application as needed.
Frida defines a JavaScript wrapper around the Java Native Interface (JNI) used by native code to interact with Java classes and the Android #API.
In addition to allowing interaction with Java objects from the application and the Android API, this wrapper provides the option to replace a method implementation by a JavaScript function (that itself can call the original method implementation if needed).
This makes Frida a powerful tool capable of collecting runtime information or modifying the behavior of an application as needed.
The main drawback of using Frida is that it is a known tool easily detected by applications.
Malware might implement countermeasures that avoid running a malicious payload in the presence of Frida.
#v(2em)
Those tools are quite useful for manual operations.
However, considering the complexity of modern Android applications, it might take a lot of work for a reverse engineer to analyse one application.
In the next section, we will see more advanced techniques that have been developed to analyse Android applications.


@ -3,7 +3,7 @@
== Android Reverse Engineering Techniques <sec:bg-techniques>
#todo[swap with tool section ?]
//#todo[swap with tool section ?]
In the past fifteen years, the research community released many tools to detect or analyze malicious behaviors in applications.
Two main approaches can be distinguished: static and dynamic analysis~@Li2017.
@ -24,8 +24,8 @@ The most basic form of control-flow analysis is to build a call graph.
A call graph is a graph where the nodes represent the methods in the application, and the edges represent calls from one method to another.
@fig:bg-fizzbuzz-cg-cfg b) shows the call graph of the code in @fig:bg-fizzbuzz-cg-cfg a).
A more advanced control-flow analysis consists in building the control-flow graph.
This times instead of methods, the nodes represent instructions, and the edges indicate which instruction can follow which instruction.
@fig:bg-fizzbuzz-cg-cfg c) represent the control-flow graph of @fig:bg-fizzbuzz-cg-cfg a), with code statement instead of bytecode instructions.
This time, instead of methods, the nodes represent instructions, and the edges indicate which instruction can follow which instruction.
@fig:bg-fizzbuzz-cg-cfg c) represents the control-flow graph of @fig:bg-fizzbuzz-cg-cfg a), with code statements instead of bytecode instructions.
#figure({
set align(center)
@ -115,29 +115,31 @@ This times instead of methods, the nodes represent instructions, and the edges i
Once the control-flow graph is computed, it can be used to compute data-flows.
Data-flow analysis, also called taint-tracking, makes it possible to follow the flow of information in the application.
Be defining a list of methods and fields that can generate critical information (taint sources) and a list of method that can consume information (taint sink), taint-tracking allows to detect potential data leak (if a data flow link a taint source and a taint sink).
For example, `TelephonyManager.getImei()` is return an unique, persistent, device identifier.
This can be used to identify the user can cannot be changed if compromised.
By defining a list of methods and fields that can generate critical information (taint sources) and a list of methods that can consume information (taint sinks), taint-tracking makes it possible to detect potential data leaks (if a data flow links a taint source and a taint sink).
For example, `TelephonyManager.getImei()` returns a unique, persistent, device identifier.
This can be used to identify the user, and it cannot be changed if #jfl-note[compromised][replace by: this imei is dislaxd (illegible) \ jm: ???].
This makes `TelephonyManager.getImei()` a good candidate as a taint source.
On the other hand, `UrlRequest.start()` sends a request to an external server, making it a taint sink.
If a data-flow is found linking `TelephonyManager.getImei()` to `UrlRequest.start()`, this means the application is potentially leaking critical information to an external entity, a behavior that is probably not wanted by the user.
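The idea can be sketched on a toy data-flow graph. The code below is not how tools like Flowdroid work internally: it assumes the data flows have already been extracted as a dictionary mapping each value to the places it flows into, and simply searches for a path from a source to a sink. The function name and the intermediate node names used in the example are hypothetical.

```python
from collections import deque


def find_leaks(flows, sources, sinks):
    """Return the (source, sink) pairs linked by a data flow.

    `flows` maps a node to the list of nodes its data flows into.
    """
    leaks = []
    for source in sources:
        # Breadth-first search of everything tainted by this source.
        tainted = {source}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for successor in flows.get(node, ()):
                if successor not in tainted:
                    tainted.add(successor)
                    queue.append(successor)
        leaks.extend((source, sink) for sink in sinks if sink in tainted)
    return leaks


# Toy example: the IMEI is stored in a variable, copied into a request
# body, and the request is then sent to a server.
flows = {
    "TelephonyManager.getImei()": ["imei"],
    "imei": ["request_body"],
    "request_body": ["UrlRequest.start()"],
}
leaks = find_leaks(flows, ["TelephonyManager.getImei()"], ["UrlRequest.start()"])
```

Here `leaks` contains the pair linking `TelephonyManager.getImei()` to `UrlRequest.start()`. The hard part of real taint-tracking is building the `flows` relation from the bytecode, which this sketch takes for granted.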
Data-flow analysis is the subject of many contribution~@weiAmandroidPreciseGeneral2014 @titzeAppareciumRevealingData2015 @bosuCollusiveDataLeak2017 @klieberAndroidTaintFlow2014 @DBLPconfndssGordonKPGNR15 @octeauCompositeConstantPropagation2015 @liIccTADetectingInterComponent2015, the most notable source being Flowdroid~@Arzt2014a.
Data-flow analysis is the subject of many contributions~@weiAmandroidPreciseGeneral2014 @titzeAppareciumRevealingData2015 @bosuCollusiveDataLeak2017 @klieberAndroidTaintFlow2014 @DBLPconfndssGordonKPGNR15 @octeauCompositeConstantPropagation2015 @liIccTADetectingInterComponent2015, the most notable tool being Flowdroid~@Arzt2014a.
#todo[Describe the different contributions in relations to the issues they tackle]
Static analysis is powerful as it makes it possible to detect unwanted behavior in an application even if the behavior does not manifest itself when running the application.
However, static analysis tools must overcome many challenges when analysing Android applications:
/ the Java object-oriented paradigm: A call to a method can in fact correspond to a call to any method overriding the original method in subclasses
/ the multiplicity of entry points: Each component of an application can be an entry point for the application
/ the event driven architecture: Methods of in the applications can be called in many different order depending on external events
/ the interleaving of native code and bytecode: Native code can be called from bytecode and vice versa, but tools often only handle one of those format
/ the potential dynamic code loading: And application can run code that was not orriginally in the application
/ the use of reflection: Methods can be called from their name as a string object, which is not necessary known statically
/ the continual evolution of Android: each new version brings new features that an analysis tools must be aware of
/ the Java object-oriented paradigm: A call to a method can in fact correspond to a call to any method overriding the original method in subclasses.
/ the multiplicity of entry points: Each component of an application can be an entry point for the application.
/ the event driven architecture: Methods of the application can be called when events occur, in an unknown order.
/ the interleaving of native code and bytecode: Native code can be called from bytecode and vice versa, but tools often only handle one of those formats.
/ the potential dynamic code loading: An application can run code that was not originally in the application.
/ the use of reflection: Methods can be called from their name as a string object, which is difficult to identify statically.
/ the continual evolution of Android: Each new version of Android brings new features that analysis tools must be aware of.
For instance, the multi-dex feature presented in @sec:bg-android-code-format was introduced in Android #SDK 21.
Tools unaware of this feature only analyse the `classes.dex` file and will ignore all other `classes<n>.dex` files.
The tools can share the backend used to interact with the bytecode.
#jfl-note[The tools can share the backend used to interact with the bytecode.
For example, Apktool is often called in a subprocess to extract the bytecode, and the Soot framework is commonly used both to analyse bytecode and modify it.
The most notable user of Soot is Flowdroid. #todo[formulation]
The most notable user of Soot is Flowdroid. #todo[formulation]][move this earlier]
=== Dynamic Analysis <sec:bg-dynamic>


@ -4,6 +4,7 @@
#epigraph("Alexis \"Lex\" Murphy, Jurassic Park")[This is a Unix system. I know this.]
#include("0_intro.typ")
#include("1_android.typ")
#include("2_tools.typ")
#include("3_analysis_techniques.typ")


@ -100,6 +100,8 @@
#show table: set par(leading: 0.65em) if paper_draft
#todo[Normalize classloaders vs class loaders]
#todo[Normalize bullets/item: either end with a '.' or a ';']
#todo[footnote numbering]
#todo[rework the future work of the chapters: explain the problem and explain in which direction the work should be headed, technical direction]