keep refactoring

Jean-Marie Mineau 2025-09-24 17:19:23 +02:00
parent d1dba30426
commit 471a176683
Signed by: histausse
GPG key ID: B66AEEDA9B645AD2
16 changed files with 181 additions and 149 deletions


@ -29,7 +29,6 @@ We will begin this chapter by a presentation of the bases of the Android ecosyst
The reader already familiar with Android reverse engineering might want to skip to @sec:bg-probl, where we put our problem statements in perspective.
We will then examine the state of the art related to those problem statements in @sec:bg-soa, and conclude this chapter in @sec:bg-conclusion.
#todo[dedicate sections/subsections to each of the 3 problems]
#todo[summary of the problems at the end of each state-of-the-art section]
#todo[Problem statement before the state of the art]


@ -1,4 +1,5 @@
#import "../lib.typ": todo, num, APK, JAR, AXML, ART, SDK, JNI, NDK, DEX, XML, API, ZIP, jfl-note #import "../lib.typ": eg, num, APK, JAR, AXML, ART, SDK, JNI, NDK, DEX, XML, API, ZIP, paragraph
#import "../lib.typ": todo, jfl-note
=== Android <sec:bg-android>
@ -7,7 +8,7 @@ It is based on a Long Term Support Linux Kernel, to which are added patches deve
On top of the kernel, Android redeveloped many of the usual components used by Linux-based operating systems, like the init system or the standard C library, and added new ones, like the #ART that executes the applications.
Those changes make Android a unique operating system.
==== Android Applications <sec:bg-android> ==== Android Applications <sec:bg-android-apk>
Applications in the Android ecosystem are distributed in the #APK format.
#APK files are #JAR files with additional features, which are themselves #ZIP files with additional features.
@ -20,8 +21,7 @@ When ressources are present in `res/`, the file `resources.arsc` is also present
The `assets/` folder contains the files that are used directly by the application code.
Depending on the application and compilation process, any kind of other files and folders can be added to the application.
===== Signature #paragraph[*Signature*][
Android applications are cryptographically signed to prove their authorship.
Applications signed with the same key are considered developed by the same entity.
This allows applications to be updated securely, and applications can declare security permissions that restrict access to some features to applications from the same author.
@ -34,9 +34,9 @@ Android has several signature schemes coexisting:
The signature was added in an unindexed section of the #ZIP to avoid interfering with the v1 signature scheme, which signs the files inside the archive and not the archive itself.
- The v4 signature scheme is complementary to the v2/v3 signature scheme.
Signature data are stored in an external `.apk.idsig` file.
]
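To make the location of this unindexed section concrete: the v2 and v3 schemes store their data in an _APK Signing Block_ placed immediately before the #ZIP central directory and terminated by the magic string `APK Sig Block 42`.
The sketch below only checks for the presence of that block and does not verify any signature.
The function names are hypothetical, the End Of Central Directory record is located naively (assuming its marker does not also appear in an archive comment), and ZIP64 archives are ignored.
`add_placeholder_block` builds an empty, invalid block purely to exercise the check.

```python
import io
import struct
import zipfile

EOCD_MAGIC = b"PK\x05\x06"             # ZIP End Of Central Directory record
SIG_BLOCK_MAGIC = b"APK Sig Block 42"  # last 16 bytes of the APK Signing Block


def central_directory_offset(apk: bytes) -> tuple:
    """Return (EOCD position, central directory offset) of a ZIP archive."""
    eocd = apk.rfind(EOCD_MAGIC)
    if eocd < 0:
        raise ValueError("not a ZIP archive")
    # The central directory offset is a little-endian uint32 at EOCD + 16.
    (cd_offset,) = struct.unpack_from("<I", apk, eocd + 16)
    return eocd, cd_offset


def has_signing_block(apk: bytes) -> bool:
    """True if the bytes right before the central directory are the magic."""
    _, cd_offset = central_directory_offset(apk)
    return cd_offset >= 16 and apk[cd_offset - 16:cd_offset] == SIG_BLOCK_MAGIC


def add_placeholder_block(apk: bytes) -> bytes:
    """Demo helper (hypothetical): insert an EMPTY, invalid signing block."""
    eocd, cd_offset = central_directory_offset(apk)
    size = 8 + 16  # second size field + magic, no ID-value pair
    block = struct.pack("<Q", size) + struct.pack("<Q", size) + SIG_BLOCK_MAGIC
    patched = bytearray(apk[:cd_offset] + block + apk[cd_offset:])
    # The central directory moved: update its offset in the EOCD record.
    struct.pack_into("<I", patched, eocd + len(block) + 16, cd_offset + len(block))
    return bytes(patched)


def build_demo_zip() -> bytes:
    """Demo helper (hypothetical): a tiny unsigned archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("classes.dex", b"placeholder")
    return buffer.getvalue()
```

Because the block sits between the file entries and the central directory, the archive remains readable by #ZIP tools that are unaware of it.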
===== Android Manifest #paragraph[*Android Manifest*][
The Android Manifest is stored in the `AndroidManifest.xml` file, encoded in the binary #AXML format.
The manifest declares important information about the application:
- Generic information like the application name, id, and icon.
@ -44,9 +44,9 @@ The manifest declare important informations about the application:
- The application components (Activity, Service, Receiver and Provider) and their associated classes.
- Intent filters to list the intents that can start or be sent to the application components.
- Security permissions required by the application.
]
===== Code <sec:bg-android-code-format> #paragraph[*Code*][
An application usually contains at least a `classes.dex` file containing Dalvik bytecode.
This is the format executed by the Android #ART.
It is common for an application to have more than one #DEX file, when the application needs to reference more methods than the format allows in one file
@ -58,16 +58,16 @@ In the Android ecosystem, binary code is called native code.
Because native code is compiled for a specific architecture, `.so` files are present in different versions, stored in different subfolders, depending on the targeted architecture.
For example `lib/arm64-v8a/libexample.so` is the version of the `example` library compiled for a 64-bit ARM architecture.
Because smartphones mostly use ARM processors, it is not rare to see applications that only have the ARM version of their native code.
]
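As an illustration of this layout, the following sketch groups the entries of an #APK by role, relying only on the fact that an #APK is a #ZIP archive.
It is a minimal example under stated assumptions: the function names `summarize_apk` and `build_demo_apk`, the category names and the demo file names are hypothetical, and the classification only covers the files described so far.

```python
import io
import zipfile
from collections import defaultdict


def summarize_apk(apk_bytes: bytes) -> dict:
    """Group the entries of an APK (a ZIP archive) by their role."""
    summary = defaultdict(list)
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as apk:
        for name in apk.namelist():
            parts = name.split("/")
            if name == "AndroidManifest.xml":
                summary["manifest"].append(name)
            elif name == "resources.arsc" or parts[0] == "res":
                summary["resources"].append(name)
            elif parts[0] == "assets":
                summary["assets"].append(name)
            elif len(parts) == 1 and name.startswith("classes") and name.endswith(".dex"):
                summary["dex"].append(name)
            elif len(parts) == 3 and parts[0] == "lib" and name.endswith(".so"):
                # lib/<architecture>/<library>.so: one category per architecture
                summary["native:" + parts[1]].append(parts[2])
            else:
                summary["other"].append(name)
    return dict(summary)


def build_demo_apk() -> bytes:
    """Demo helper (hypothetical): an archive with APK-like entry names."""
    names = [
        "AndroidManifest.xml",
        "classes.dex",
        "classes2.dex",
        "resources.arsc",
        "res/layout/main.xml",
        "assets/config.json",
        "lib/arm64-v8a/libexample.so",
        "META-INF/MANIFEST.MF",
    ]
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as apk:
        for name in names:
            apk.writestr(name, b"")
    return buffer.getvalue()
```

Calling `summarize_apk(build_demo_apk())` returns one list per category; an application shipping native code for a single architecture shows up with a single `native:` key.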
===== Resources #paragraph[*Resources*][
Developing graphical interfaces for applications requires many kinds of specific assets, which are stored in `res/`.
Those resources include bitmap images, text, layouts, etc.
Data like layouts, colors or text are stored in binary #AXML.
An additional file, `resources.arsc`, in a custom binary format, contains a list of the resource names, ids, and their properties.
]
===== Compilation Process #paragraph[*Compilation Process*][
For the developer, the compilation process is handled by Android Studio and is mostly transparent.
Behind the scenes, Android Studio relies on Gradle to orchestrate the different compilation steps:
@ -95,6 +95,7 @@ The last step is to sign the application using the `apksigner` utility.
Since 2021, Google requires new applications in the Google Play app store to be uploaded in a new format called Android App Bundles.
The main difference is that Google will perform the last packaging steps and generate (and sign) the application itself.
This allows Google to generate different applications for different targets, and to avoid including unnecessary files in the application, like native code targeting the wrong architecture.
]
==== Android Runtime <sec:bg-art>
@ -103,15 +104,14 @@ An heavy emphasis is put on isolating the applications from one another as well
The code execution itself can be confusing at first.
Instead of the usual linear model with a single entry point, applications have many entry points that are called by the Android framework in response to external events.
===== Application Architecture #paragraph[*Application Architecture*][
Android applications expose their components to the Android Runtime (#ART) via classes inheriting from specific classes of the Android #SDK.
Four classes represent application components that can be used as entry points:
/ Activities: An activity represents a single screen with a user interface. This is the component used to interact with a user. - _Activities_: An activity represents a single screen with a user interface. This is the component used to interact with a user.
/ Services: A service serves as an entry point to run the application in the background. - _Services_: A service serves as an entry point to run the application in the background.
/ Broadcast receivers: A broadcast receiver is an entry point used when a matching event is broadcast by the system. - _Broadcast receivers_: A broadcast receiver is an entry point used when a matching event is broadcast by the system.
/ Content providers: A content provider is a component that manages data accessible by other apps through the content provider. - _Content providers_: A content provider is a component that manages data accessible by other apps through the content provider.
Components must be listed in the `AndroidManifest.xml` of the application so that the system knows of them.
In the life cycle of a component, the system will call specific methods defined by the classes associated with each component type.
@ -120,10 +120,10 @@ For instance, an activity might compute some values in `onCreate()`, called when
In addition to the components declared in the manifest that act as entry points, the Android #API heavily relies on callbacks.
The most obvious cases are for the user interface: for example, a button will call a callback method defined by the application when clicked.
Other parts of the #API also rely on non-linear execution: for example, when an application sends an intent (see @sec:bg-sandbox), the intent sent in response is transmitted back to the application by calling another method. Other parts of the #API also rely on non-linear execution: for example, when an application sends an intent (see next paragraph), the intent sent in response is transmitted back to the application by calling another method.
]
===== Application Isolation and Interprocess Communication <sec:bg-sandbox>
#paragraph[*Application Isolation and Interprocess Communication*][
On Android, each application has its own storage folders and the application processes are isolated from each other and from the hardware interfaces.
This sandboxing is done using Linux security features like group and user permissions, SELinux, and seccomp.
The sandboxing is adjusted according to the permissions requested in the `AndroidManifest.xml` file of the applications.
@ -139,9 +139,9 @@ For instance, the activities and services are started by receiving and intent, a
Intents can also be sent directly from Android to the application: when a user starts an application by tapping the app icon, Android will send an intent to the class of the application that defined the intent filter for the `android.intent.action.MAIN` intent.
One interesting feature of the Binder is that intents do not need to explicitly name the targeted application and class: intents can be implicit and request an action without knowing the exact application that will perform it.
An example of this behaviour is when an application wants to open a file: an `android.intent.action.VIEW` intent is sent with the file location and type, and Binder will find and start an application capable of viewing this file.
]
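The link between components, intent filters and entry points can be sketched on a decoded manifest.
The example below assumes that the manifest was already converted from binary #AXML to text #XML; the manifest content, the `com.example.demo` package and the `entry_points` function are hypothetical, and only the `android` XML namespace and the `android.intent.action.MAIN` action come from Android itself.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "http://schemas.android.com/apk/res/android"
NAME = "{" + ANDROID_NS + "}name"  # the android:name attribute
COMPONENT_TAGS = ("activity", "service", "receiver", "provider")

# Hypothetical decoded manifest, for illustration only.
DEMO_MANIFEST = """
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.demo">
  <application>
    <activity android:name=".MainActivity">
      <intent-filter>
        <action android:name="android.intent.action.MAIN"/>
        <category android:name="android.intent.category.LAUNCHER"/>
      </intent-filter>
    </activity>
    <activity android:name=".SettingsActivity"/>
    <service android:name=".SyncService"/>
  </application>
</manifest>
"""


def entry_points(manifest_xml: str) -> dict:
    """List declared components and the activities handling the MAIN intent."""
    root = ET.fromstring(manifest_xml)
    components = {tag: [] for tag in COMPONENT_TAGS}
    main = []
    application = root.find("application")
    if application is None:
        return {"components": components, "main": main}
    for tag in COMPONENT_TAGS:
        for component in application.findall(tag):
            name = component.get(NAME)
            components[tag].append(name)
            # Actions accepted by the intent filters of this component.
            actions = [a.get(NAME) for a in component.findall("intent-filter/action")]
            if tag == "activity" and "android.intent.action.MAIN" in actions:
                main.append(name)
    return {"components": components, "main": main}
```

On the demo manifest, `entry_points(DEMO_MANIFEST)["main"]` contains only `.MainActivity`, the activity started when the user taps the application icon, while the other components remain reachable through other intents.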
===== Platform Classes <sec:bg-platform> #paragraph[*Platform Classes*][
In addition to the classes they include, Android applications have access to classes provided by Android, stored on the phone.
Those classes are called _platform classes_.
They are divided between #SDK classes and hidden #API.
@ -152,9 +152,9 @@ The list of #SDK classes is available at compile time in the form of a `android.
In contrast, hidden #API are undocumented methods used internally by the #ART.
Still, they are loaded by the application and can be used by it.
]
===== Class Loading and Reflection #paragraph[*Class Loading and Reflection*][
Class loading is the mechanism used by Android to find and select the implementation of a class when encountering a reference to it.
Android developers mainly use it to load bytecode dynamically from a source other than the application itself (#eg a file downloaded at runtime), using `ClassLoader` objects.
`Class` objects are then retrieved from those class loaders, using their names in the form of strings to identify them.
@ -163,7 +163,7 @@ The process of manipulating `Class` and `Methods` object instead of using byteco
Reflection is not limited to bytecode that has been dynamically loaded: it can be used for any class or method available to the application.
Because `ClassLoader` objects are only used explicitly when loading bytecode dynamically or when using reflection, it is often forgotten that the #ART uses class loaders constantly behind the scenes, allowing classes from the application and platform classes to cohabit seamlessly.
]
#v(2em)


@ -1,4 +1,5 @@
#import "../lib.typ": todo, APK, IDE, SDK, DEX, ADB, ART, eg, XML, AXML, API, jfl-note #import "../lib.typ": APK, IDE, SDK, DEX, ADB, ART, eg, XML, AXML, API, paragraph
#import "../lib.typ": jfl-note, todo
=== Reverse Engineering Tools <sec:bg-tools>
@ -13,8 +14,7 @@ This time, the application is executed and the analyst will scrutinise the behav
Frida is a good option to help with this dynamic analysis.
It is a toolkit that can be used to intercept method calls and execute custom code while an application is running.
==== Android Studio <sec:bg-android-studio> #paragraph[*Android Studio*][
The whole Android development ecosystem is packaged by Google in the #IDE Android Studio#footnote[https://developer.android.com/studio].
In practice, Android Studio is a source-code editor that wraps around the different tools of the Android #SDK.
The #SDK tools and packages can be installed manually with the `sdkmanager` tool.
@ -40,15 +40,15 @@ Among the notable tools in the #SDK, they are:
Behind the scenes, it converts #XML to binary #AXML and ensures that each file has the right compression and alignment (#eg some resource files are mapped in memory by the #ART, and thus need to be aligned and not compressed).
- `apksigner`: the tool used to sign an #APK file.
When repackaging an application, for example with Apktool, the new application needs to be signed.
]
==== Apktool <sec:bg-apktool> #paragraph[*Apktool*][
Apktool#footnote[https://apktool.org/] is a _reengineering tool_ for Android #APK files.
It can be used to disassemble an application: it will extract the files from the #APK file, convert the binary #AXML to text #XML, and use smali/baksmali#footnote[https://github.com/JesusFreke/smali] to convert the #DEX files to smali, an assembler-like language that matches the Dalvik bytecode instructions.
The main strength of Apktool is that after having disassembled an application, the content of the application can be edited and reassembled into a new #APK. #jfl-note[limits? does it always work?]
]
==== Androguard <sec:bg-androguard> #paragraph[*Androguard*][
Androguard#footnote[https://github.com/androguard/androguard]~@desnos:adnroguard:2011 is a Python library for parsing and disassembling #APK files.
It can be used to automatically read Android manifests, resources, and bytecode.
Contrary to Apktool, which generates text files, it can be used as a library to programmatically analyse the application.
@ -56,16 +56,16 @@ However, contrary to Apktool, it cannot repackage a modified application.
In addition, it can perform additional analyses, like computing a call graph or control flow graph of the application.
We will explain what those graphs are later in @sec:bg-static.
]
==== Jadx <sec:bg-jadx> #paragraph[*Jadx*][
Jadx#footnote[https://github.com/skylot/jadx] is an application decompiler.
It converts #DEX files to Java source code.
It is not always capable of decompiling all classes of an application, so it cannot be used to recompile a new application, but the code generated can be very helpful to reverse an application.
In addition to decompiling #DEX files, Jadx can also decode Android manifests and application resources.
]
==== Soot <sec:bg-soot> #paragraph[*Soot*][
Soot#footnote[https://github.com/soot-oss/soot]~@Arzt2013 was originally a Java optimization framework.
It could lift Java bytecode to other intermediate representations that could be optimized, then converted back to bytecode.
Because Dalvik bytecode and Java bytecode are equivalent, support for Android was added to Soot, and Soot features are now leveraged to analyse and modify Android applications.
@ -73,9 +73,9 @@ One of the best known example of Soot usage for Android analysis is Flowdroid~@A
A new version of Soot, SootUp#footnote[https://github.com/soot-oss/SootUp], is currently being worked on.
Compared to Soot, it has a modernized interface and architecture, but it is not yet feature-complete and some tools like Flowdroid are still using Soot.
]
==== Frida <sec:bg-frida> #paragraph[*Frida*][
Frida#footnote[https://frida.re/] is a dynamic instrumentation toolkit.
It allows the reverse engineer to inject and run JavaScript code inside a running application.
@ -86,6 +86,7 @@ This make Frida a powerful tool capable of collecting runtime informations or mo
The main drawback of using Frida is that it is a well-known tool, easily detected by applications.
Malware might implement countermeasures that avoid running the malicious payload in the presence of Frida.
]
#v(2em)


@ -129,7 +129,11 @@ Hovewer, static analysis tools must overcom many challenges when analysing Andro
/ the potential dynamic code loading: An application can run code that was not originally in the application.
/ the use of reflection: Methods can be called from their name as a string object, which is difficult to identify statically.
/ the continual evolution of Android: each new version of Android brings new features that analysis tools must be aware of.
For instance, the multi-dex feature presented in @sec:bg-android-code-format was introduced in Android #SDK 21. For instance, the multi-dex feature presented in @sec:bg-android-apk was introduced in Android #SDK 21.
Tools unaware of this feature only analyse the `classes.dex` file and will ignore all other `classes<n>.dex` files.
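A minimal sketch of the selection a multi-dex aware tool has to perform is given below.
It assumes that bytecode files are named `classes.dex`, `classes2.dex`, `classes3.dex`, and so on, at the root of the application archive; the function names are hypothetical, and the exact rules applied by Android (for instance when a number is missing in the sequence) are deliberately not modelled.

```python
import re

# classes.dex, or classes<n>.dex with n >= 2 (naming scheme assumed here).
DEX_NAME = re.compile(r"classes([2-9]|[1-9][0-9]+)?\.dex")


def dex_files(entry_names):
    """Return the bytecode files of an application, sorted by their number."""
    indexed = []
    for name in entry_names:
        match = DEX_NAME.fullmatch(name)
        if match:
            # classes.dex has no number: it is treated as the first file.
            index = int(match.group(1)) if match.group(1) else 1
            indexed.append((index, name))
    return [name for _, name in sorted(indexed)]


def dex_files_single(entry_names):
    """What a tool unaware of multi-dex would analyse."""
    return [name for name in entry_names if name == "classes.dex"]
```

Comparing the results of the two functions on the same list of archive entries gives the bytecode files that a multi-dex unaware tool would silently skip.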
#todo[It would be good to emphasise dynamic code loading and reflection]
#v(2em)
With the bases of Android application analysis in mind, we can now examine our problem statements further.


@ -4,6 +4,6 @@
#todo[Intro]
#import("2_1_android.typ") #include("2_1_android.typ")
#import("2_2_tools.typ") #include("2_2_tools.typ")
#import("2_3_static_analysis.typ") #include("2_3_static_analysis.typ")


@ -1,8 +1,29 @@
#import "../lib.typ": pb1, pb1-text, pb2, pb2-text, pb3, pb3-text, ART
#import "../lib.typ": todo
== PB <sec:bg-probl> == Problems of the Reverse Engineer <sec:bg-probl>
In this section, we will develop some issues encountered by reverse engineers, and link them to our problem statements.
In the previous section, we listed some limitations to static analysis.
Some limitations have been known for some time now, and many contributions have been made to overcome them.
Those contributions often introduce new tools that implement solutions to those different issues.
Depending on the situation, a reverse engineer might want to use those tools, or build another tool on top of one.
Unfortunately, they can be hard to use.
And, as we said previously, the fast evolution of Android can be a significant obstacle.
The combination of those two points can lead a reverse engineer to spend a lot of time trying to use a tool without realising that the tool does not work anymore.
Our first problem statement #pb1 focuses on this issue: #pb1-text.
Determining which tools are still usable today is a first step, but finding out why a tool stops working might help to write more resilient tools in the future.
We also presented dynamic code loading as an obstacle for static analysis.
Code loading is achieved using class loader objects, causing class loaders to be generally associated with dynamic code loading.
However, class loading plays a much more important role in the #ART.
Class loading originates from the Java ecosystem, and was ported to Android so that developers could keep writing applications in Java.
Despite that, Android made a lot of changes to the original Java classes, and did not document those changes.
Between the general oversight of class loading by static analysis, which relegates it to dynamic analysis, and the lack of documentation of the actual behaviour of the #ART, the question of the impact of the class loading algorithm on static analysis can be asked.
Our second problem statement #pb2 tries to answer this question: #pb2-text.
#todo[title for @sec:bg-probl]
#todo[ #todo[
Reverse engineering problems (rework the intro using what was said in 2.2)
apktool and androguard are reused, which suggests that there might be some reuse


@ -1,8 +1,43 @@
#import "../lib.typ": jfl-note, jm-note #import "../lib.typ": APK, etal, ART, SDK, DEX, eg, ie, pb1, pb1-text
#import "../lib.typ": todo, jm-note, jfl-note
#import "@preview/diagraph:0.3.5": raw-render
#import "../lib.typ": todo, etal, APK, eg, ie, pb1, pb1-text === Reusability of Static Analysis Tools <sec:bg-soa-rasta>
== Evaluating Static Analysis Tools <sec:bg-eval-tools> //== Android Reverse Engineering Techniques <sec:bg-techniques>
//#todo[swap with tool section ?]
//
#todo[Refactor]
==== Static Analysis <sec:bg-soa-static>
In the past fifteen years, the research community released many tools to detect or analyse malicious behaviors in applications.
Two main approaches can be distinguished: static and dynamic analysis~@Li2017.
Dynamic analysis requires running the application in a controlled environment to observe runtime values and/or interactions with the operating system.
For example, an Android emulator with a patched kernel can capture these interactions, but applying the required modifications is not a trivial task.
Such an approach is limited by the time required to execute even a small part of the application, with no guarantee on the code coverage obtained.
Dynamic analysis is also limited by evasion techniques that may prevent the execution of malicious parts of the code.
As a consequence, a lot of effort has been put into static approaches. //, which is the focus of this paper.
Data-flow analysis is the subject of many contributions~@weiAmandroidPreciseGeneral2014 @titzeAppareciumRevealingData2015 @bosuCollusiveDataLeak2017 @klieberAndroidTaintFlow2014 @DBLPconfndssGordonKPGNR15 @octeauCompositeConstantPropagation2015 @liIccTADetectingInterComponent2015, the most notable tool being Flowdroid~@Arzt2014a.
#todo[Describe the different contributions in relations to the issues they tackle, be more critical]
A lot of those more advanced tools rely on common tools to interact with Android applications/#DEX bytecode~@Li2017.
Recurring examples of such support tools are Apktool (#eg Amandroid~@weiAmandroidPreciseGeneral2014, Blueseal~@shenInformationFlowsPermission2014, SAAF~@hoffmannSlicingDroidsProgram2013), Androguard (#eg Adagio~@gasconStructuralDetectionAndroid2013, Apparecium~@titzeAppareciumRevealingData2015, Mallodroid~@fahlWhyEveMallory2012) or Soot (#eg Blueseal~@shenInformationFlowsPermission2014, DroidSafe~@DBLPconfndssGordonKPGNR15, Flowdroid~@Arzt2014a).
The number of publications related to static analysis can make it difficult to find the right tool for the right task.
Li #etal~@Li2017 published a systematic literature review for Android static analysis before May 2015.
They analysed 92 publications and classified them by goal, method used to solve the problem and underlying technical solution for handling the bytecode when performing the static analysis.
In particular, they listed 27 approaches with an open-source implementation available.
Nevertheless, no experiments were performed to evaluate the reusability of the listed software.
#jfl-note[We believe that the effort of reviewing the literature to build a comprehensive overview of available approaches should be pushed further: a published approach whose software cannot be used for technical reasons endangers both the reproducibility and the reusability of research.][Highlight this?]
In the next section, we will look at the work that has been done to evaluate different analysis tools.
==== Evaluating Static Analysis Tools <sec:bg-eval-tools>
Works that perform benchmarks of tools follow a similar method. Works that perform benchmarks of tools follow a similar method.
They start by selecting a set of tools with similar goals. They start by selecting a set of tools with similar goals.
@ -16,7 +51,7 @@ Occasionally, the number of application a tool simply failled to analyse are als
In @sec:bg-datasets we will look at the datasets used in the community to compare analysis tools, and in @sec:rasta-soa we will go through the contributions that benchmarked those tools #jm-note[to see if they can be used as an indication as to which tools can still be used today.] [Highlight this] In @sec:bg-datasets we will look at the datasets used in the community to compare analysis tools, and in @sec:rasta-soa we will go through the contributions that benchmarked those tools #jm-note[to see if they can be used as an indication as to which tools can still be used today.] [Highlight this]
=== Application Datasets <sec:bg-datasets> ===== Application Datasets <sec:bg-datasets>
Research contributions often rely on existing datasets or provide new ones in order to evaluate the developed software. Research contributions often rely on existing datasets or provide new ones in order to evaluate the developed software.
Raw datasets such as Drebin~@Arp2014 contain little information about the provided applications. Raw datasets such as Drebin~@Arp2014 contain little information about the provided applications.
@ -39,7 +74,7 @@ Currently, Androzoo contains more than 25 millions applications, that can be dow
Androzoo also provides additional information about the applications, like the date the application was first detected by Androzoo or the number of antivirus engines from VirusTotal that flagged the application as malicious. Androzoo also provides additional information about the applications, like the date the application was first detected by Androzoo or the number of antivirus engines from VirusTotal that flagged the application as malicious.
In addition to providing researchers with easy access to real-world applications, Androzoo makes it a lot easier to share datasets for reproducibility: instead of sharing hundreds of #APK files, the list of their SHA256 hashes is enough. In addition to providing researchers with easy access to real-world applications, Androzoo makes it a lot easier to share datasets for reproducibility: instead of sharing hundreds of #APK files, the list of their SHA256 hashes is enough.
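As an illustration of sharing a dataset as a list of hashes, the following is a minimal sketch that only relies on the Python standard library. The function names and the directory layout (a flat folder of `.apk` files) are assumptions made for the example, not part of Androzoo.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read by chunks so that large APK files are never fully loaded in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dataset_manifest(apk_dir: Path) -> list[str]:
    """List the SHA256 of every .apk file of a directory (hypothetical layout)."""
    # Sorting makes the manifest independent of the directory listing order.
    return sorted(sha256_of_file(p) for p in apk_dir.glob("*.apk"))
```

Calling `dataset_manifest(Path("dataset/"))` would return the hashes to publish in place of the files themselves.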
=== Benchmarking <sec:rasta-soa> ===== Benchmarking <sec:rasta-soa>
The few datasets composed of real-world applications confirmed that some tools such as Amandroid~@weiAmandroidPreciseGeneral2014 and Flowdroid~@Arzt2014a are less efficient on real-world applications~@bosuCollusiveDataLeak2017 @luoTaintBenchAutomaticRealworld2022. The few datasets composed of real-world applications confirmed that some tools such as Amandroid~@weiAmandroidPreciseGeneral2014 and Flowdroid~@Arzt2014a are less efficient on real-world applications~@bosuCollusiveDataLeak2017 @luoTaintBenchAutomaticRealworld2022.
Unfortunately, those real-world application datasets are rather small, and a larger number of applications would be more suitable for our goal, #ie evaluating the reusability of a variety of static analysis tools. Unfortunately, those real-world application datasets are rather small, and a larger number of applications would be more suitable for our goal, #ie evaluating the reusability of a variety of static analysis tools.
@ -129,3 +164,4 @@ This is problematic for a reverser engineer, not only do they need to invest a s
Hence our first problem statement #pb1: Hence our first problem statement #pb1:
#pb1-text #pb1-text

View file

@ -1,33 +0,0 @@
#import "../lib.typ": APK, etal, ART, SDK, DEX, eg,
#import "../lib.typ": todo, jm-note, jfl-note
#import "@preview/diagraph:0.3.5": raw-render
//== Android Reverse Engineering Techniques <sec:bg-techniques>
//#todo[swap with tool section ?]
== Static Analysis <sec:bg-soa-static>
In the past fifteen years, the research community released many tools to detect or analyse malicious behaviors in applications.
Two main approaches can be distinguished: static and dynamic analysis~@Li2017.
Dynamic analysis requires running the application in a controlled environment to observe runtime values and/or interactions with the operating system.
For example, an Android emulator with a patched kernel can capture these interactions, but applying the required modifications is not a trivial task.
Such an approach is limited by the time required to execute even a small part of the application, with no guarantee on the code coverage obtained.
Dynamic analysis is also limited by evasion techniques that may prevent the execution of malicious parts of the code.
As a consequence, a lot of effort has been put into static approaches. //, which is the focus of this paper.
Data-flow analysis is the subject of many contributions~@weiAmandroidPreciseGeneral2014 @titzeAppareciumRevealingData2015 @bosuCollusiveDataLeak2017 @klieberAndroidTaintFlow2014 @DBLPconfndssGordonKPGNR15 @octeauCompositeConstantPropagation2015 @liIccTADetectingInterComponent2015, the most notable tool being Flowdroid~@Arzt2014a.
#todo[Describe the different contributions in relations to the issues they tackle, be more critical]
A lot of those more advanced tools rely on common tools to interact with Android applications/#DEX bytecode~@Li2017.
Recurring examples of such support tools are Apktool (#eg Amandroid~@weiAmandroidPreciseGeneral2014, Blueseal~@shenInformationFlowsPermission2014, SAAF~@hoffmannSlicingDroidsProgram2013), Androguard (#eg Adagio~@gasconStructuralDetectionAndroid2013, Apparecium~@titzeAppareciumRevealingData2015, Mallodroid~@fahlWhyEveMallory2012) or Soot (#eg Blueseal~@shenInformationFlowsPermission2014, DroidSafe~@DBLPconfndssGordonKPGNR15, Flowdroid~@Arzt2014a).
The number of publications related to static analysis can make it difficult to find the right tool for the right task.
Li #etal~@Li2017 published a systematic literature review for Android static analysis before May 2015.
They analysed 92 publications and classified them by goal, method used to solve the problem and underlying technical solution for handling the bytecode when performing the static analysis.
In particular, they listed 27 approaches with an open-source implementation available.
Nevertheless, no experiments were performed to evaluate the reusability of the listed software.
#jfl-note[We believe that the effort of reviewing the literature to build a comprehensive overview of available approaches should be pushed further: a published approach whose software cannot be used for technical reasons endangers both the reproducibility and the reusability of research.][Highlight this?]
In the next section, we will look at the work that has been done to evaluate different analysis tools.

View file

@ -1,6 +1,21 @@
#import "../lib.typ": DEX, pb2, pb2-text, etal #import "../lib.typ": SDK, API, API, DEX, pb2, pb2-text, etal
#import "../lib.typ": todo
== Class Loading <sec:bg-cl> == Android Class Loading <sec:bg-soa-cl>
#todo[Refactor]
=== Platform Classes <sec:bg-soa-platform>
As we said earlier, hidden #API are undocumented methods that can be used by an application, thus making them a potential blind spot when analysing an application.
However, not a lot of research has been done on the subject.
Li #etal did an empirical study of the usage and evolution of hidden #API~@li_accessing_2016.
They found that hidden #API are added and removed in every release of Android, and that they are used both by benign and malicious applications.
More recently, He #etal~@he_systematic_2023 did a systematic study of hidden service #API related to security.
They studied how the hidden #API can be used to bypass Android security restrictions and found that, although Google's countermeasures are effective, they need to be implemented inside the system services and not the hidden #API due to the lack of in-app privilege isolation: the framework code is in the same process as the user code, meaning any restriction in the framework can be bypassed by the user.
Unfortunately, those two contributions do not explore further the consequences of the use of hidden #API for a reverse engineer.
=== Class Loading <sec:bg-cl>
Another rarely considered element of Android is its class loading mechanism. Another rarely considered element of Android is its class loading mechanism.
Class loading is a fundamental element of Java: it defines which classes are loaded from where. Class loading is a fundamental element of Java: it defines which classes are loaded from where.
@ -39,3 +54,4 @@ This leaves open the question of the actual default class loading behavior of An
#pb2-text #pb2-text

View file

@ -1,6 +1,10 @@
#import "../lib.typ": todo, APK, etal, ART, SDK, eg, jm-note, jfl-note #import "../lib.typ": APK, etal, ART, SDK, eg, DEX, eg, pb3, pb3-text
#import "../lib.typ": todo, jm-note, jfl-note
== Dynamic Analysis <sec:bg-dynamic> == Allowing Static Analysis Tools to Analyse Obfuscated Applications <sec:bg-soa-th>
=== Dynamic Analysis <sec:bg-dynamic>
As we said previously, static analysis is not capable of analysing everything. As we said previously, static analysis is not capable of analysing everything.
Some situations, like reflection or dynamic code loading, require a different approach: dynamic analysis. Some situations, like reflection or dynamic code loading, require a different approach: dynamic analysis.
@ -44,3 +48,43 @@ In the next section, we will explore further the contributions that take this ap
//#todo[RealDroid sandbox bases on modified ART?] //#todo[RealDroid sandbox bases on modified ART?]
//#todo[force execution?] //#todo[force execution?]
=== Improving Analysis with Instrumentation <sec:bg-instrumentation>
Usually, instrumentation refers to the practice of modifying the behavior of a program to collect information during its execution.
Frida is a good example of an instrumentation framework.
The term can also be used more generally to describe operations that modify the application code.
In this section, we will focus on the use of instrumentation that makes an application easier to analyse by other tools, instead of just collecting additional information at runtime.
In the previous section, we gave the example of AppSpear~@yang_appspear_2015, which reconstructs #DEX files intercepted at runtime and repackages the #APK with the new code in it.
DexLego~@dexlego uses a similar but much more aggressive technique.
It targets heavily obfuscating packers that decrypt, then re-encrypt, method instructions just in time.
To get the bytecode, DexLego logs each instruction executed by the #ART, and reconstructs the methods, then the #DEX files, from this stream of instructions.
The main limitation of this technique is that it carries over the limitations of dynamic analysis to static analysis: the bytecode injected in the application is limited to the instructions executed during the dynamic analysis.
Nevertheless, it is an interesting way to encode the traces of a dynamic analysis in a way that can be used by any Android analysis tool.
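The coverage limitation can be illustrated with a toy sketch. This is not DexLego's implementation: the trace format (method name, instruction offset, instruction text) and the function name are hypothetical, and the offsets are illustrative rather than real Dalvik encodings.

```python
from collections import defaultdict

# Hypothetical trace entry: (method name, instruction offset, instruction text).
TraceEntry = tuple[str, int, str]


def rebuild_methods(trace: list[TraceEntry]) -> dict[str, list[tuple[int, str]]]:
    """Rebuild each method as the offset-sorted instructions seen in the trace."""
    seen: dict[str, dict[int, str]] = defaultdict(dict)
    for method, offset, instruction in trace:
        # An instruction executed several times (e.g. in a loop) is kept once.
        seen[method][offset] = instruction
    return {method: sorted(body.items()) for method, body in seen.items()}


trace = [
    ("LMain;->run()V", 0, "const/4 v0, 0"),
    ("LMain;->run()V", 1, "if-eqz v0, +3"),
    # The branch is taken, so the instruction at offset 3 is never logged.
    ("LMain;->run()V", 4, "return-void"),
]
methods = rebuild_methods(trace)
print(methods["LMain;->run()V"])
```

The rebuilt method only contains offsets 0, 1 and 4: whatever sits at offset 3 is invisible to any tool that later analyses the reconstructed bytecode.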
The technique of IccTa~@liIccTADetectingInterComponent2015 is close to the idea of modifying the application to improve its analysis: it performs a first analysis to compute the potential inter-component communications of an application, then modifies the Jimple representation of this application before feeding it to Flowdroid to perform a taint analysis.
Jimple is the intermediate language used by Soot, so even if IccTa does not generate a new application, this modified representation can probably be used by any tool based on the Soot framework or recompiled into a new application without too much effort.
Samhi #etal~@samhi_jucify_2022 followed this direction to unify the analysis of bytecode and native code.
Their tool, JuCify, uses Angr~@angrPeople to generate the call graph of the native code, and uses heuristics to encode this call graph into Jimple that can then be added to the Jimple generated by Soot from the bytecode of the application.
Like IccTa, they use Flowdroid to analyse this new augmented representation of the application, but it should be usable by any analysis tool relying on Soot.
Finally, DroidRA~@li_droidra_2016 uses the COAL~@octeauCompositeConstantPropagation2015 solver to statically compute the reflection information.
The reflection calls are transformed into direct calls inside the application using Soot.
Using COAL makes DroidRA quite good at solving the simpler cases, where the names of the classes and methods targeted by reflection are already present in the application.
Those cases are quite common, and being able to solve them without resorting to dynamic analysis is quite useful.
On the other hand, COAL will struggle to solve cases with complex string manipulation and is simply not able to handle cases that rely on external data (#eg downloaded from the internet at runtime).
Likewise, DroidRA can only access dynamically loaded code if that code is present inside the application without any kind of obfuscation (#eg a #DEX file in the assets of the application can be analysed, but not if it is encrypted).
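The distinction between the cases that can and cannot be solved statically can be sketched as follows. This toy example is not COAL nor DroidRA: the `ReflectiveCall` summary and the `resolve` function are hypothetical, and a name is simply marked `None` when it is not a constant available in the application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReflectiveCall:
    """Hypothetical summary of one reflection site; None means 'not a constant'."""
    class_name: Optional[str]
    method_name: Optional[str]


def resolve(call: ReflectiveCall) -> Optional[str]:
    """Return a direct call target if both names are statically known, else None."""
    if call.class_name is None or call.method_name is None:
        # E.g. the name is downloaded or decrypted at runtime.
        return None
    return f"{call.class_name}.{call.method_name}"


sites = [
    ReflectiveCall("com.example.Payload", "run"),  # both names are constants
    ReflectiveCall(None, "run"),                   # class name built at runtime
]
print([resolve(site) for site in sites])
```

Only the first site can be rewritten into a direct call; the second one needs information that is only available at runtime.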
#v(2em)
Instrumenting applications to encode the result of an analysis as a unified representation has been explored before.
It has been used by tools like AppSpear and DexLego to expose heavily obfuscated bytecode collected dynamically.
Similarly, DroidRA computes reflection information statically and injects the actual method calls inside the application it returns.
However, AppSpear and DexLego focus primarily on specific obfuscation techniques, making their implementations difficult to port to more recent versions of Android, and DroidRA suffers from the limitations of static analysis.
We believe that instrumentation is a promising approach to encode this information.
In particular, we think that it could be used to provide dynamic information that is not available to static analysis tools like DroidRA.
To explore this possibility, we will try to answer our third problem statement #pb3: #pb3-text

View file

@ -1,4 +1,6 @@
== State of the Art <sec:bg-soa> == State of the Art <sec:bg-soa>
#import("4_1_static_analysis.typ") #include("4_1_rasta.typ")
#include("4_2_classloader.typ")
#include("4_3_theseus.typ")

View file

@ -1,11 +0,0 @@
#import "../lib.typ": SDK, API, API, etal
== Platform Classes <sec:bg-soa-platform>
As we said earlier, hidden #API are undocumented methods that can be used by an application, thus making them a potential blind spot when analysing an application.
However, not a lot of research has been done on the subject.
Li #etal did an empirical study of the usage and evolution of hidden #API~@li_accessing_2016.
They found that hidden #API are added and removed in every release of Android, and that they are used both by benign and malicious applications.
More recently, He #etal~@he_systematic_2023 did a systematic study of hidden service #API related to security.
They studied how the hidden #API can be used to bypass Android security restrictions and found that, although Google's countermeasures are effective, they need to be implemented inside the system services and not the hidden #API due to the lack of in-app privilege isolation: the framework code is in the same process as the user code, meaning any restriction in the framework can be bypassed by the user.
Unfortunately, those two contributions do not explore further the consequences of the use of hidden #API for a reverse engineer.

View file

@ -1,41 +0,0 @@
#import "../lib.typ": DEX, APK, ART, etal, eg, pb3, pb3-text, jm-note
== Improving Analysis with Instrumentation <sec:bg-instrumentation>
Usually, instrumentation refers to the practice of modifying the behavior of a program to collect information during its execution.
Frida is a good example of an instrumentation framework.
The term can also be used more generally to describe operations that modify the application code.
In this section, we will focus on the use of instrumentation that makes an application easier to analyse by other tools, instead of just collecting additional information at runtime.
In the previous section, we gave the example of AppSpear~@yang_appspear_2015, which reconstructs #DEX files intercepted at runtime and repackages the #APK with the new code in it.
DexLego~@dexlego uses a similar but much more aggressive technique.
It targets heavily obfuscating packers that decrypt, then re-encrypt, method instructions just in time.
To get the bytecode, DexLego logs each instruction executed by the #ART, and reconstructs the methods, then the #DEX files, from this stream of instructions.
The main limitation of this technique is that it carries over the limitations of dynamic analysis to static analysis: the bytecode injected in the application is limited to the instructions executed during the dynamic analysis.
Nevertheless, it is an interesting way to encode the traces of a dynamic analysis in a way that can be used by any Android analysis tool.
The technique of IccTa~@liIccTADetectingInterComponent2015 is close to the idea of modifying the application to improve its analysis: it performs a first analysis to compute the potential inter-component communications of an application, then modifies the Jimple representation of this application before feeding it to Flowdroid to perform a taint analysis.
Jimple is the intermediate language used by Soot, so even if IccTa does not generate a new application, this modified representation can probably be used by any tool based on the Soot framework or recompiled into a new application without too much effort.
Samhi #etal~@samhi_jucify_2022 followed this direction to unify the analysis of bytecode and native code.
Their tool, JuCify, uses Angr~@angrPeople to generate the call graph of the native code, and uses heuristics to encode this call graph into Jimple that can then be added to the Jimple generated by Soot from the bytecode of the application.
Like IccTa, they use Flowdroid to analyse this new augmented representation of the application, but it should be usable by any analysis tool relying on Soot.
Finally, DroidRA~@li_droidra_2016 uses the COAL~@octeauCompositeConstantPropagation2015 solver to statically compute the reflection information.
The reflection calls are transformed into direct calls inside the application using Soot.
Using COAL makes DroidRA quite good at solving the simpler cases, where the names of the classes and methods targeted by reflection are already present in the application.
Those cases are quite common, and being able to solve them without resorting to dynamic analysis is quite useful.
On the other hand, COAL will struggle to solve cases with complex string manipulation and is simply not able to handle cases that rely on external data (#eg downloaded from the internet at runtime).
Likewise, DroidRA can only access dynamically loaded code if that code is present inside the application without any kind of obfuscation (#eg a #DEX file in the assets of the application can be analysed, but not if it is encrypted).
#v(2em)
Instrumenting applications to encode the result of an analysis as a unified representation has been explored before.
It has been used by tools like AppSpear and DexLego to expose heavily obfuscated bytecode collected dynamically.
Similarly, DroidRA computes reflection information statically and injects the actual method calls inside the application it returns.
However, AppSpear and DexLego focus primarily on specific obfuscation techniques, making their implementations difficult to port to more recent versions of Android, and DroidRA suffers from the limitations of static analysis.
We believe that instrumentation is a promising approach to encode this information.
In particular, we think that it could be used to provide dynamic information that is not available to static analysis tools like DroidRA.
To explore this possibility, we will try to answer our third problem statement #pb3: #pb3-text

View file

@ -8,10 +8,4 @@
#include("2_android_bg.typ") #include("2_android_bg.typ")
#include("3_problem_statements.typ") #include("3_problem_statements.typ")
#include("4_soa.typ") #include("4_soa.typ")
#include("5_conclusion.typ")
#include("4_datasets_and_benchmarking.typ")
#include("5_platform_classes.typ")
#include("6_classloading.typ")
#include("7_dynamic_analysis.typ")
#include("8_instrumentation.typ")
#include("9_conclusion.typ")

View file

@ -5,7 +5,7 @@
To perform the transformations described in @sec:th-trans, we need information like the name and signature of the method called with reflection, or the actual bytecode loaded dynamically. To perform the transformations described in @sec:th-trans, we need information like the name and signature of the method called with reflection, or the actual bytecode loaded dynamically.
We decided to collect that information through dynamic analysis. We decided to collect that information through dynamic analysis.
We saw in @sec:bg different contributions that collect this kind of information. We saw in @sec:bg different contributions that collect this kind of information.
In the end, we decided to keep the analysis as simple as possible, so we avoided using a custom Android build like DexHunter, and instead used Frida (see @sec:bg-frida) to instrument the application and intercept calls of the methods of interest. In the end, we decided to keep the analysis as simple as possible, so we avoided using a custom Android build like DexHunter, and instead used Frida to instrument the application and intercept calls of the methods of interest.
@sec:th-fr-dcl presents our approach to collect dynamically loaded bytecode, and @sec:th-fr-ref presents our approach to collect the reflection data. @sec:th-fr-dcl presents our approach to collect dynamically loaded bytecode, and @sec:th-fr-ref presents our approach to collect the reflection data.
Because using dynamic analysis raises the concern of coverage, we also need some interaction with the application during the analysis. Because using dynamic analysis raises the concern of coverage, we also need some interaction with the application during the analysis.
Ideally, a reverse engineer would do the interaction. Ideally, a reverse engineer would do the interaction.