From d9650d07758ff433d569202d0d2c802262b0c1c6 Mon Sep 17 00:00:00 2001 From: Jean-Marie 'Histausse' Mineau Date: Tue, 23 Sep 2025 17:07:10 +0200 Subject: [PATCH 1/2] conclusion easy part done --- 6_conclusion/1_contributions.typ | 33 +++++++++++++++++++------------- 6_conclusion/2_futur.typ | 1 + 2 files changed, 21 insertions(+), 13 deletions(-) diff --git a/6_conclusion/1_contributions.typ b/6_conclusion/1_contributions.typ index 499c434..2d4b37b 100644 --- a/6_conclusion/1_contributions.typ +++ b/6_conclusion/1_contributions.typ @@ -7,26 +7,33 @@ In this thesis, we presented the following contributions. -First, we explored the reusabiliy of static analysis tools. +First, we explored the reusability of static analysis tools. Based on a systematic literature review by Li #etal, we identified 22 tools of interest, published between 2012 and 2017. To estimate the current usability of those tools, we tested their most recent version on a large dataset of #rasta.NBTOTALSTRING applications. -We then counted the number of analysis the finished and return a result. +We then counted the number of analyses that finished and returned a result. We established that #rasta.resultunusable of #rasta.nbtoolsselectedvariations tools are not reusable. -We were not able to use two of them, even with the help of the authors, while 10 others failed to finish their analysis more than half of the times. -The study of the succes rate of the tools for applications grouped by their caracteristics showed that the greater bytecode size increase the chance of analysis failure. -The same goes for min #SDK version to a lesser extent, and it appears that analyses of malwares are less likely to encounter a fatal error than analyses of goodware. -In the process of testing the tools, we built docker images of working setup for the tools. -We released those images in the hope to help future researcher that would want to use those tools. 
+We were not able to use two of them, even with the help of the authors, while 10 others failed to finish their analysis more than half the time. +The study of the finishing rate of the tools for applications grouped by their characteristics showed that a larger bytecode size increases the chance of analysis failure. +The same goes, to a lesser extent, for the min #SDK version, and it appears that analyses of malware are less likely to encounter a fatal error than analyses of goodware. +During the testing process, we built Docker images of working setups for the tools. +We released those images in the hope of helping future researchers who would want to use those tools. -Our second contributions models the default class loading behaviour of Android and introduced a class of obfuscation based on it: shadow attacks. -We showed that, by including multiple classes with the same name in an application, or including classes with the same name as classes in the android #SDK, and application can mislead a reverse engineer or impact the results of analysis tools. -We scanned a dataset of rescent applications and found that although those situations appear in wild, shadow attacks do no seam to be actually used. -Instead, we believe that classes from the #SDK are added either for retro-compatibility or due to the developper being unaware that a library was already present in the Android #SDK, and the few cases were classes are present multiple times in the application look like mistakes during the compilation of the application. -Still, #cl.shadowsdk of the applications we tested were shadowing classes from targeted #SDK version. +Our second contribution models the default class loading behaviour of Android and introduces a class of obfuscation based on it: shadow attacks.
+We showed that, by including multiple classes with the same name in an application, or including classes with the same name as classes in the Android #SDK, an application can mislead a reverse engineer or impact the results of analysis tools. +We scanned a dataset of recent applications and found that although those situations appear in the wild, shadow attacks do not seem to be used in practice. +Instead, we believe that classes from the #SDK are added either for backward compatibility or due to the developer being unaware that a library was already present in the Android #SDK, and the few cases where classes are present multiple times in the application appear to be mistakes made during the compilation of the application. +Still, #cl.shadowsdk of the applications we tested were shadowing classes from the targeted #SDK version. +Lastly, we proposed a solution to reuse any static analysis tool on an application that uses dynamic code loading or reflection. +To do so, we collect the relevant information dynamically, then instrument the application to encode the dynamic information inside a valid application mimicking the dynamic behaviour of the original one. +This new application can then be analysed normally by any tool that accepts an application as input. +We tested our method on a subset of recent applications from the dataset of our first contribution. +The results of our dynamic analysis suggest that we failed to correctly explore many applications, hinting at weaknesses in our experimental setup. +Nonetheless, we did obtain some dynamic data, allowing us to pursue our experiment. +We compared the finishing rate of tools on the original applications and on the instrumented applications using the same experiment as in our first contribution, and found that, in general, the instrumentation only slightly reduces the finishing rate of analysis tools.
+We also confirmed that the instrumentation does improve the results of analysis tools, allowing them to compute more comprehensive call graphs of the applications, or to detect new data flows. /* -* Futur work: mon unique pov pour le futur: what need to be done * * Take aways depuis l'intro * puis résumé des contributions majeurs, un paragraphe par contrib diff --git a/6_conclusion/2_futur.typ b/6_conclusion/2_futur.typ index ae1d63d..b109c4a 100644 --- a/6_conclusion/2_futur.typ +++ b/6_conclusion/2_futur.typ @@ -13,4 +13,5 @@ Robust default, close to Android: the java zip parser is often targeted, there is something to be done here ] +// Future work: my own point of view on the future: what needs to be done // future work plus haut niveau: reprandre les plus important et/ou des plus large: eg: quide web-base? flutter? wasm ? From d1dba304265c4d54d1fe99bd9ba7ce5c5ee3f1cd Mon Sep 17 00:00:00 2001 From: Jean-Marie 'Histausse' Mineau Date: Wed, 24 Sep 2025 00:44:19 +0200 Subject: [PATCH 2/2] rerefactor bg --- 2_background/{0_intro.typ => 1_intro.typ} | 8 +-- .../{1_android.typ => 2_1_android.typ} | 72 +++++++++++++------ 2_background/{2_tools.typ => 2_2_tools.typ} | 50 +++++++------ ...c_analysis.typ => 2_3_static_analysis.typ} | 45 ++++-------- 2_background/2_android_bg.typ | 9 +++ 2_background/3_problem_statements.typ | 11 +++ 2_background/4_1_static_analysis.typ | 33 +++++++++ 2_background/4_soa.typ | 4 ++ 2_background/5_platform_classes.typ | 14 +--- 2_background/9_conclusion.typ | 2 +- 2_background/main.typ | 9 +-- 11 files changed, 159 insertions(+), 98 deletions(-) rename 2_background/{0_intro.typ => 1_intro.typ} (89%) rename 2_background/{1_android.typ => 2_1_android.typ} (74%) rename 2_background/{2_tools.typ => 2_2_tools.typ} (69%) rename 2_background/{3_static_analysis.typ => 2_3_static_analysis.typ} (62%) create mode 100644 2_background/2_android_bg.typ create mode 100644 
2_background/3_problem_statements.typ create mode 100644
2_background/4_1_static_analysis.typ create mode 100644 2_background/4_soa.typ diff --git a/2_background/0_intro.typ b/2_background/1_intro.typ similarity index 89% rename from 2_background/0_intro.typ rename to 2_background/1_intro.typ index 817475d..9b8e4bb 100644 --- a/2_background/0_intro.typ +++ b/2_background/1_intro.typ @@ -25,12 +25,10 @@ Regrettably, analysis tools mostly return results in an ad hoc format, making it Some tools however encode their result in the form of a new augmented Android application. The idea beeing that any Android analysis tools must be able to handle an Android application in the first place, so it will have access to those new information. -In this section, explore in more details those different aspects of Android reverse engineering. +We will begin this chapter with a presentation of the basics of the Android ecosystem. +Readers already familiar with Android reverse engineering may want to skip to @sec:bg-probl, where we put our problem statements in perspective. +We will then examine the state of the art related to those problem statements in @sec:bg-soa, and conclude this chapter in @sec:bg-conclusion. -#todo[Plan d'annonce] -#todo[Petit intro back platform classes, séparé de soa] -#todo[Petit intro class loading séparé de soa] -#todo[Bien séparer background et st-o-a] #todo[bien dédier des sections/sous section aux 3 problemes] #todo[synthese a la fin de chaque section soa des problemes] #todo[Problematique avant soa] diff --git a/2_background/1_android.typ b/2_background/2_1_android.typ similarity index 74% rename from 2_background/1_android.typ rename to 2_background/2_1_android.typ index e596f04..1130392 100644 --- a/2_background/1_android.typ +++ b/2_background/2_1_android.typ @@ -1,14 +1,13 @@ #import "../lib.typ": todo, num, APK, JAR, AXML, ART, SDK, JNI, NDK, DEX, XML, API, ZIP, jfl-note -== Android +=== Android Android is the smartphone operating system developed by Google.
It is based on a Long Term Support Linux Kernel, to which are added patches develloped by the Android community. -On top of the kernel, Android redeveloped many of the usual components used by linux-based operating systems, and added new ones. +On top of the kernel, Android redeveloped many of the usual components used by Linux-based operating systems, like the init system or the standard C library, and added new ones, like the #ART that executes the applications. Those change make Android a verry unique operating system. -#jfl-note[][Chiffres pour illustrer?] -=== Android Applications +==== Android Applications Application in the Android ecosystem are distributed in the #APK format. #APK files are #JAR files with additionnal features, which are themself #ZIP files with additionnal features. @@ -21,10 +20,10 @@ When ressources are present in `res/`, the file `resources.arsc` is also present The `assets/` folder contains the files that are used directly by the code application. Depending on the application and compilation process, any kind of other files and folders can be added to the application. -==== Signature +===== Signature Android applications are cryptographically signed to prove the autorship. -Applicatations signed with same key are considered developed by the same entity. +Applications signed with the same key are considered to be developed by the same entity. This allow to securely update applications, and applications can declare security permission to restrict access to some feature to only application with the same author. Android has several signature schemes coexisting: @@ -36,7 +35,7 @@ Android has several signature schemes coexisting: - The v4 signature scheme is complementary to the v2/v3 signature scheme. Signature data are stored in an external, `.apk.idsig` file. -==== Android Manifest +===== Android Manifest The Android Manifest is stored in the `AndroidManifest.xml`, encoded in the binary #AXML format.
The manifest declare important informations about the application: @@ -46,7 +45,7 @@ The manifest declare important informations about the application: - Intent filters to list the intents that can start or be sent to the application componants. - Security permissions required by the application. -==== Code +===== Code An application usually contains at least a `classes.dex` file containing Dalvik bytecode. This is the format executed by the Android #ART. @@ -60,14 +59,14 @@ Because native code is compiled for a specific architecture, `.so` files are pre For example `lib/arm64-v8a/libexample.so` is the version of the `example` library compiled for an ARM 64 architecture. Because smartphones mostly use ARM processors, it is not rare to see applications that only have the ARM version of their native code. -==== Ressources +===== Resources -Application user interface require many kind of specific assets, which are stored in `lib/`. +Developing graphical interfaces for applications requires many kinds of specific assets, which are stored in `res/`. Those ressources include bitmap images, text, layout, etc. Data like layout, color or text are stored in binary #AXML. An additionnal file, `resources.arsc`, in a custom binary format, contains a list of the ressources names, ids, and their properties. -==== Compilation Process +===== Compilation Process For the developer, the compilation process is handled by Android Studio and is mostly transparent.
-=== Android Runtime +==== Android Runtime Android runtime environement has many specificities that sets it appart from other platforms. An heavy emphasis is put on isolating the applications from one another as well from the systems critical capabilities. The code execution itself can be confusing at first. Instead of the usual linear model with a single entry point, applications have many entrypoints that are called by the Android framework in accordance to external events. -==== Application Architecture - -#todo[Subsection name?] +===== Application Architecture Android application expose their componants to the Android Runtime (#ART) via classes inheriting specific classes from the Android #SDK. Four classes represent application components that can be used as entry points: @@ -125,19 +122,50 @@ In addition to the componants declared in the manifest that act as entry points, The most obvious cases are for the user interface, for example a button will call a callback method defined by the application when clicked. Other part of the #API also rely on non-linear execution, for example when an application sends an intent (see @sec:bg-sandbox), the intent sent in responce is transmitted back to the application by calling another method. -==== Application Isolation and Interprocess Communication +===== Application Isolation and Interprocess Communication -On Android, each application has its own storage folders and the application process are isolated from other applications and the hardware interfaces. +On Android, each application has its own storage folders and the application processes are isolated from each other and from the hardware interfaces. This sandboxing is done using Linux security features like group and user permissions, SELinux, and seccomp. The sandboxing is adjusted according to the permissions requested in the `AndroidManifest.xml` file of the applications. 
In addition, most feature of the Android system can only be accessed through Binder, Android main interprocess communication channel. Binder is a componant of tha Android framework, external to the application, that all applications can communicate with. -Applicatians can send messages to Binder, called *intent*, that will check if the application is allowed to send it then foward it to the appropriate componant that can then responce with another intent. -Applications can also receive intent must declare intent filters to indicate which intent can be send to the application, and which classes receive the intents. -Intent are central to Android applications and are not used just to access Android capabilities. -For instance, the activities and services are started by receiving intent, and it is not uncommon for application to send intents to itself to switch activities. +Applications can send messages to Binder, called *intents*. +Binder will check if the application is allowed to send the intent, and then forward it to the appropriate component. +This component can then respond with another intent. +Applications must declare intent filters to indicate which intents can be sent to the application, and which classes receive them. +Intents are central to Android applications and are not just used to access Android capabilities. +For instance, activities and services are started by receiving an intent, and it is not uncommon for an application to send intents to itself to switch between activities. Intent can also be sent directly from Android to the application: when a user starts an application by tapping the app icons, Android will send an intent to the class of the application that defined the intent filter for the `android.intent.action.MAIN` intent.
One interesting feature of the Binder is that intent do not need to explicitly name the targetted application and class: intent can be implicit and request an action without knowing the exact application that will performed it. -An example of this behavior is when an application whant open a file: an `android.intent.action.VIEW` intent is sent with the file location, and Binder will find an application capable of viewing the file. +An example of this behaviour is when an application wants to open a file: an `android.intent.action.VIEW` intent is sent with the file location and type, and Binder will find and start an application capable of viewing this file. +===== Platform Classes + +In addition to the classes they include, Android applications have access to classes provided by Android, stored on the phone. +Those classes are called _platform classes_. +They are divided between #SDK classes and hidden #API. +The #SDK classes can be seen as the Android standard library. +They are documented by Google and are relatively stable from one version to the next. +In case of breaking changes, the changes are documented by Google as well. +The list of #SDK classes is available at compile time in the form of an `android.jar` file to link against. + +In contrast, hidden #API are undocumented methods used internally by the #ART. +Nevertheless, they are accessible to the application and can be used by it. + +===== Class Loading and Reflection + +Class loading is the mechanism used by Android to find and select the implementation of a class when encountering a reference to it. +Android developers mainly use it explicitly to load bytecode dynamically from a source other than the application itself (for example, a file downloaded at runtime), using `ClassLoader` objects. +`Class` objects are then retrieved from those class loaders, using their names (as strings) to identify them.
+Those `Class` objects can then be instantiated, and `Method` objects can be used to call the methods of the instantiated objects. +The process of manipulating `Class` and `Method` objects instead of using bytecode instructions directly is called reflection. +Reflection is not limited to bytecode that has been dynamically loaded: it can be used for any class or method available to the application. + +Because `ClassLoader` objects are only manipulated explicitly when loading bytecode dynamically or when using reflection, it is often forgotten that the #ART uses class loaders constantly behind the scenes, allowing application classes and platform classes to coexist seamlessly. + + +#v(2em) + +In this subsection, we presented the most notable specificities of the Android ecosystem. +In the next subsection, we will continue with the various tools available to an Android reverse engineer. diff --git a/2_background/2_tools.typ b/2_background/2_2_tools.typ similarity index 69% rename from 2_background/2_tools.typ rename to 2_background/2_2_tools.typ index 8f67060..4126fd4 100644 --- a/2_background/2_tools.typ +++ b/2_background/2_2_tools.typ @@ -1,13 +1,19 @@ #import "../lib.typ": todo, APK, IDE, SDK, DEX, ADB, ART, eg, XML, AXML, API, jfl-note -== Reverse Engineering Tools +=== Reverse Engineering Tools Due to the specificities of Android, reverse engineers need tools adapted to Android. -The developement tools provided by Google can be used for basic operations. -Apktool and Jadx are common tools used to read the content of an application, meanwhile Androguard and Soot can be used as librairy to automate analysis. -For a more dynamic approach, Frida is a toolkit that can be use to intercept method call and execute custom while an application is running. +The development tools provided by Google can be used for basic operations, but a reverse engineer will quickly need more specialized tools. +Usually, the first step when analysing an application is to look at its content.
+Apktool and Jadx are common tools used to convert the content of an application into a readable format. +Analysing an application this way, without running it, is called static analysis. +For more advanced forms of static analysis, Androguard and Soot can be used as libraries to automate analyses. +When static analysis becomes too complicated (#eg if the application uses obfuscation techniques), a reverse engineer might switch to dynamic analysis. +In this case, the application is executed and the analyst scrutinises its behaviour. +Frida is a good option to support this dynamic analysis. +It is a toolkit that can be used to intercept method calls and execute custom code while an application is running. -=== Android Studio +==== Android Studio The whole Android developement ecosystem is packaged by Google in the #IDE Android Studio#footnote[https://developer.android.com/studio]. In practice, Android Studio is a source-code editor that wrap arround the different tools of the android #SDK. @@ -31,44 +37,44 @@ Among the notable tools in the #SDK, they are: It can also be used to perform different level of optimization of the bytecode generated. - `aapt`/`aapt2` (Android Asset Packaging Tool): This tools is used to build the #APK file. It is commonly used by other tools that repackage applications like Apktool. - Behind the scene, it we convert #XML to binary #AXML and ensure the right files have the right compression and alignment. (#eg some ressource files are mapped in memory by the #ART, and thus need to be aligned and not compressed). + Behind the scenes, it converts #XML to binary #AXML and ensures that each file has the right compression and alignment (#eg some resource files are mapped in memory by the #ART, and thus need to be aligned and uncompressed). - `apksigner`: the tool used to sign an #APK file. When repackaging an application, for example with Apktool, the new application need to be signed.
-=== Apktool +==== Apktool Apktool#footnote[https://apktool.org/] is a _reengineering tool_ for Android #APK files. It can be used to disassemble an application: it will extract the files from the #APK file, convert the binary #AXML to text #XML, and use smali/backsmali#footnote[https://github.com/JesusFreke/smali] to convert the #DEX files to smali, an assembler-like langage that match the Dalvik bytecode instructions. The main strenght of Apktool is that after having disassemble an application, the content of the application can be edited and reassemble into a new #APK. #jfl-note[limites? ca marche toujours?] -=== Androguard +==== Androguard -Androguard#footnote[https://github.com/androguard/androguard]~@desnos:adnroguard:2011 is a python library for parsing and analysing #APK files. -#jfl-note[Its main feature is disassembling #APK files.][backend #sym.eq.not apktool?] +Androguard#footnote[https://github.com/androguard/androguard]~@desnos:adnroguard:2011 is a Python library for parsing and disassembling #APK files. It can be used to automatically read Android manifests, ressources, and bytecode. -Contrary to Apktool, it can be used programatically, whithout parsing text files, to analyse the application, but it cannot repackage a modified application. +Unlike Apktool, which generates text files, Androguard can be used as a library to analyse the application programmatically. +However, it cannot repackage a modified application, as Apktool does. +It can also perform additional analyses, such as computing the call graph or the control flow graph of the application. +We explain what those graphs are in @sec:bg-static. -In addition, it can perform additionnal analysis, like computing a call graph or control flow graph. - -=== Jadx +==== Jadx Jadx#footnote[https://github.com/skylot/jadx] is an application decompiler. It convert #DEX files to Java source code.
It is not always capable of decompiling all classes of an application, so it cannot be used to recompile a new application, but the code generated can be very helpful to reverse an application. In addition to decompilling #DEX files, Jadx can also decode Android manifests and application ressources. -=== Soot +==== Soot -Soot#footnote[https://github.com/soot-oss/soot]~@Arzt2013 is a Java optimization framework. -It can leaft java bytecode to other intermediate representations that can be used to perform optimization then converted back to bytecode. -Because Dalvik bytecode and Java bytecode are equivalent, support for Android was added to Soot, and Soot features are now leveraged to analyse Android applications. -One of the best known example of Soot usage for Android analysis is Flowdroid~@Arzt2014a, a tool that compute data flow in an application. +Soot#footnote[https://github.com/soot-oss/soot]~@Arzt2013 was originally a Java optimization framework. +It could lift Java bytecode to other intermediate representations that could be optimized, then converted back to bytecode. +Because Dalvik bytecode is semantically close to Java bytecode, support for Android was added to Soot, and Soot features are now leveraged to analyse and modify Android applications. +One of the best-known examples of Soot usage for Android analysis is Flowdroid~@Arzt2014a, a tool that computes data flows in an application. A new version of Soot, SootUp#footnote[https://github.com/soot-oss/SootUp], is currently beeing worked on. Compared to Soot, it has a modernize interface and architecture, but it is not yet feature complete and some tools like Flowdroid are still using Soot. -=== Frida +==== Frida Frida#footnote[https://frida.re/] is a dynamic intrumentation toolkit. It allows the reverse engineer to inject and run javascript code inside a running application.
@@ -85,5 +91,5 @@ Malware might implement countermeasures that avoid running malicious payload in Those tools are quite useful for manual operations. However, considering the complexity of modern Android applications, it might take a lot of work for a reverse engineer to analyse one application. -In the next section, we will see more advance techniques that have been developped to analyse Android applications. - +Different techniques have been developed to streamline the analysis. +Next, we present the most common of those techniques for static analysis. diff --git a/2_background/3_static_analysis.typ b/2_background/2_3_static_analysis.typ similarity index 62% rename from 2_background/3_static_analysis.typ rename to 2_background/2_3_static_analysis.typ index 8e4ee3c..0956cd6 100644 --- a/2_background/3_static_analysis.typ +++ b/2_background/2_3_static_analysis.typ @@ -2,27 +2,18 @@ #import "../lib.typ": todo, jm-note, jfl-note #import "@preview/diagraph:0.3.5": raw-render -//== Android Reverse Engineering Techniques - -//#todo[swap with tool section ?] - - -== Static Analysis - -In the past fifteen years, the research community released many tools to detect or analyse malicious behaviors in applications. -Two main approaches can be distinguished: static and dynamic analysis~@Li2017. -Dynamic analysis requires to run the application in a controlled environment to observe runtime values and/or interactions with the operating system. -For example, an Android emulator with a patched kernel can capture these interactions but the modifications to apply are not a trivial task. -Such approach is limited by the required time to execute a limited part of the application with no guarantee on the obtained code coverage. -Dynamic analysis is also limited by evading techniques that may prevent the execution of malicious parts of the code. -As a consequence, a lot of efforts have been put in static approaches. //, which is the focus of this paper.
+=== Static Analysis Static analysis program examine an #APK file without executing it to extract information from it. -Basic static analysis can include extracting information from the `AndroidManifest.xml` file or decompiling bytecode to Java code. +Basic static analysis can include extracting information from the `AndroidManifest.xml` file, disassembling bytecode with tools like Apktool, or decompiling it to Java code with tools like Jadx. +Unfortunately, simply reading the bytecode does not scale. +This requires a human analyst, which makes it difficult to analyse a large number of applications; even for a single application, its size and complexity can quickly overwhelm the reverse engineer. -More advance analysis consist in the computing the control-flow of an application and computing its data-flow~@Li2017. - -The most basic form of control-flow analysis is to build a call graph. +Control flow analysis is often used to mitigate this issue. +The idea is to extract the behaviour (the flow) of the application from the bytecode and to represent it as a graph. +A graph representation is easier to work with than a list of instructions, and can be used for further analysis. +Depending on the level of precision required, different types of graphs can be computed. +The most basic of those graphs is the call graph. A call graph is a graph where the nodes represent the methods in the application, and the edges reprensent calls from one method to another. @fig:bg-fizzbuzz-cg-cfg b) show the call graph of the code in @fig:bg-fizzbuzz-cg-cfg a). A more advance control-flow analysis consist in building the control-flow graph. @@ -118,20 +109,19 @@ This time, instead of methods, the nodes represent instructions, and the edges i supplement: [Figure], caption: [Source code for a simple Java method and its Call and Control Flow Graphs], ) + Once the control-flow graph is computed, it can be used to compute data-flows.
Data-flow analysis, also called taint-tracking, allows to follow the flow of information in the application. Be defining a list of methods and fields that can generate critical information (taint sources) and a list of methods that can consume information (taint sink), taint-tracking allows to detect potential data leaks (if a data flow link a taint source and a taint sink). For example, `TelephonyManager.getImei()` returns an unique, persistent, device identifier. -This can be used to identify the user, and it cannot be changed if #jfl-note[compromised][replace by: this imei is dislaxd (illisible) \ jm: ???]. +This can be used to identify the user, and it cannot be changed if compromised. This make `TelephonyManager.getImei()` a good candidate as a taint source. On the other hand, `UrlRequest.start()` send a request to an external server, making it a taint sink. If a data-flow is found linking `TelephonyManager.getImei()` to `UrlRequest.start()`, this means the application is potentially leaking a critical information to an external entity, a behavior that is probably not wanted by the user. -Data-flow analysis is the subject of many contribution~@weiAmandroidPreciseGeneral2014 @titzeAppareciumRevealingData2015 @bosuCollusiveDataLeak2017 @klieberAndroidTaintFlow2014 @DBLPconfndssGordonKPGNR15 @octeauCompositeConstantPropagation2015 @liIccTADetectingInterComponent2015, the most notable tool being Flowdroid~@Arzt2014a. -#todo[Describe the different contributions in relations to the issues they tackle, be more critical] Static analysis is powerful as it allows to detects unwanted behavior in an application even is the behavior does not manifest itself when running the application. -Hovewer, static analysis tools must overcom many challenges when analysing Android applications: +However, static analysis tools must overcome many challenges when analysing Android applications:
/ the Java object-oriented paradigm: A call to a method can in fact correspond to a call to any method overriding the original method in subclasses. / the multiplicity of entry points: Each component of an application can be an entry point for the application. / the event driven architecture: Methods of in the applications can be called when event occur, in unknown order. @@ -142,13 +132,4 @@ Hovewer, static analysis tools must overcom many challenges when analysing Andro For instance, the multi-dex feature presented in @sec:bg-android-code-format was introduced in Android #SDK 21. Tools unaware of this feature only analyse the `classes.dex` file an will ignore all other `classes.dex` files. -A lot of those more advanced tools rely on common tools to interact with Android applications/#DEX bytecode@~@Li2017. -Reccuring examples of such support tools are Appktool (#eg Amandroid~@weiAmandroidPreciseGeneral2014, Blueseal~@shenInformationFlowsPermission2014, SAAF~@hoffmannSlicingDroidsProgram2013), Androguard (#eg Adagio~@gasconStructuralDetectionAndroid2013, Appareciumn~@titzeAppareciumRevealingData2015, Mallodroid~@fahlWhyEveMallory2012) or Soot (#eg Blueseal~@shenInformationFlowsPermission2014, DroidSafe~@DBLPconfndssGordonKPGNR15, Flowdroid~@Arzt2014a). - -The number of publication related to static analysis make can make it difficult to find the right tool for the right task. -Li #etal~@Li2017 published a systematic literature review for Android static analysis before May 2015. -They analysed 92 publications and classified them by goal, method used to solve the problem and underlying technical solution for handling the bytecode when performing the static analysis. -In particular, they listed 27 approaches with an open-source implementation available. -Nevertheless, experiments to evaluate the reusability of the pointed out software were not performed. 
-#jfl-note[We believe that the effort of reviewing the literature for making a comprehensive overview of available approaches should be pushed further: an existing published approach with a software that cannot be used for technical reasons endanger both the reproducibility and reusability of research.][A mettre en avant?] -In the next section, we will look at the work that has been done to evaluate different analysis tools. +#todo[It would be good to highlight dynamic code loading and reflection] diff --git a/2_background/2_android_bg.typ b/2_background/2_android_bg.typ new file mode 100644 index 0000000..63ba921 --- /dev/null +++ b/2_background/2_android_bg.typ @@ -0,0 +1,9 @@ +#import "../lib.typ": todo + +== Android Background + +#todo[Intro] + +#include("2_1_android.typ") +#include("2_2_tools.typ") +#include("2_3_static_analysis.typ") diff --git a/2_background/3_problem_statements.typ b/2_background/3_problem_statements.typ new file mode 100644 index 0000000..61092ad --- /dev/null +++ b/2_background/3_problem_statements.typ @@ -0,0 +1,11 @@ +#import "../lib.typ": todo + +== PB + +#todo[title for @sec:bg-probl] +#todo[ + Reverse-engineering issues (reuse the introduction together with what was said in 2.2) + apktool and androguard are reused, which suggests that there may be some reuse + classes can be loaded, and the Android code shows that class loading actually matters much more than that + it is known that class loading + static analysis + reflection = no-go; all tools present their solutions in one way or another +] diff --git a/2_background/4_1_static_analysis.typ b/2_background/4_1_static_analysis.typ new file mode 100644 index 0000000..0814d11 --- /dev/null +++ b/2_background/4_1_static_analysis.typ @@ -0,0 +1,33 @@ +#import "../lib.typ": APK, etal, ART, SDK, DEX, eg, +#import "../lib.typ": todo, jm-note, jfl-note +#import "@preview/diagraph:0.3.5": raw-render + +//== Android Reverse Engineering Techniques + +//#todo[swap with tool section ?]
+ + == Static Analysis + +In the past fifteen years, the research community has released many tools to detect or analyse malicious behaviors in applications. +Two main approaches can be distinguished: static and dynamic analysis~@Li2017. +Dynamic analysis requires running the application in a controlled environment to observe runtime values and/or interactions with the operating system. +For example, an Android emulator with a patched kernel can capture these interactions, but applying the required modifications is not a trivial task. +Such an approach is limited by the time required to run the application: only part of the code is executed, with no guarantee on the obtained code coverage. +Dynamic analysis is also limited by evasion techniques that may prevent the execution of malicious parts of the code. +As a consequence, a lot of effort has been put into static approaches. //, which is the focus of this paper. + +Data-flow analysis is the subject of many contributions~@weiAmandroidPreciseGeneral2014 @titzeAppareciumRevealingData2015 @bosuCollusiveDataLeak2017 @klieberAndroidTaintFlow2014 @DBLPconfndssGordonKPGNR15 @octeauCompositeConstantPropagation2015 @liIccTADetectingInterComponent2015, the most notable tool being Flowdroid~@Arzt2014a. + +#todo[Describe the different contributions in relation to the issues they tackle, be more critical] + +Many of those more advanced tools rely on common tools to interact with Android applications and #DEX bytecode~@Li2017. +Recurring examples of such support tools are Apktool (#eg Amandroid~@weiAmandroidPreciseGeneral2014, Blueseal~@shenInformationFlowsPermission2014, SAAF~@hoffmannSlicingDroidsProgram2013), Androguard (#eg Adagio~@gasconStructuralDetectionAndroid2013, Apparecium~@titzeAppareciumRevealingData2015, Mallodroid~@fahlWhyEveMallory2012) or Soot (#eg Blueseal~@shenInformationFlowsPermission2014, DroidSafe~@DBLPconfndssGordonKPGNR15, Flowdroid~@Arzt2014a).
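A small forward propagation over a straight-line method body can illustrate the source/sink formulation of taint tracking. This Python sketch is illustrative only. The statement format, the variable names and the helper `Helper.makeRequest` are hypothetical. Only `TelephonyManager.getImei` and `UrlRequest.start` come from the example in the text. The sketch conservatively assumes that the result of any call or expression that uses a tainted value is itself tainted. It also ignores branches, aliasing and inter-procedural flows, all of which real tools such as Flowdroid must handle.

```python
"""Minimal forward taint-propagation sketch over a hypothetical statement list."""

SOURCES = {"TelephonyManager.getImei"}
SINKS = {"UrlRequest.start"}

# Hypothetical straight-line method body. Each statement is
# (assigned variable or None, called method or None, argument variables).
# A statement with no called method models a plain expression, e.g. url = base + imei.
STATEMENTS = [
    ("imei", "TelephonyManager.getImei", []),
    ("url", None, ["base", "imei"]),
    ("req", "Helper.makeRequest", ["url"]),  # hypothetical helper
    (None, "UrlRequest.start", ["req"]),
    ("other", None, ["base"]),
    (None, "UrlRequest.start", ["other"]),
]


def find_leaks(statements, sources=SOURCES, sinks=SINKS):
    """Return the indices of sink calls that receive tainted data."""
    tainted = set()
    leaks = []
    for index, (target, callee, args) in enumerate(statements):
        uses_taint = any(arg in tainted for arg in args)
        if callee in sinks and uses_taint:
            leaks.append(index)
        if target is not None:
            if callee in sources or uses_taint:
                tainted.add(target)
            else:
                # The variable is overwritten with untainted data.
                tainted.discard(target)
    return leaks


if __name__ == "__main__":
    print(find_leaks(STATEMENTS))  # [3]
```

Only the sink call at index 3 is reported. `other` is derived from `base` alone, so the second `UrlRequest.start` call is not flagged.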
+ +The number of publications related to static analysis can make it difficult to find the right tool for the right task. +Li #etal~@Li2017 published a systematic literature review of Android static analysis works published before May 2015. +They analysed 92 publications and classified them by goal, by the method used to solve the problem, and by the underlying technical solution used to handle the bytecode when performing the static analysis. +In particular, they listed 27 approaches with an open-source implementation available. +Nevertheless, they did not perform experiments to evaluate the reusability of the identified software. +#jfl-note[We believe that the effort of reviewing the literature to build a comprehensive overview of available approaches should be pushed further: a published approach whose software cannot be used for technical reasons endangers both the reproducibility and the reusability of research.][Should this be emphasised?] +In the next section, we will look at the work that has been done to evaluate different analysis tools. diff --git a/2_background/4_soa.typ b/2_background/4_soa.typ new file mode 100644 index 0000000..53a1d8f --- /dev/null +++ b/2_background/4_soa.typ @@ -0,0 +1,4 @@ + +== State of the Art + +#include("4_1_static_analysis.typ") diff --git a/2_background/5_platform_classes.typ b/2_background/5_platform_classes.typ index c8284f2..95e0f9c 100644 --- a/2_background/5_platform_classes.typ +++ b/2_background/5_platform_classes.typ @@ -1,18 +1,8 @@ #import "../lib.typ": SDK, API, API, etal -== Platform Classes +== Platform Classes -In addition to the classes they include, Android applications have access to classes provided by Android. -Those classes are called _platform classes_. -They are devided between #SDK classes, and hidden #API. -The #SDK classes can be seen as the Android standard library. -They are documented by Google, and have a certain stability from version to version. -In case of breaking changes, the changed are listed by Google as well.
-The list of #SDK classes is available at complite time in the form of a `android.jar` file to link against. - -On the opposite, hidden #API are undocumented methods used internally by Android. -Still, they are loaded by the application and can be used by it. -Thus, they are a potential blind spot when analysing an application. +As we said earlier, hidden #API are undocumented methods that can be used by an application, thus making them a potential blind spot when analysing an application. However, not a lot a research has been done on the subject. Li #etal did an empirical study of the usage and evolution of hidden #API~@li_accessing_2016. They found that hidden #API are added and removed in every release of Android, and that they are used both by benign and malicious applications. diff --git a/2_background/9_conclusion.typ b/2_background/9_conclusion.typ index bed54d5..2e1c6e7 100644 --- a/2_background/9_conclusion.typ +++ b/2_background/9_conclusion.typ @@ -1,6 +1,6 @@ #import "../lib.typ": APK, pb1, pb2, pb3, pb1-text, pb2-text, pb3-text -== Conclusion +== Conclusion In this chapter, looked at the specificities of Android and the usual tools used as a basis for reverse engeenering applications. Many contributions have been done to static analysis, and benchmarks have been proposed to compare the different tools that resulted from those contributions. diff --git a/2_background/main.typ b/2_background/main.typ index 0069c86..5ce5a32 100644 --- a/2_background/main.typ +++ b/2_background/main.typ @@ -4,10 +4,11 @@ #epigraph("Alexis \"Lex\" Murphy, Jurassic Park")[This is a Unix system. I know this.] -#include("0_intro.typ") -#include("1_android.typ") -#include("2_tools.typ") -#include("3_static_analysis.typ") +#include("1_intro.typ") +#include("2_android_bg.typ") +#include("3_problem_statements.typ") +#include("4_soa.typ") + #include("4_datasets_and_benchmarking.typ") #include("5_platform_classes.typ") #include("6_classloading.typ")