erf, not good, but it's something at least
All checks were successful
/ test_checkout (push) Successful in 1m47s
parent c34eb1b838
commit fede0bd9b2
5 changed files with 42 additions and 36 deletions
@ -1,23 +1,17 @@
#import "../lib.typ": SDK, API, API, DEX, pb2, pb2-text, etal
#import "../lib.typ": todo

== Android Class Loading <sec:bg-soa-cl>
=== Android Class Loading <sec:bg-soa-cl>

#todo[Refactor]
#pb2-text

=== Platform Classes <sec:bg-soa-platform>
This subsection is mainly dedicated to class loading in Java and Android.
Because we focus on the _default_ class loading algorithm, we will not cover dynamic code loading.
However, even without dynamic code loading, class loading is used to load classes other than the ones in the application.
In the second part of this subsection, we will look at the work that has been done on those classes, the platform classes.

As we said earlier, hidden #API are undocumented methods that can be used by an application, thus making them a potential blind spot when analysing an application.
However, little research has been done on the subject.
Li #etal conducted an empirical study of the usage and evolution of hidden #API~@li_accessing_2016.
They found that hidden #API are added and removed in every release of Android, and that they are used by both benign and malicious applications.
More recently, He #etal~@he_systematic_2023 conducted a systematic study of security-related hidden service #API.
They studied how hidden #API can be used to bypass Android security restrictions and found that, although Google's countermeasures are effective, they need to be implemented inside the system services rather than in the hidden #API, because of the lack of in-app privilege isolation: the framework code is in the same process as the user code, meaning any restriction in the framework can be bypassed by the user.
Unfortunately, those two contributions do not explore further the consequences of the use of hidden #API for a reverse engineer.
==== Class Loading <sec:bg-cl>

=== Class Loading <sec:bg-cl>

Another rarely considered element of Android is its class loading mechanism.
Class loading is a fundamental element of Java: it defines which classes are loaded and from where.
In Android, it is often associated with dynamic code loading, as `ClassLoader` objects are used to load code at runtime.
However, class loading is also involved in loading platform classes and classes from the application itself, and thus requires some attention when analysing an application.
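As a hedged illustration of how loaders are chained on a standard desktop JVM (not on Android, where the loader classes and the representation of the boot loader differ), the following minimal Java sketch walks the parent chain of the loader that defined a class. The names `LoaderChain` and `chainOf` are hypothetical and only introduced for this example.

```java
import java.util.ArrayList;
import java.util.List;

class LoaderChain {
    /** Lists the class loaders from the one that defined `cls` up to the root. */
    static List<String> chainOf(Class<?> cls) {
        List<String> chain = new ArrayList<>();
        ClassLoader loader = cls.getClassLoader();
        while (loader != null) {
            chain.add(loader.getClass().getName());
            loader = loader.getParent();
        }
        // On a standard JVM, the bootstrap loader is represented by null.
        chain.add("<bootstrap>");
        return chain;
    }

    public static void main(String[] args) {
        // Application class: goes through one or more loaders before the root.
        System.out.println(chainOf(LoaderChain.class));
        // Core class: defined directly by the bootstrap loader.
        System.out.println(chainOf(String.class));
    }
}
```

On a desktop JVM, the chain of a core class such as `String` only contains the bootstrap entry, while an application class typically goes through at least one additional loader.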
@ -41,17 +35,27 @@ They also combine the loading with code generation from ciphered assets or code
Because parts of the original code will only be available at runtime, deobfuscation approaches propose techniques that track #DEX structures when they are manipulated by the application~@zhang2015dexhunter @xue2017adaptive @wong2018tackling.
Those contributions interact with the class loading mechanism of Android to collect the #DEX structures at the right moment.

Deobfuscating an application is the first problem the reverse engineer has to solve.
Nevertheless, even if all classes of the code are recovered by the reverse engineer, understanding which classes are really loaded by Android raises an additional problem.
The reverse engineer can have the feeling that what they see in the bytecode is what is loaded at runtime, whereas the system can choose alternative implementations of a class.
Some classes, however, are neither loaded from the application nor dynamically loaded by the application.
Those classes are platform classes and, apart from dynamically loaded code, they are the main reason class loading is needed by Android.
We will now look at the literature related to them.

==== Platform Classes <sec:bg-soa-platform>

Platform classes are divided between #SDK classes, which are documented, and the other classes, often referred to as hidden #API.
#SDK classes are clearly listed and documented by Google, so they do not require as much attention as hidden #API.
As we said earlier, hidden #API are undocumented methods that can be used by an application, thus making them a potential blind spot when analysing an application.
However, little research has been done on the subject.
Li #etal conducted an empirical study of the usage and evolution of hidden #API~@li_accessing_2016.
They found that hidden #API are added and removed in every release of Android, and that they are used by both benign and malicious applications.
More recently, He #etal~@he_systematic_2023 conducted a systematic study of security-related hidden service #API.
They studied how hidden #API can be used to bypass Android security restrictions and found that, although Google's countermeasures are effective, they need to be implemented inside the system services rather than in the hidden #API, because of the lack of in-app privilege isolation: the framework code is in the same process as the user code, meaning any restriction in the framework can be bypassed by the user.
Unfortunately, those two contributions do not explore further the consequences of the use of hidden #API for a reverse engineer.

#v(2em)

Class loading mechanisms have been studied carefully in the context of the Java language.
However, the same cannot be said about Android, whose implementation significantly from classic Java Virtual Machine.
However, the same cannot be said about Android, whose implementation diverges significantly from the classic Java Virtual Machine.
Most work done on Android focuses on extending Android capabilities using class loading, or on dynamically analysing the code loading operations of an application.
This leaves open the question of the actual default class loading behavior of Android, leading us to #pb2:

#pb2-text


In @sec:cl, we will model the behaviour of Android when loading the classes used by an application that does not use dynamic code loading, and check whether this behaviour matches the behaviour of common analysis tools.
We will also take some time to check whether the state of the art related to hidden #API is up to date with the current Android versions.
@ -1,13 +1,17 @@
#import "../lib.typ": APK, etal, ART, SDK, eg, DEX, eg, pb3, pb3-text
#import "../lib.typ": todo, jm-note, jfl-note

== Allowing Static Analysis Tools to Analyse Obfuscated Applications <sec:bg-soa-th>
=== Allowing Static Analysis Tools to Analyse Obfuscated Applications <sec:bg-soa-th>

#pb3-text

=== Dynamic Analysis <sec:bg-dynamic>
Dynamic analysis of Android applications has been researched for a long time.
Like static analysis, it has its own challenges, which we will explore in this subsection.
After that, we will also look at contributions that sought to encode results inside the #APK format, or that used instrumentation to improve analyses in some way.

As we said previously, static analysis is not capable of analysing everything.
Some situations, like reflection or dynamic code loading, require a different approach: dynamic analysis.
==== Dynamic Analysis <sec:bg-dynamic>

Some situations, like reflection or dynamic code loading, are difficult to solve with static analysis and require a different approach: dynamic analysis.
With dynamic analysis, the application is actually executed and the reverse engineer observes its behavior.
Monitoring the behavior can be achieved by various strategies: observing the filesystem, the display screen, the process memory, the kernel, ...
Depending on the chosen level of observation, it can be technically difficult.
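To make the difficulty with reflection concrete, here is a minimal Java sketch, written under our own assumptions, of a reflective call whose target only appears as an obfuscated string in the bytecode. `ReflectionDemo`, `decode` and `callStatic` are hypothetical names, and string reversal merely stands in for the stronger encodings (or remotely fetched data) a real application might use.

```java
import java.lang.reflect.Method;

class ReflectionDemo {
    /** Toy "deobfuscation": the real class name is stored reversed. */
    static String decode(String obfuscated) {
        return new StringBuilder(obfuscated).reverse().toString();
    }

    /** Calls a public static, zero-argument method chosen at runtime. */
    static Object callStatic(String className, String methodName) {
        try {
            Class<?> cls = Class.forName(className);
            Method method = cls.getMethod(methodName);
            return method.invoke(null); // null receiver: static method
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflective call failed", e);
        }
    }

    public static void main(String[] args) {
        // A static analyser only sees the reversed literal, not the target.
        String target = decode("metsyS.gnal.avaj"); // "java.lang.System"
        Object result = callStatic(target, "lineSeparator");
        System.out.println(result.equals(System.lineSeparator()));
    }
}
```

Executing the code resolves the target immediately, whereas a static tool would have to model the string transformation to recover the same call edge.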
@ -42,13 +46,13 @@ Similarly, StaDynA~@zhauniarovichStaDynAAddressingProblem2015 is a framework tha
The issue with those approaches is that they are only compatible with their own subsequent analysis.
For instance, StaDynA only provides the call graph, and cannot be used as is to improve the capabilities of Flowdroid.
This is unfortunate, as the reverse engineer's next step will depend on the context: not being able to reuse the result of a previous analysis with other #jm-note[non-specialise][erf, non-specific? non-adapted?] tools greatly limits their options.
This is unfortunate, as the reverse engineer's next step will depend on the context: not being able to reuse the result of a previous analysis with any ad hoc tools greatly limits their options.
AppSpear has an interesting solution to this issue: the code it intercepts is repackaged inside a new #APK file that Android analysis tools should be able to analyze.
In the next section, we will explore further the contributions that take this approach of using an actual application to encode their results.
We will now explore further the contributions that take this approach of using an actual application to encode their results.

//#todo[RealDroid sandbox bases on modified ART?]
//#todo[force execution?]
=== Improving Analysis with Instrumentation <sec:bg-instrumentation>
==== Improving Analysis with Instrumentation <sec:bg-instrumentation>

Usually, instrumentation refers to the practice of modifying the behavior of a program to collect information during its execution.
Frida is a good example of an instrumentation framework.
@ -75,7 +79,6 @@ Those cases are quite commons and beeing able to solve those without resorting t
On the other hand, COAL will struggle to solve cases with complex string manipulation and is simply not able to handle cases that rely on external data (#eg downloaded from the internet at runtime).
Likewise, it can only access dynamically loaded code if that code is present inside the application without any kind of obfuscation (#eg a #DEX file in the assets of the application can be analysed, but not if it is ciphered).


#v(2em)

Instrumenting applications to encode the result of an analysis as a unified representation has been explored before.
@ -84,7 +87,6 @@ Similarly, DroidRA compute reflection information computed statically and inject
However, AppSpear and DexLego focus primarily on specific obfuscation techniques, making their implementation difficult to port to more recent versions of Android, and DroidRA suffers from the limitations of static analysis.
We believe that instrumentation is a promising approach to encode this information.
In particular, we think that it could be used to provide dynamic information that is not available to static analysis tools like DroidRA.
To explore this possibility, we will try to answer our third problem statement #pb3: #pb3-text


In @sec:th, we will try to use instrumentation to combine dynamic analysis (to collect dynamically loaded code and reflection information) with static analysis, regardless of the static analysis tool used.
@ -2,7 +2,7 @@
== Conclusion <sec:bg-conclusion>

In this chapter, we looked at the specificities of Android and the usual tools used as a basis for reverse engineering applications.
This chapter presented the specificities of Android and the usual tools used as a basis for reverse engineering applications.
Many contributions have been made to static analysis, and benchmarks have been proposed to compare the different tools that resulted from those contributions.
Those benchmarks raised questions about the reusability of those tools and their capacity to handle real-world applications.
We then looked at platform classes and class loading, a commonly recognised limitation of static analysis.
@ -13,10 +13,10 @@ The result of those analysis are often in an ad hoc format, making it difficult
A few exceptions, as well as some static analysis tools, proposed an interesting solution to this issue:
instrumenting the analysed application to encode the results of the analysis in the form of a valid #APK, a format any Android analysis tool should be able to read.
We liked this solution and believe it should be studied further.
This process led us to our problem statements:
This process led us to explore three problem statements:

/ #pb1: #pb1-text
/ #pb2: #pb2-text
/ #pb3: #pb3-text

In the next chapters, we will endeavor to contribute to the Android reverse engineering field by answering those problematics.
In the next chapters, we will endeavor to contribute to the Android reverse engineering field by answering them.
@ -32,6 +32,6 @@ As a summary, the contributions of this chapter are the following:
The chapter is structured as follows.
@sec:rasta-methodology presents the methodology employed to build our evaluation process and @sec:rasta-xp gives the associated experimental results.
@sec:rasta-failure-analysis investigates the reasons behind the observed failures of some of the tools.
We then compare in @sec:rasta-soa-comp our results with the contributions presented in @sec:bg-eval-tools.
We then compare in @sec:rasta-soa-comp our results with the contributions presented in @sec:bg.
In @sec:rasta-reco, we give recommendations for tool development that we drew from our experience running our experiment.
Finally, @sec:rasta-limit lists the limits of our approach, @sec:rasta-futur presents further avenues that we did not have time to pursue, and @sec:rasta-conclusion concludes the chapter.
@ -5,7 +5,7 @@
== State-of-the-Art Comparison <sec:rasta-soa-comp>

In this section, we will compare our results with the contributions presented in @sec:bg-eval-tools.
In this section, we will compare our results with the contributions presented in @sec:bg.

Luo #etal released TaintBench~@luoTaintBenchAutomaticRealworld2022, a real-world benchmark, along with the associated recommendations for building such a benchmark.
These benchmarks confirmed that some tools such as Amandroid and Flowdroid are less efficient on real-world applications.