#import "../lib.typ": eg, todo
#import "X_var.typ": *

== Obfuscation Techniques <sec:cl-obfuscation>

In this section, we present new obfuscation techniques that take advantage of the complexity of the class loading process.
To evaluate their effectiveness, we reviewed common Android reverse engineering tools to see how they behave when collisions occur between classes of the APK, or between a class of the APK and classes of Android (#Asdk or #hidec).
We call this collision "*class shadowing*", because the attacker version of the class shadows the one that will be used at runtime.
To evaluate whether such shadow attacks work, we handcrafted three applications implementing shadowing techniques and tested their impact on static analysis tools.
We then manually inspected the output of each tool to check its consistency with what Android actually does at runtime.
For example, for Apktool, we look at the disassembled code, and for Flowdroid@Arzt2014a, we check whether a flow between `Taint.source()` and `Taint.sink()` is correctly computed.

/*
shadow: cause a class collision
hidden: use a class from the hidden API

an APK class can be shadowed
an SDK class can be shadowed
a hidden class can be shadowed
*/

=== Obfuscation Techniques

From the results presented in @sec:cl-loading, three approaches can be designed to hide the behavior of an application.

/*
_Hidden classes_
Applications, both malicious and benign, have been known to use hidden APIs to access advanced features #todo[ref ?].
Using #hidec can have an impact on the accuracy of analysis tools because they may not have access to the code of these classes.

#todo[Google blacklist/greylist/etc., ref to paper that says this can be bypassed]

#todo[Compare classes in android.jar, framework.jar and other, are they hidden whitelisted classes?]

The two previous attacks have a few issues.
Basic shadowing implies having several classes with the same name in the application, which can be detected by some tools.
On the other hand, using #hidec leaves classes without implementation in the application, which can also be detected.
*/

*Self shadow*_: shadowing a class with another from the APK_
This method consists in hiding the implementation of a class with another one by exploiting the possible collision of class names, as described in @sec:cl-collision with multiple #dexfiles.
If reversers or tools ignore the priority order of the files of a multi-dex application, they may consider the wrong version of a class.

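The lookup order behind self shadowing can be sketched as follows.
This is a minimal model, assuming that ART reads `classes.dex`, then `classes2.dex`, `classes3.dex`, and so on until an index is missing, and keeps the first definition of each class name.
The class `ArtDexOrder` and its methods are hypothetical names introduced for illustration; they are not part of Android or of any tool discussed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Android or of any analysis tool.
final class ArtDexOrder {
    // Assumed runtime order: classes.dex, classes2.dex, classes3.dex, ...
    // The enumeration stops at the first missing index.
    static List<String> loadOrder(Set<String> dexNames) {
        List<String> order = new ArrayList<>();
        if (!dexNames.contains("classes.dex")) {
            return order;
        }
        order.add("classes.dex");
        for (int i = 2; dexNames.contains("classes" + i + ".dex"); i++) {
            order.add("classes" + i + ".dex");
        }
        return order;
    }

    // Returns the dex file whose definition of className wins in this model,
    // or null when no loaded dex file defines the class.
    static String definingDex(String className, Map<String, Set<String>> classesPerDex) {
        for (String dex : loadOrder(classesPerDex.keySet())) {
            if (classesPerDex.get(dex).contains(className)) {
                return dex;
            }
        }
        return null;
    }
}
```

With `Obfuscation` defined in both `classes2.dex` and `classes3.dex`, `definingDex` selects `classes2.dex` in this model, so a tool that displays the `classes3.dex` version shows code that is never executed.
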
// priority goes to SDK classes even if a shadow class is defined in the APK (all because of Boot)
*SDK shadow*_: shadowing an SDK class_
This method consists in presenting to the reverser a fake implementation of a class of the SDK.
This class is embedded in the APK file and has the same name as the one of the SDK.
Because `BootClassLoader` will give priority to the #Asdk at runtime, the reverser or tool should ignore any version of such a class that is contained in the APK.
The only constraint when shadowing an SDK class is that the shadowing implementation must respect the signatures of the real class.
Note that, by introducing a custom class loader, the attacker could invert the priority, but this case is out of the scope of this paper.

// priority goes to hidden classes (because they belong to the SDK) even if a shadow class is defined in the APK
*Hidden shadow*_: shadowing a hidden class_
This method is similar to the previous one, except the class that is shadowed is a #hidecsingular.
Because ART will give priority to the internal version of the class, the version provided in the APK file will be ignored.
Such shadow attacks are more difficult for a reverser to detect, as they may not know of the existence of this specific hidden class in Android.

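The gap between what a tool sees and what runs, for both SDK and hidden shadowing, can be modelled in a few lines.
The sketch below assumes a simplified platform-first lookup (a stand-in for the priority given to `BootClassLoader`) and ignores custom class loaders.
`ShadowModel` and its members are hypothetical names, and the sets of class names are supplied by the caller.

```java
import java.util.Set;

// Hypothetical model of the lookup order, not actual Android code.
final class ShadowModel {
    enum Origin { PLATFORM, APK, NOT_FOUND }

    // Runtime view: platform classes (SDK or hidden) are checked first.
    static Origin runtimeOrigin(String name, Set<String> platform, Set<String> apk) {
        if (platform.contains(name)) {
            return Origin.PLATFORM;
        }
        return apk.contains(name) ? Origin.APK : Origin.NOT_FOUND;
    }

    // View of a tool that only reads the content of the APK.
    static Origin apkOnlyOrigin(String name, Set<String> apk) {
        return apk.contains(name) ? Origin.APK : Origin.NOT_FOUND;
    }

    // An APK class is a shadow when the platform defines the same name:
    // the APK version is displayed by the tool but never executed.
    static boolean isShadow(String name, Set<String> platform, Set<String> apk) {
        return apk.contains(name) && platform.contains(name);
    }
}
```

Detecting a shadow in this model requires the list of platform class names, which is exactly the information a tool lacks when the shadowed class is hidden.
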
=== Impact on static analysis tools <sec:cl-evaltools>

#figure(
  ```java
  public class Main {
    public static void main(Activity ac) {
      String personal_data = Taint.source();
      String obfuscated_personal_data = Obfuscation.hide_flow(personal_data);
      Taint.sink(ac, obfuscated_personal_data);
    }
  }
  public class Obfuscation { // customized for each obfuscation technique
    public static String hide_flow(String personal_data) { ... }
  }
  ```,
  caption: [Main body of test apps]
)<lst:cl-testapp>

We selected tools that are commonly used to unpack and reverse Android applications: Jadx#footnote[https://github.com/skylot/jadx], a decompiler for Android applications, Apktool#footnote[https://apktool.org/], a disassembler and repackager of applications, Androguard#footnote[https://github.com/androguard/androguard], one of the oldest Python packages for manipulating Android applications, and Flowdroid@Arzt2014a, which performs taint flow analysis.

For evaluating the tools, we designed a single application that we can customize for different tests.
@lst:cl-testapp shows the main body implementing:
- a possible flow to evaluate Flowdroid: a flow from a method `Taint.source()` to a method `Taint.sink(Activity, String)` through a method `Obfuscation.hide_flow(String)`;
- a possible use of an SDK or hidden class inside the class `Obfuscation` to evaluate #platc shadowing for other tools.

The first application we released is a control application that does not do anything special.
It is used to check the expected results of the tools.
The second implements self shadowing: the class `Obfuscation` is duplicated, with one version identical to the one in the control app (`Obfuscation.hide_flow(String)` returns its argument) and the other returning a constant string.
These two versions are embedded in different #dexfiles of a multi-dex application.
The third application tests SDK shadowing and needs an existing class of the SDK.
We chose `Pair` as the class to shadow.
The application stores data in a `Pair` and then reads it back. The colliding `Pair` discards the data and returns null.
The last application tests hidden API shadowing.
As in the third application, we store data in `com.android.okhttp.Request` and then retrieve it.
Again, the shadowing implementation discards the data.

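As an illustration of the SDK shadowing test, a colliding `Pair` could look like the sketch below.
This is an assumed shape, not the exact code of the test application: the fields mirror the public fields `first` and `second` of `android.util.Pair`, and the body of `hide_flow` is a guess at how the data is stored and read back.
In an APK the class would be declared in the package `android.util`; it is left in the default package here so that the example stays self-contained.

```java
// Sketch of a colliding Pair: same shape as the SDK class, different behavior.
// In the APK it would be declared in package android.util (assumption).
final class Pair<F, S> {
    public final F first;
    public final S second;

    public Pair(F first, S second) {
        // The shadow version drops its arguments instead of storing them.
        this.first = null;
        this.second = null;
    }
}

final class Obfuscation {
    // Stores the data in a Pair and reads it back (illustrative body).
    static String hide_flow(String personal_data) {
        Pair<String, String> box = new Pair<>(personal_data, "unused");
        return box.first;
    }
}
```

Analyzed against this colliding `Pair`, `hide_flow` returns null and no flow from the source to the sink is visible, whereas at runtime the `Pair` of the SDK is loaded and the data reaches the sink.
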
We found that these static analysis tools do not consider the class loading mechanism, either because the tools only look at the content of the application file (#eg a disassembler) or because they consider class loading to be a dynamic feature and thus out of their scope.
In @tab:cl-results, we report the types of shadowing that can trick each tool.
A plain circle indicates a shadow attack that leads to a wrong result.
A white circle indicates a tool that emits warnings or that ultimately displays the two versions of the class.
A cross indicates a tool that is not impacted by a shadow attack.
We detail the results for each tool in the following.

#figure({
  table(
    columns: 5,
    stroke: none,
    align:(left+horizon, center+horizon, center+horizon, center+horizon, center+horizon),
    table.hline(),
    table.header(
      table.cell(colspan: 5, inset: 3pt)[],
      table.cell(rowspan: 2)[Tool],
      table.cell(rowspan: 2)[Version],
      table.vline(end: 3),
      table.vline(start: 4),
      table.cell(colspan: 3)[Shadow Attack],
      [Self], [SDK], [Hidden],
    ),
    table.cell(colspan: 5, inset: 3pt)[],
    table.hline(),
    table.cell(colspan: 5, inset: 3pt)[],

    [Jadx], [1.5.0], [#warn], [#ok], [#ok],
    [Apktool], [2.9.3], [#warn], [#ok], [#ok],
    [Androguard], [4.1.2], [#warn], [#ok], [#ok],
    [Flowdroid], [2.13.0], [#ok], [#ko], [#ok],

    table.cell(colspan: 5, inset: 3pt)[],
    table.hline(),
  )
  [#ok: working \ #warn: works but produces a warning or can be seen by the reverser \ #ko: not working]
},
  caption: [Working attacks against static analysis tools]
) <tab:cl-results>

==== Jadx

Jadx is a reverse engineering tool that regenerates the Java source code of an application.
It processes all the classes present in the application, but saves and displays only one class per name, even if two versions are present in multiple #dexfiles.
Nevertheless, when multiple classes with the same name are found, Jadx reports it in a comment added to the generated Java source code.
This warning states that a possible collision exists and lists the files that contain the different versions of the class.
Unfortunately, after reviewing the code of Jadx, we believe that the selection of the displayed class is undefined behavior.
At least for version 1.5.0, which we tested, we found that Jadx selects the wrong implementation when classes with the same name are present, for example in `classes2.dex` and `classes3.dex`.
We report it with a "#warn" because warnings are issued.

//Using #hidec does not affect Jadx beyond the fact that #hidec are not decompiled, which is to be expected by the user anyway.

Shadowing #Asdk and #hidec is possible in Jadx: there is only one implementation of the class in the application and Jadx does not have a list of the internal classes of Android, so no warning is issued to the reverser that the displayed class is not the one used by Android.

==== Apktool

Apktool generates Smali files, an assembly language for DEX bytecode.
Apktool stores the disassembled classes in a folder that matches the #dexfile that stores the bytecode.
This means that when shadowing a class with two versions in two #dexfiles, the shadow implementations will be disassembled into two directories.
No indication is displayed that a collision is possible.
Whether the reverser opens the right one is left to chance.

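Since no collision is reported, a reverser could look for duplicates in the disassembled output.
The sketch below assumes that the class names found in each output directory (for example `smali` and `smali_classes2`) have already been collected into a map.
`CollisionFinder` is a hypothetical helper introduced for illustration, not an Apktool feature.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper, not part of Apktool.
final class CollisionFinder {
    // Maps each class name defined more than once to the sorted list of
    // directories (or dex files) that contain a definition of it.
    static Map<String, List<String>> findCollisions(Map<String, Set<String>> classesPerDir) {
        Map<String, List<String>> definedIn = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : classesPerDir.entrySet()) {
            for (String className : entry.getValue()) {
                definedIn.computeIfAbsent(className, k -> new ArrayList<>()).add(entry.getKey());
            }
        }
        // Keep only the names that appear in at least two directories.
        definedIn.values().removeIf(dirs -> dirs.size() < 2);
        definedIn.values().forEach(Collections::sort);
        return definedIn;
    }
}
```

Such a check only reveals self shadowing: it cannot flag a class that collides with an SDK or hidden class, because those definitions are not in the APK.
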
As with Jadx, shadowing an #Asdk or #hidecsingular is not detected by the tool, which simply unpacks the fake shadow version.

==== Androguard

Androguard can be used in different ways, with different levels of analysis.
The documentation highlights the analysis commands that compute three types of objects: an APK object, a list of DEX objects, and an Analysis object.
The APK and the list of #dexfiles are a one-to-one representation of the content of an application, and have the same issues that we discussed with Apktool: they provide the different versions of a shadow class contained in multiple #dexfiles.

The Analysis object is used to compute a method call graph, and we found that this computation may choose the wrong version of a shadowed class when it relies on the computed cross references.
This leads to an invalid call graph as shown in @fig:cl-andro_obf_cg: the two methods `doSomething()` are represented in the graph, but the one linked to `main()` is the one calling the method `good()`, whereas the method `bad()` is actually called when running the application.

Androguard has a method `.is_external()` to detect if the implementation of a class is not provided inside the application and a method `.is_android_api()` to detect if the class is part of the Android API.
Regrettably, the documentation of `.is_android_api()` explains that the method is still experimental and just checks a few package names.
This means that although those methods are useful, the only indication of the use of an #Asdk or #hidec is the fact that the class is not in the APK file.
Because of that, as with Apktool and Jadx, Androguard has no way to warn the reverser that the shadow of an #Asdk or #hidec is not the class used when running the application.

#todo[alt text androguard_call_graph]

#figure({
  set align(center)
  stack(dir: ltr,[
    #figure(
      image(
        "figs/call_graph_expected.svg",
        width: 45%,
        alt: ""
      ),
      supplement: [Subfigure],
      caption: [Expected Call Graph]
    ) <fig:cl-andro_non_obf_cg>],[
    #figure(
      image(
        "figs/call_graph_obf.svg",
        width: 45%,
        alt: ""
      ),
      supplement: [Subfigure],
      caption: [Call Graph Computed by Androguard]
    ) <fig:cl-andro_obf_cg>
  ])
  h(1em)},
  caption: [Call Graphs of an application calling `Main.bad()` from a shadowed `Obfuscation` class.],
)<fig:cl-androguard_call_graph>

==== Flowdroid

Flowdroid@Arzt2014a is used to detect if an application can leak sensitive information.
To do so, the analyst provides a list of source and sink methods.
The return value of a method marked as source is considered sensitive and the argument of a method marked as sink is considered to be leaked.
By analyzing the bytecode of an application, Flowdroid can detect whether data emitted by source methods can be exfiltrated by a sink method.
Flowdroid is built on top of the Soot@Arzt2013 framework that handles, among other things, the class selection process.

We found that, when selecting class implementations in a multi-dex APK, Soot uses an algorithm close to the one used by ART:
Soot sorts the `.dex` bytecode files with a specified `prioritizer` (a comparison function that defines an order for #dexfiles) and selects the first implementation found when iterating over the sorted files.
Unfortunately, the `prioritizer` used by Soot is not exactly the same as the one used by ART.
The Soot `prioritizer` gives priority to `classes.dex`, then to files whose name starts with `classes` over other files, and finally uses the alphabetical order.
This order is good enough for applications with a small number of #dexfiles generated by Android Studio, but because it uses the alphabetical order and does not check the exact format used by Android, a malicious developer could hide the implementation of a class in `classes2.dex` by putting a false implementation in `classes0.dex`, `classes1.dex` or `classes12.dex`.

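The ordering described above can be modelled with a small comparator.
This is a sketch written from the description in this section, not a copy of the Soot source code; `SootLikeOrder` and `PRIORITIZER` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Model of the ordering described in the text; this is not Soot's code.
final class SootLikeOrder {
    static final Comparator<String> PRIORITIZER = (a, b) -> {
        // 1. classes.dex always comes first.
        boolean aMain = a.equals("classes.dex");
        boolean bMain = b.equals("classes.dex");
        if (aMain != bMain) {
            return aMain ? -1 : 1;
        }
        // 2. Files whose name starts with "classes" come before the others.
        boolean aClasses = a.startsWith("classes");
        boolean bClasses = b.startsWith("classes");
        if (aClasses != bClasses) {
            return aClasses ? -1 : 1;
        }
        // 3. Otherwise, plain alphabetical order.
        return a.compareTo(b);
    };

    // Returns a sorted copy; the first file defining a class wins.
    static List<String> sorted(List<String> dexNames) {
        List<String> copy = new ArrayList<>(dexNames);
        copy.sort(PRIORITIZER);
        return copy;
    }
}
```

In this model, `classes0.dex`, `classes1.dex` and `classes12.dex` all sort before `classes2.dex`, so a decoy class placed in any of them is selected instead of the implementation in `classes2.dex`.
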
// TODO This could use more investigation
In addition to self shadowing, Flowdroid is sensitive to the use of #platc, as it needs the bytecode of those classes to be able to track data flows.
This is solved for SDK classes by providing `android.jar` to Flowdroid.
Flowdroid gives priority to the classes from the SDK over the classes implemented in the application, thus defeating SDK shadow attacks.
Unfortunately, `android.jar` only contains classes from the #Asdk, meaning that using #hidec breaks the flow tracking.
Solving this issue would require finding the bytecode of all the platform classes of the targeted Android version which, as we said previously, requires extracting this information from the emulator.

//\medskip

We have seen that tools can be impacted by shadow attacks.
In the next section, we investigate whether these attacks are used in the wild.