Commit d7df45b206 (parent f309dd55b8): 8 changed files with 64 additions and 56 deletions
@@ -15,7 +15,8 @@ This represents #num(5000) applications over the #NBTOTALSTRING total of the ini
Among them, we could not retrieve 43 from Androzoo, leaving us with #num(dyn_res.all.nb) applications to test.
We will first look at the results of the dynamic analysis and at the bytecode we intercepted.
Then, we will study the impact the instrumentation has on static analysis tools, notably on their success rate.
Additionally, we will analyse a handcrafted application to check whether the instrumentation does, in fact, improve the results of analysis tools.
=== Dynamic Analysis Results <sec:th-dyn-failure>
@@ -32,7 +33,8 @@ We expected some issues related to the use of an emulator, like the lack of x86_
We manually looked at some applications, but did not find a notable pattern.
In some cases, the application was just broken -- for instance, an application was trying to load a native library that simply does not exist in the application.
In other cases, Frida is to blame: calling a method from Frida can confuse the #ART.
`protected` methods cannot be called from a class other than the one that defined the method or one of its children.
The issue is that Frida might be considered by the #ART as another class, leading to the #ART aborting the application.
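This kind of access check can be reproduced on a stock JVM without Frida. `Object.clone()` is `protected`, and Java only lets a subclass touch a protected member on instances of its own type, so the reflective call below is rejected by the runtime (a plain-Java sketch of the access rule, not of #ART internals):

```java
import java.lang.reflect.Method;

public class ProtectedDemo {
    public static void main(String[] args) throws Exception {
        // Object.clone() is protected. ProtectedDemo is a subclass of Object,
        // but a subclass may only access protected members on instances of its
        // own type, so invoking clone() on a plain Object is rejected.
        Method clone = Object.class.getDeclaredMethod("clone");
        try {
            clone.invoke(new Object());
            System.out.println("call allowed");
        } catch (IllegalAccessException e) {
            System.out.println("access denied by the runtime");
        }
    }
}
```

A caller injected from outside the class hierarchy, as Frida effectively is, hits the same kind of check inside the #ART.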
#todo[jfl was supposed to test a few other apps #emoji.eyes]
@tab:th-dyn-visited shows the number of applications that we analysed, whether we managed to start at least one activity, and whether we intercepted code loading or reflection.
It also shows the average number of activities visited (when at least one activity was started).
@@ -53,7 +55,7 @@ As shown in the table, even if the application fails to start an activity, somet
table.cell(rowspan: 2)[nb apk],
table.vline(end: 3),
table.vline(start: 4),
table.cell(colspan: 2, inset: (bottom: 0pt))[nb failed],
table.vline(end: 3),
table.vline(start: 4),
table.cell(colspan: 2, inset: (bottom: 0pt))[activities visited],
@@ -77,7 +79,7 @@ As shown in the table, even if the application fails to start an activity, somet
) <tab:th-dyn-visited>
The high number of applications that did not start an activity means that our results will be highly biased.
The code/method that might be loaded/called by reflection from inside activities is filtered out by the limit of our dynamic execution.
This bias must be kept in mind while reading the next subsection that studies the bytecode that we intercepted.
=== The Bytecode Loaded by Applications <sec:th-code-collected>
@@ -121,7 +123,7 @@ To estimate the scope of the code we made available, we use Androguard to genera
@tab:th-compare-cg shows the number of edges of those call graphs.
The columns before and after show the total number of edges of the graphs, and the diff column indicates the number of new edges detected (#ie the number of edges after instrumentation minus the number of edges before).
This number includes edges from the bytecode loaded dynamically, as well as the calls added to represent reflection calls, and calls to "glue" methods (methods like `Integer.intValue()` used to convert objects to scalar values, or calls to `T.check_is_Xxx_xxx(Method)` used to check if a `Method` object represents a known method).
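To illustrate what we mean by glue code, here is a plain-Java sketch: the `checkIsIntegerParseInt` helper is a hypothetical stand-in for the generated `T.check_is_Xxx_xxx(Method)` checks, and the `intValue()` call shows why unboxing glue is needed after `Method.invoke()`:

```java
import java.lang.reflect.Method;

public class GlueDemo {
    // Hypothetical analogue of a generated T.check_is_Xxx_xxx(Method) helper:
    // tests whether a Method object refers to one specific known method.
    static boolean checkIsIntegerParseInt(Method m) {
        return m.getDeclaringClass() == Integer.class
            && m.getName().equals("parseInt");
    }

    public static void main(String[] args) throws Exception {
        Method m = Integer.class.getMethod("parseInt", String.class);
        if (checkIsIntegerParseInt(m)) {
            // invoke() always returns an Object; the boxed result must be
            // converted back to a scalar, hence the Integer.intValue() glue.
            Object boxed = m.invoke(null, "42");
            int n = ((Integer) boxed).intValue();
            System.out.println(n);
        }
    }
}
```

Both the check and the unboxing call show up as extra edges in the instrumented call graph, which is why they are excluded from the "Added Reflection" count.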
The last column, "Added Reflection", is the list of non-glue method calls found in the call graph of the instrumented application but neither in the call graph of the original #APK, nor in the call graphs of the added bytecode files that we computed separately.
This corresponds to the calls we added to represent reflection calls.
The first application, #lower(compared_callgraph.at(0).sha256), stands out.
@@ -155,14 +157,14 @@ This is consistent with the behaviour of a packer: the application loads the mai
caption: [Edges added to the call graphs computed by Androguard by instrumenting the applications]
) <tab:th-compare-cg>
Unfortunately, our implementation of the transformation is imperfect and sometimes fails, as illustrated by #lower("5D2CD1D10ABE9B1E8D93C4C339A6B4E3D75895DE1FC49E248248B5F0B05EF1CE") in @tab:th-compare-cg.
However, over the #num(dyn_res.all.nb - dyn_res.all.nb_failed) applications whose dynamic analysis finished in our experiment, #num(nb_patched) were patched.
The remaining #mypercent(dyn_res.all.nb - dyn_res.all.nb_failed - nb_patched, dyn_res.all.nb - dyn_res.all.nb_failed) failed either due to some quirk in the zip format of the #APK file, because of a bug in our implementation when exceeding the method reference limit in a single #DEX file, or in the case of #lower("5D2CD1D10ABE9B1E8D93C4C339A6B4E3D75895DE1FC49E248248B5F0B05EF1CE"), because the application reused the original application classloader to load new code instead of instantiating a new class loader (a behaviour we did not expect, as it is not possible using only the #SDK, but it is enabled by hidden #APIs).
Taking into account the failures from both the dynamic analysis and the instrumentation process, we have a #mypercent(dyn_res.all.nb - nb_patched, dyn_res.all.nb) failure rate.
This is a reasonable failure rate, but we should keep in mind that it adds to the failure rate of the other tools we want to use on the patched application.
To check the impact of our instrumentation on the finishing rate, we then reran the experiment from @sec:rasta.
We ran the tools on the #APKs before and after instrumentation and compared the finishing rates in @fig:th-status-npatched-vs-patched (without taking into account the #APKs we failed to patch#footnote[Due to a handling error during the experiment, the figure shows the results for #nb_patched_rasta #APKs instead of #nb_patched. \ We also ignored the tool from Wognsen #etal due to the high number of timeouts.]).
The finishing rate comparison is shown in @fig:th-status-npatched-vs-patched.
We can see that in most cases, the finishing rate is either the same or slightly lower for the instrumented application.
@@ -181,16 +183,16 @@ On the other hand, Saaf do not detect the issue with Apktool and pursues the ana
width: 100%,
alt: "",
)
//place(center + horizon, rotate(24deg, text(red.transparentize(0%), size: 20pt, "PRELIMINARY RESULTS")))
},
caption: [Exit status of static analysis tools on original #APKs (left) and patched #APKs (right)]
) <fig:th-status-npatched-vs-patched>
#todo[Flowdroid results are inconclusive: some apks have more leaks after and as many apks have fewer? also, running flowdroid on the same apk can return a different number of leaks???]
=== Example
In this subsection, we use our approach on a single #APK to look in more detail into the analysis of the transformed application.
We handcrafted this application for the purpose of demonstrating how this can help a reverse engineer in their work.
Accordingly, this application is quite small and contains both dynamic code loading and reflection.
We defined the methods `Utils.source()` and `Utils.sink()` to model a method that collects sensitive data and a method that exfiltrates data.
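A minimal sketch of what these two stubs could look like (the method names come from the text; the bodies are hypothetical, standing in for a real data collection and exfiltration):

```java
public class Utils {
    // Models a method that collects sensitive data (e.g. a device identifier).
    public static String source() {
        return "sensitive-device-id";
    }

    // Models a method that exfiltrates data. A taint analyser flags any
    // flow from source() into sink() as a data leak.
    public static void sink(String data) {
        System.out.println("exfiltrated: " + data);
    }

    public static void main(String[] args) {
        // Direct flow: trivially detected by static analysis.
        sink(source());
    }
}
```

In the handcrafted application, the flow between the two is hidden behind dynamic loading and reflection, which is what makes the example interesting.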
@@ -228,10 +230,10 @@ public class Main {
A first analysis of the content of the application shows that it contains one `Activity` that instantiates the class `Main` and calls `Main.main()`.
@lst:th-demo-before shows most of the code of `Main` as returned by Jadx.
We can see that the class contains another #DEX file encoded in base 64 and loaded in the `InMemoryDexClassLoader` `cl` (line 7).
A class is then loaded from this class loader (line 11), and two methods of this class are called (line 14).
The names of this class and methods are not directly accessible as they have been ciphered and are decoded just before being used at runtime.
Here, the encryption key is available statically (line 6), and in theory, a very good static analyser implementing the Android `Cipher` #API could compute the actual methods called.
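The name-ciphering pattern described above can be reproduced on a plain JVM (a sketch with a made-up key and `String.toUpperCase()` standing in for the hidden target method, not the actual code of the handcrafted #APK):

```java
import java.lang.reflect.Method;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

public class CipheredCallDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical static key, mimicking the key embedded in the APK.
        SecretKeySpec key = new SecretKeySpec(
                "0123456789abcdef".getBytes(StandardCharsets.UTF_8), "AES");

        // Encrypt the method name once, as the app author would do offline.
        Cipher enc = Cipher.getInstance("AES/ECB/PKCS5Padding");
        enc.init(Cipher.ENCRYPT_MODE, key);
        String blob = Base64.getEncoder().encodeToString(
                enc.doFinal("toUpperCase".getBytes(StandardCharsets.UTF_8)));

        // At "runtime", the name is decoded just before being used...
        Cipher dec = Cipher.getInstance("AES/ECB/PKCS5Padding");
        dec.init(Cipher.DECRYPT_MODE, key);
        String name = new String(dec.doFinal(Base64.getDecoder().decode(blob)),
                StandardCharsets.UTF_8);

        // ...and the target method is resolved and invoked by reflection,
        // so the call never appears as a plain call site in the bytecode.
        Method m = String.class.getMethod(name);
        System.out.println(m.invoke("hidden call"));
    }
}
```

A static analyser only sees a `Cipher` result flowing into `getMethod()`; unless it models the whole `Cipher` #API, the call target stays unknown.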
However, we could easily imagine an application that gets this key from a remote command and control server.
In this case, it would be impossible to compute those methods with static analysis alone.
When we ran Flowdroid on this application, it computed a call graph of 43 edges and found no data leaks.
@@ -240,7 +242,7 @@ This is not particularly surprising considering the obfuscation methods used.
Then we run the dynamic analysis we described in @sec:th-dyn on the application and apply the transformation described in @sec:th-trans to add the dynamic information to it.
This time, Flowdroid computes a larger call graph of 76 edges, and does find a data leak.
Indeed, when looking at the new application with Jadx, we notice a new class `Malicious`, and the code of `Main.main()` is now as shown in @lst:th-demo-after:
the method called in the loop is either `Malicious.get_data`, `Malicious.send_data()` or `Method.invoke()` (lines 9, 11 and 12).
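The shape of such a rewritten call site can be sketched in plain Java (hypothetical helper and target names, not the generated #DEX): the `Method` object is tested against the targets observed dynamically, with `Method.invoke()` kept as a fallback for anything unknown:

```java
import java.lang.reflect.Method;

public class DispatchDemo {
    // Hypothetical check: does this Method object refer to a known target?
    static boolean isKnown(Method m, Class<?> cls, String name) {
        return m.getDeclaringClass() == cls && m.getName().equals(name);
    }

    // Guarded dispatch: direct calls (visible to static analysis) for methods
    // observed during the dynamic run, reflection for everything else.
    static Object dispatch(Method m, Object target, Object... args) throws Exception {
        if (isKnown(m, String.class, "trim")) {
            return ((String) target).trim();
        } else if (isKnown(m, String.class, "length")) {
            return Integer.valueOf(((String) target).length());
        }
        return m.invoke(target, args); // unknown target: keep the reflective call
    }

    public static void main(String[] args) throws Exception {
        Method trim = String.class.getMethod("trim");
        System.out.println(dispatch(trim, "  hi  "));
    }
}
```

The direct branches are what give Flowdroid the extra call-graph edges, while the `invoke()` fallback preserves the original behaviour for targets we never observed.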
Although the names are self-explanatory, checking the code of those methods confirms that `get_data()` calls `Utils.source()` and `send_data()` calls `Utils.sink()`.
#figure(
@@ -297,6 +299,6 @@ In red on the figure however, we have the calls that were hidded by reflection i
#v(2em)
To conclude, we showed that our approach indeed improves the results of analysis tools without impacting their finishing rates much.
Unfortunately, we also noticed that our dynamic analysis is suboptimal, either due to our experimental setup or due to our solution to explore the applications.
In the next section, we will present in more detail the limitations of our solution, as well as future work that can be done to improve the contributions presented in this chapter.