#import "../lib.typ": todo, SDK, num, mypercent, ART, ie, APKs, jfl-note
#import "X_var.typ": *
#import "../3_rasta/X_var.typ": NBTOTALSTRING

== Results <sec:th-res>

#todo[better section name for @sec:th-res]

To study the impact of our transformation on analysis tools, we reused applications from the dataset we sampled in @sec:rasta/*-dataset*/.
Because we run the applications on a recent version of Android (#SDK 34), we only kept the most recent applications: those collected in 2023.
This represents #num(5000) applications out of the #NBTOTALSTRING in the initial dataset.
Among them, we could not retrieve 43 from Androzoo, leaving us with #num(dyn_res.all.nb) applications to test.

=== Dynamic Analysis Results <sec:th-dyn-failure>

After running the dynamic analysis on our dataset for the first time, we realised that our dynamic setup was quite fragile.
We found that #mypercent(dyn_res.all.nb_failed_first_run, dyn_res.all.nb) of the executions failed with various errors.
The majority of those errors were related to failures to connect to the Frida agent or to start the activity from Frida.
Some of those errors seemed to come from Frida, while others seemed related to the emulator failing to start the application.
We found that relaunching the analysis for the applications that failed was the simplest way to fix those issues, and after 6 passes we went from #num(dyn_res.all.nb_failed_first_run) to #num(dyn_res.all.nb_failed) applications that could not be analysed.
The remaining errors appear to be related to the application itself or to Android: #num(96) are failures to install the application, and #num(110) others are null pointer exceptions from Frida.
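
The relaunch strategy described above can be sketched as follows.
This is only a minimal illustration and not our actual harness: the names `RetryPasses`, `runPasses` and `analyse` are hypothetical, and the `analyse` predicate stands in for one complete run of the dynamic analysis on one application.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class RetryPasses {
    /**
     * Runs `analyse` on every application, then re-runs it only on the
     * applications that failed, for at most `maxPasses` passes in total.
     * Returns the applications that still fail after the last pass.
     */
    static <A> List<A> runPasses(List<A> apps, Predicate<A> analyse, int maxPasses) {
        List<A> pending = new ArrayList<>(apps);
        for (int pass = 1; pass <= maxPasses && !pending.isEmpty(); pass++) {
            List<A> failed = new ArrayList<>();
            for (A app : pending) {
                if (!analyse.test(app)) {
                    failed.add(app);
                }
            }
            pending = failed; // only the failures are relaunched
        }
        return pending;
    }

    public static void main(String[] args) {
        // Toy stand-in for the analysis (assumption): "flaky" succeeds on
        // its third attempt, "broken" never succeeds.
        Map<String, Integer> attempts = new HashMap<>();
        Predicate<String> analyse = app -> {
            int n = attempts.merge(app, 1, Integer::sum);
            return app.equals("ok") || (app.equals("flaky") && n >= 3);
        };
        List<String> stillFailing = runPasses(List.of("ok", "flaky", "broken"), analyse, 6);
        System.out.println(stillFailing);
    }
}
```

In this toy run, the transient failure disappears after a few passes, while the application that can never be analysed is the only one left in the final list.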

Unfortunately, although we managed to start the applications, the list of activities visited by GroddDroid shows that a majority (#mypercent(dyn_res.all.z_act_visited, dyn_res.all.nb - dyn_res.all.nb_failed)) of the applications stopped before starting even one activity.
Some applications do not have an activity and are not intended to interact with a user, but those are clearly a minority and do not explain such a high number.
We expected some issues related to the use of an emulator, like the lack of x86_64 libraries in the applications, or countermeasures aborting the application if the emulator is detected.
We manually looked at some applications, but did not find a notable pattern.
In some cases, the application was just broken: for instance, one application was trying to load a native library that simply does not exist in the application.
In other cases, Frida is to blame: we found some cases where calling a method from Frida can confuse the #ART.
`protected` methods need to be called from the class that defines the method or from one of its child classes, but Frida might be considered by the #ART as another class, leading to the #ART aborting the application.
#todo[jfl was supposed to test a few other apps #emoji.eyes]
@tab:th-dyn-visited shows the number of applications that we analysed, whether we managed to start at least one activity, and whether we intercepted code loading or reflection.
It also shows the average number of activities visited (when at least one activity was started).
This average is slightly higher than 1, which seems reasonable: a lot of applications do not need more than one activity, but some do, and we did manage to explore at least some of those additional activities.
As shown in the table, even when an application fails to start an activity, it sometimes still loads external code or uses reflection.

#figure({
  let nb_col = 7
  table(
    columns: nb_col,
    stroke: none,
    inset: 7pt,
    align: center+horizon,
    table.header(
      table.hline(),
      table.cell(colspan: nb_col, inset: 2pt)[],
      table.cell(rowspan: 2)[],
      table.cell(rowspan: 2)[nb apk],
      table.vline(end: 3),
      table.vline(start: 4),
      table.cell(colspan: 2, inset: (bottom: 0pt))[nb failed],
      table.vline(end: 3),
      table.vline(start: 4),
      table.cell(colspan: 2, inset: (bottom: 0pt))[activities visited],
      table.vline(end: 3),
      table.vline(start: 4),
      table.cell(rowspan: 2)[average nb \ activities when > 0],

      [1#super[st] pass], [6#super[th] pass],
      [0], [$>= 1$],
    ),
    table.cell(colspan: nb_col, inset: 2pt)[],
    table.hline(),
    table.cell(colspan: nb_col, inset: 2pt)[],
    [All], num(dyn_res.all.nb), num(dyn_res.all.nb_failed_first_run), num(dyn_res.all.nb_failed), num(dyn_res.all.z_act_visited), num(dyn_res.all.nz_act_visited), num(dyn_res.all.avg_nz_act),
    [With Reflection], num(dyn_res.reflection.nb), [], [], num(dyn_res.reflection.z_act_visited), num(dyn_res.reflection.nz_act_visited), num(dyn_res.reflection.avg_nz_act),
    [With Code Loading], num(dyn_res.code_loading.nb), [], [], num(dyn_res.code_loading.z_act_visited), num(dyn_res.code_loading.nz_act_visited), num(dyn_res.code_loading.avg_nz_act),
    table.cell(colspan: nb_col, inset: 2pt)[],
    table.hline(),
  )},
  caption: [Summary of the dynamic exploration of the applications from the RASTA dataset collected by Androzoo in 2023]
) <tab:th-dyn-visited>

The high number of applications that did not start an activity means that our results will be highly biased.
The code that might be loaded, or the methods that might be called by reflection, from inside activities are filtered out by the limits of our dynamic execution.
This bias must be kept in mind when reading the next subsection, which studies the bytecode that we intercepted.

=== The Bytecode Loaded by Applications <sec:th-code-collected>

We collected a total of #nb_bytecode_collected files from the #dyn_res.code_loading.nb applications that we detected loading bytecode dynamically.
#num(92) of them were loaded by a `DexClassLoader`, #num(547) by an `InMemoryDexClassLoader` and #num(1) by a `PathClassLoader`.

Once we compared the files, we found that we only collected #num(bytecode_hashes.len()) distinct files, and that #num(bytecode_hashes.at(0).at(0)) were identical.
Looking in more detail, we found that most of those files are advertisement libraries.
In total, we collected #num(nb_google) files containing Google ads libraries and #num(nb_facebook) files containing Facebook ads libraries.
In addition, we found #num(nb_appsflyer) files containing code that we believe to be AppsFlyer, a company that provides "measurement, analytics, engagement, and fraud protection technologies".
The remaining #num(nb_bytecode_collected - nb_google - nb_appsflyer - nb_facebook) files were custom code from high security applications (#ie banking, social security).
@tab:th-bytecode-hashes summarizes the information we collected about the most common bytecode files.

#figure(
  table(
    columns: 4,
    stroke: none,
    align: center+horizon,
    table.header(
      [Nb Occurrences], [SHA 256], [Content], [Format]
    ),
    table.hline(),
    ..bytecode_hashes.slice(0, 10)
      .map(
        (e) => (num(e.at(0)), [#e.at(1).slice(0, 10)...], ..e.slice(2))
      ).flatten(),
    table.cell(colspan: 4)[...],
    table.hline(),
  ),
  caption: [Most common dynamically loaded files]
) <tab:th-bytecode-hashes>
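
Counting distinct files, as in the table above, amounts to grouping the collected files by digest.
The sketch below illustrates this under stated assumptions: `BytecodeHashes`, `sha256` and `countByHash` are hypothetical names, the files are given as in-memory byte arrays, and this is not the script we used to build the table.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class BytecodeHashes {
    /** Hex-encoded SHA-256 digest of one collected file. */
    static String sha256(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Maps each distinct digest to the number of collected files having it. */
    static Map<String, Integer> countByHash(List<byte[]> files) {
        Map<String, Integer> counts = new HashMap<>();
        for (byte[] file : files) {
            counts.merge(sha256(file), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Placeholder contents standing in for collected bytecode files.
        List<byte[]> files = List.of(
            "ads library".getBytes(StandardCharsets.UTF_8),
            "ads library".getBytes(StandardCharsets.UTF_8),
            "custom code".getBytes(StandardCharsets.UTF_8));
        Map<String, Integer> counts = countByHash(files);
        System.out.println(counts.size() + " distinct files out of " + files.size());
    }
}
```

Sorting the entries of the resulting map by decreasing count gives the kind of ranking shown in the table.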
=== Impact on Analysis Tools Finishing Rate

#todo[alt text @fig:th-status-npatched-vs-patched]
#todo[Check SAAF and IC3 results on patched]
#figure({
  image(
    "figs/comparision-of-exit-status.svg",
    width: 100%,
    alt: "",
  )
  place(center + horizon, rotate(24deg, text(red.transparentize(0%), size: 20pt, "PRELIMINARY RESULTS")))
  },
  caption: [Exit status of static analysis tools on original #APKs (left) and patched #APKs (right)]
) <fig:th-status-npatched-vs-patched>

#todo[Check if Flowdroid improves, compare success rate of RASTA, show results for demo app]

#jfl-note[How many apps are transformed? Are we talking about the 888? Do we apply both transformations to each APK? Does it always succeed?]

=== Example

We apply our approach to a small #APK.
We handcrafted this application to demonstrate how our approach can help a reverse engineer in their work.
Accordingly, this application is quite small and contains both dynamic code loading and reflection.
We defined two methods, `Utils.source()` and `Utils.sink()`, to model respectively a method that collects sensitive data and a method that exfiltrates data.
Those methods are the ones we will use with Flowdroid to track data flows.

#figure(
  ```java
  package com.example.theseus;

  public class Main {
      private static final String DEX = "ZGV4CjA [...] EAAABEAwAA";
      Activity ac;
      private Key key = new SecretKeySpec("_-_Secret Key_-_".getBytes(), "AES");
      ClassLoader cl = new InMemoryDexClassLoader(ByteBuffer.wrap(Base64.decode(DEX, 2)), Main.class.getClassLoader());

      public void main() throws Exception {
          String[] strArr = {"n6WGYJzjDrUvR9cYljlNlw==", "dapES0wl/iFIPuMnH3fh7g=="};
          Class<?> loadClass = this.cl.loadClass(decrypt("W5f3xRf3wCSYcYG7ckYGR5xuuESDZ2NcDUzGxsq3sls="));
          Object obj = "imei";
          for (int i = 0; i < 2; i++) {
              obj = loadClass.getMethod(decrypt(strArr[i]), String.class, Activity.class).invoke(null, obj, this.ac);
          }
      }
      public String decrypt(String str) throws Exception {
          Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
          cipher.init(2, this.key);
          return new String(cipher.doFinal(Base64.decode(str, 2)));
      }

      ...
  }
  ```,
  caption: [Code of the main class of the application shown by Jadx, before patching],
)<fig:th-demo-before>

A first analysis of the content of the application shows that it contains one `Activity` that instantiates the class `Main` and calls `Main.main()`.
@fig:th-demo-before shows most of the code of `Main` as returned by Jadx.
We can see that the class contains another #DEX file encoded in base 64 and loaded in the `InMemoryDexClassLoader` `cl`.
A class is then loaded from this class loader, and two methods from this class are called.
The names of this class and of these methods are not directly accessible, as they have been encrypted and are only decrypted at runtime, just before being used.
Here, the encryption key is available statically, and in theory, a very good static analyser implementing the Android `Cipher` #API could compute the actual methods called.
However, we could easily imagine an application that gets this key from a remote command and control server.
In this case, it would be impossible to compute those methods with static analysis alone.
When running Flowdroid on this application, it computed a call graph of 43 edges and found no data leaks.
This is not particularly surprising considering the obfuscation methods used.
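
To make the static case concrete, the sketch below shows what an analyser modelling `Cipher` would have to evaluate when the key is embedded in the code.
It is an illustration under assumptions: `StaticNameRecovery` is a hypothetical name, `java.util.Base64` replaces `android.util.Base64` so that the code runs on a plain JVM, and `encrypt` exists only to build a test input (the demo application does not contain it).

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

final class StaticNameRecovery {
    // Key string as it appears in the decompiled `Main` class (16 bytes, AES-128).
    private static final SecretKeySpec KEY =
        new SecretKeySpec("_-_Secret Key_-_".getBytes(StandardCharsets.UTF_8), "AES");

    private static byte[] run(int mode, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
            cipher.init(mode, KEY);
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Mirrors `Main.decrypt()`: base 64 decode, then AES decrypt (mode 2 is DECRYPT_MODE). */
    static String decrypt(String base64) {
        byte[] plain = run(Cipher.DECRYPT_MODE, Base64.getDecoder().decode(base64));
        return new String(plain, StandardCharsets.UTF_8);
    }

    /** Helper used only to produce an input for this example. */
    static String encrypt(String name) {
        byte[] cipherText = run(Cipher.ENCRYPT_MODE, name.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(cipherText);
    }

    public static void main(String[] args) {
        String hidden = encrypt("some_method_name"); // hypothetical name
        System.out.println(hidden + " -> " + decrypt(hidden));
    }
}
```

As soon as the key is not a constant of the program, the `KEY` field above has no statically known value and this evaluation is no longer possible.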

We then ran the dynamic analysis described in @sec:th-dyn on the application and applied the transformation described in @sec:th-trans to add the dynamic information to it.
This time, Flowdroid computed a larger call graph of 76 edges, and did find a data leak.
Indeed, when looking at the new application with Jadx, we notice a new class `Malicious`, and the code of `Main.main()` is now as shown in @fig:th-demo-after:
the method called in the loop is either `Malicious.get_data()`, `Malicious.send_data()` or `Method.invoke()`.
Although the names are self-explanatory, checking the code of those methods confirms that `get_data()` calls `Utils.source()` and `send_data()` calls `Utils.sink()`.

#figure(
  ```java
  public void main() throws Exception {
      String[] strArr = {"n6WGYJzjDrUvR9cYljlNlw==", "dapES0wl/iFIPuMnH3fh7g=="};
      Class<?> loadClass = this.cl.loadClass(decrypt("W5f3xRf3wCSYcYG7ckYGR5xuuESDZ2NcDUzGxsq3sls="));
      Object obj = "imei";
      for (int i = 0; i < 2; i++) {
          Method method = loadClass.getMethod(decrypt(strArr[i]), String.class, Activity.class);
          Object[] objArr = {obj, this.ac};
          obj = T.check_is_Malicious_get_data_fe2fa96eab371e46(method) ?
              Malicious.get_data((String) objArr[0], (Activity) objArr[1]) :
              T.check_is_Malicious_send_data_ca50fd7916476073(method) ?
              Malicious.send_data((String) objArr[0], (Activity) objArr[1]) :
              method.invoke(null, objArr);
      }
  }
  ```,
  caption: [Code of `Main.main()` shown by Jadx, after patching],
)<fig:th-demo-after>
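
The bodies of the generated `T.check_is_...` helpers are not shown in the figure.
The sketch below is therefore only an assumption about what such a check might compare (declaring class, name and parameter types), together with the same dispatch shape as the patched `Main.main()`.
All names (`ReflectionGuard`, `isGetData`, `isSendData`, `call`, `lookup`) are hypothetical, the nested `Malicious` class is a stand-in, and `Object` replaces `Activity` so that the code runs on a plain JVM.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

final class ReflectionGuard {
    /** Stand-in for the class recovered from the dynamically loaded bytecode. */
    static final class Malicious {
        public static String get_data(String tag, Object activity) { return tag + ":data"; }
        public static String send_data(String data, Object activity) { return "sent " + data; }
    }

    private static boolean matches(Method m, String name) {
        return m.getDeclaringClass() == Malicious.class
            && m.getName().equals(name)
            && Arrays.equals(m.getParameterTypes(), new Class<?>[] {String.class, Object.class});
    }

    static boolean isGetData(Method m) { return matches(m, "get_data"); }
    static boolean isSendData(Method m) { return matches(m, "send_data"); }

    /** Direct call when the target was observed at runtime, reflection otherwise. */
    static Object call(Method m, Object[] args) {
        try {
            return isGetData(m) ? Malicious.get_data((String) args[0], args[1])
                 : isSendData(m) ? Malicious.send_data((String) args[0], args[1])
                 : m.invoke(null, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Small wrapper around getMethod that avoids checked exceptions in the example. */
    static Method lookup(Class<?> owner, String name, Class<?>... params) {
        try {
            return owner.getMethod(name, params);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object obj = "imei";
        for (String name : new String[] {"get_data", "send_data"}) {
            Method m = lookup(Malicious.class, name, String.class, Object.class);
            obj = call(m, new Object[] {obj, null});
        }
        System.out.println(obj);
    }
}
```

The direct calls are what make the targets visible to a static analyser, while the final `invoke` branch keeps the original behaviour for any method that was not observed during the dynamic analysis.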

#todo[Androguard call graph]