thesis/5_theseus/4_results.typ
Jean-Marie Mineau e9bc1572e9
add results, of a sort
2025-09-08 17:06:24 +02:00


#import "../lib.typ": todo, SDK, num, mypercent, ART, ie, APKs, jfl-note
#import "X_var.typ": *
#import "../3_rasta/X_var.typ": NBTOTALSTRING
== Results <sec:th-res>
#todo[better section name for @sec:th-res]
To study the impact of our transformation on analysis tools, we reused applications from the dataset we sampled in @sec:rasta/*-dataset*/.
Because we are running the applications on a recent version of Android (#SDK 34), we only took the most recent applications: those collected in 2023.
This represents #num(5000) applications out of the #NBTOTALSTRING total of the initial dataset.
Among them, we could not retrieve 43 from Androzoo, leaving us with #num(dyn_res.all.nb) applications to test.
=== Dynamic Analysis Results <sec:th-dyn-failure>
After running the dynamic analysis on our dataset for the first time, we realised that our dynamic setup was quite fragile.
We found that #mypercent(dyn_res.all.nb_failed_first_run, dyn_res.all.nb) of the executions failed with various errors.
The majority of those errors were related to failures to connect to the Frida agent or to start the activity from Frida.
Some of those errors seemed to come from Frida, while others seemed related to the emulator failing to start the application.
We found that relaunching the analysis for the applications that failed was the simplest way to fix those issues, and after 6 passes we went from #num(dyn_res.all.nb_failed_first_run) to #num(dyn_res.all.nb_failed) applications that could not be analysed.
The remaining errors look more related to the application itself or to Android, with #num(96) errors being a failure to install the application, and #num(110) others being a null pointer exception from Frida.
Unfortunately, although we managed to start the applications, we can see from the list of activities visited by GroddDroid that a majority (#mypercent(dyn_res.all.z_act_visited, dyn_res.all.nb - dyn_res.all.nb_failed)) of the applications stopped before even starting one activity.
Some applications do not have an activity, and are not intended to interact with a user, but those are clearly a minority and do not explain such a high number.
We expected some issues related to the use of an emulator, like the lack of x86_64 libraries in the applications, or countermeasures aborting the application if the emulator is detected.
We manually looked at some applications, but did not find a notable pattern.
In some cases, the application was just broken -- for instance, an application was trying to load a native library that simply does not exist in the application.
In other cases, Frida is to blame: we found some cases where calling a method from Frida can confuse the #ART.
`protected` methods need to be called from the class that defined the method or from one of its child classes, but Frida might be considered by the #ART as another class, leading to the #ART aborting the application.
#todo[jfl was suppose to test a few other app #emoji.eyes]
@tab:th-dyn-visited shows the number of applications that we analysed, whether we managed to start at least one activity, and whether we intercepted code loading or reflection.
It also shows the average number of activities visited (when at least one activity was started).
This average is slightly higher than 1, which seems reasonable: a lot of applications do not need more than one activity, but some do, and we did manage to explore at least some of those additional activities.
As shown in the table, even if the application fails to start an activity, it will sometimes still load external code or use reflection.
#figure({
let nb_col = 7
table(
columns: nb_col,
stroke: none,
inset: 7pt,
align: center+horizon,
table.header(
table.hline(),
table.cell(colspan: nb_col, inset: 2pt)[],
table.cell(rowspan: 2)[],
table.cell(rowspan: 2)[nb apk],
table.vline(end: 3),
table.vline(start: 4),
table.cell(colspan: 2, inset: (bottom: 0pt))[nb failed],
table.vline(end: 3),
table.vline(start: 4),
table.cell(colspan: 2, inset: (bottom: 0pt))[activities visited],
table.vline(end: 3),
table.vline(start: 4),
table.cell(rowspan: 2)[average nb \ activities when > 0],
[1#super[st] pass], [6#super[th] pass],
[0], [$>= 1$],
),
table.cell(colspan: nb_col, inset: 2pt)[],
table.hline(),
table.cell(colspan: nb_col, inset: 2pt)[],
[All], num(dyn_res.all.nb), num(dyn_res.all.nb_failed_first_run), num(dyn_res.all.nb_failed), num(dyn_res.all.z_act_visited), num(dyn_res.all.nz_act_visited), num(dyn_res.all.avg_nz_act),
[With Reflection], num(dyn_res.reflection.nb), [], [], num(dyn_res.reflection.z_act_visited), num(dyn_res.reflection.nz_act_visited), num(dyn_res.reflection.avg_nz_act),
[With Code Loading], num(dyn_res.code_loading.nb), [], [], num(dyn_res.code_loading.z_act_visited), num(dyn_res.code_loading.nz_act_visited), num(dyn_res.code_loading.avg_nz_act),
table.cell(colspan: nb_col, inset: 2pt)[],
table.hline(),
)},
caption: [Summary of the dynamic exploration of the applications from the RASTA dataset collected by Androzoo in 2023]
) <tab:th-dyn-visited>
The high number of applications that did not start an activity means that our results will be highly biased.
The code that might be loaded, or the methods that might be called by reflection, from inside activities is filtered out by the limits of our dynamic execution.
This bias must be kept in mind when reading the next subsection, which studies the bytecode that we intercepted.
=== The Bytecode Loaded by Applications <sec:th-code-collected>
We collected a total of #nb_bytecode_collected files for the #dyn_res.code_loading.nb applications that we detected loading bytecode dynamically.
#num(92) of them were loaded by a `DexClassLoader`, #num(547) were loaded by an `InMemoryDexClassLoader` and #num(1) was loaded by a `PathClassLoader`.
Once we compared the files, we found that we only collected #num(bytecode_hashes.len()) distinct files, and that #num(bytecode_hashes.at(0).at(0)) were identical.
Looking in more detail, we found that most of those files are advertisement libraries.
In total, we collected #num(nb_google) files containing Google ads libraries and #num(nb_facebook) files containing Facebook ads libraries.
In addition, we found #num(nb_appsflyer) files containing code that we believe to be AppsFlyer, a company that provides "measurement, analytics, engagement, and fraud protection technologies".
The remaining #num(nb_bytecode_collected - nb_google - nb_appsflyer - nb_facebook) files were custom code from high-security applications (#ie banking, social security).
@tab:th-bytecode-hashes summarizes the information we collected about the most common bytecode files.
#figure(
table(
columns: 4,
stroke: none,
align: center+horizon,
table.header(
[Nb Occurrences], [SHA-256], [Content], [Format]
),
table.hline(),
..bytecode_hashes.slice(0, 10)
.map(
(e) => (num(e.at(0)), [#e.at(1).slice(0, 10)...], ..e.slice(2))
).flatten(),
table.cell(colspan: 4)[...],
table.hline(),
),
caption: [Most common dynamically loaded files]
) <tab:th-bytecode-hashes>
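The comparison of collected files described above amounts to grouping them by digest and counting each group. The following is a minimal sketch of that step, not the tooling actually used for this study: the class `BytecodeDedup` and its method names are hypothetical, and the files are assumed to be already loaded in memory as byte arrays.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: groups collected bytecode files by SHA-256 digest.
final class BytecodeDedup {
    /** Returns the lowercase hexadecimal SHA-256 digest of a file's content. */
    static String sha256Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    /** Maps each distinct digest to the number of collected files sharing it. */
    static Map<String, Integer> countByHash(List<byte[]> files) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (byte[] file : files) {
            counts.merge(sha256Hex(file), 1, Integer::sum);
        }
        return counts;
    }

    /** Returns the (digest, count) pairs sorted by decreasing count. */
    static List<Map.Entry<String, Integer>> mostCommon(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return entries;
    }
}
```

With such a helper, the size of the map returned by `countByHash` would correspond to the number of distinct files, and the first entry of `mostCommon` to the most frequently loaded file.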
=== Impact on Analysis Tools Finishing Rate
#todo[alt text @fig:th-status-npatched-vs-patched]
#todo[Check SAAF and IC3 results on patched]
#figure({
image(
"figs/comparision-of-exit-status.svg",
width: 100%,
alt: "",
)
place(center + horizon, rotate(24deg, text(red.transparentize(0%), size: 20pt, "PRELIMINARY RESULTS")))
},
caption: [Exit status of static analysis tools on original #APKs (left) and patched #APKs (right)]
) <fig:th-status-npatched-vs-patched>
#todo[Check if flowdroid improve, compare sucess rate of RASTA, show result for demo app]
#jfl-note[How many apps were transformed? Are we talking about the 888? Do we apply both transformations to each APK? Does it always succeed?]
=== Example
We apply our approach to a small #APK.
We handcrafted this application to demonstrate how our approach can help a reverse engineer in their work.
Accordingly, this application is quite small and contains both dynamic code loading and reflection.
We defined two methods, `Utils.source()` and `Utils.sink()`, to model respectively a method that collects sensitive data and a method that exfiltrates data.
Those methods are the ones we will use with Flowdroid to track data flows.
#figure(
```java
package com.example.theseus;
public class Main {
private static final String DEX = "ZGV4CjA [...] EAAABEAwAA";
Activity ac;
private Key key = new SecretKeySpec("_-_Secret Key_-_".getBytes(), "AES");
ClassLoader cl = new InMemoryDexClassLoader(ByteBuffer.wrap(Base64.decode(DEX, 2)), Main.class.getClassLoader());
public void main() throws Exception {
String[] strArr = {"n6WGYJzjDrUvR9cYljlNlw==", "dapES0wl/iFIPuMnH3fh7g=="};
Class<?> loadClass = this.cl.loadClass(decrypt("W5f3xRf3wCSYcYG7ckYGR5xuuESDZ2NcDUzGxsq3sls="));
Object obj = "imei";
for (int i = 0; i < 2; i++) {
obj = loadClass.getMethod(decrypt(strArr[i]), String.class, Activity.class).invoke(null, obj, this.ac);
}
}
public String decrypt(String str) throws Exception {
Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
cipher.init(2, this.key);
return new String(cipher.doFinal(Base64.decode(str, 2)));
}
...
}
```
caption: [Code of the main class of the application as shown by Jadx, before patching],
)<fig:th-demo-before>
A first analysis of the content of the application shows that it contains one `Activity` that instantiates the class `Main` and calls `Main.main()`.
@fig:th-demo-before shows most of the code of `Main` as returned by Jadx.
We can see that the class contains another #DEX file encoded in Base64 and loaded in the `InMemoryDexClassLoader` `cl`.
A class is then loaded from this class loader, and two methods of this class are called.
The names of this class and of these methods are not directly accessible, as they have been encrypted and are only decrypted at runtime, just before being used.
Here, the encryption key is available statically, and in theory, a very good static analyser modelling the Android `Cipher` #API could compute the methods actually called.
However, we could easily imagine an application that gets this key from a remote command and control server.
In this case, it would be impossible to compute those methods with static analysis alone.
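For the static-key case discussed above, the decryption can be reproduced outside of the application. The sketch below is only an illustration on a standard JVM: the class `OfflineDecrypt` is hypothetical, `java.util.Base64` is assumed to be an acceptable replacement for `android.util.Base64` with the flag value 2 (`NO_WRAP`), and the constant 2 passed to `Cipher.init()` in the decompiled code is `Cipher.DECRYPT_MODE`. The `encrypt` method does not exist in the application and is only there to check the round trip.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical offline reimplementation of Main.decrypt() for a desktop JVM.
final class OfflineDecrypt {
    // Key copied from the decompiled Main class (16 bytes, hence AES-128).
    private static final byte[] KEY =
        "_-_Secret Key_-_".getBytes(StandardCharsets.UTF_8);

    private static byte[] run(int mode, byte[] input) {
        try {
            // Same transformation string as in the decompiled code.
            Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
            cipher.init(mode, new SecretKeySpec(KEY, "AES"));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Decrypts one of the Base64 strings embedded in the application. */
    static String decrypt(String base64Ciphertext) {
        byte[] raw = Base64.getDecoder().decode(base64Ciphertext);
        return new String(run(Cipher.DECRYPT_MODE, raw), StandardCharsets.UTF_8);
    }

    /** Not in the application: inverse operation, used to test the sketch. */
    static String encrypt(String plaintext) {
        byte[] raw = run(Cipher.ENCRYPT_MODE, plaintext.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(raw);
    }
}
```

Calling `OfflineDecrypt.decrypt()` on the strings of the listing would be expected to recover the class and method names, which is exactly the reasoning a static analyser would have to automate, and which stops working as soon as the key is not embedded in the application.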
When run on this application, Flowdroid computed a call graph of 43 edges and reported no data leaks.
This is not particularly surprising considering the obfuscation methods used.
We then ran the dynamic analysis described in @sec:th-dyn on the application and applied the transformation described in @sec:th-trans to add the dynamic information to it.
This time, Flowdroid computed a larger call graph of 76 edges, and did find a data leak.
Indeed, when looking at the new application with Jadx, we notice a new class `Malicious`, and the code of `Main.main()` is now as shown in @fig:th-demo-after.
The method called in the loop is now either `Malicious.get_data()`, `Malicious.send_data()` or `Method.invoke()`.
Although the names are self-explanatory, inspecting the code of those methods confirms that `get_data()` calls `Utils.source()` and `send_data()` calls `Utils.sink()`.
#figure(
```java
public void main() throws Exception {
String[] strArr = {"n6WGYJzjDrUvR9cYljlNlw==", "dapES0wl/iFIPuMnH3fh7g=="};
Class<?> loadClass = this.cl.loadClass(decrypt("W5f3xRf3wCSYcYG7ckYGR5xuuESDZ2NcDUzGxsq3sls="));
Object obj = "imei";
for (int i = 0; i < 2; i++) {
Method method = loadClass.getMethod(decrypt(strArr[i]), String.class, Activity.class);
Object[] objArr = {obj, this.ac};
obj = T.check_is_Malicious_get_data_fe2fa96eab371e46(method) ?
Malicious.get_data((String) objArr[0], (Activity) objArr[1]) :
T.check_is_Malicious_send_data_ca50fd7916476073(method) ?
Malicious.send_data((String) objArr[0], (Activity) objArr[1]) :
method.invoke(null, objArr);
}
}
```
caption: [Code of `Main.main()` as shown by Jadx, after patching],
)<fig:th-demo-after>
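The patched loop shown above replaces one reflective call with a chain of checks on the `Method` object, followed by a direct call when a check succeeds and by the original `invoke()` otherwise. The body of the generated `T.check_is_...` methods is not shown in this section, so the sketch below is only an assumption about what such a check could compare (declaring class, name and parameter types). All names in it are hypothetical, the nested `Malicious` class is a stand-in for the dynamically loaded one, and `Object` replaces `Activity` so that the example runs on a standard JVM.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

// Hypothetical illustration of the check-then-call pattern of the patched code.
final class ReflectionCheckSketch {
    /** Stand-in for the dynamically loaded class of the demo application. */
    static final class Malicious {
        public static String get_data(String tag, Object ctx) { return "data(" + tag + ")"; }
        public static String send_data(String data, Object ctx) { return "sent(" + data + ")"; }
        public static String other(String s, Object ctx) { return "other(" + s + ")"; }
    }

    /** Assumed check: same declaring class, same name, same parameter types. */
    static boolean isMethod(Method m, Class<?> owner, String name, Class<?>... params) {
        return m.getDeclaringClass() == owner
            && m.getName().equals(name)
            && Arrays.equals(m.getParameterTypes(), params);
    }

    /** Direct call when the target is known, reflective call otherwise. */
    static Object dispatch(Method method, Object[] args) {
        if (isMethod(method, Malicious.class, "get_data", String.class, Object.class)) {
            return Malicious.get_data((String) args[0], args[1]);
        }
        if (isMethod(method, Malicious.class, "send_data", String.class, Object.class)) {
            return Malicious.send_data((String) args[0], args[1]);
        }
        try {
            // Fallback: behaviour of the original, unpatched application.
            return method.invoke(null, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Test helper: looks up a method of the stand-in class by name. */
    static Method find(String name) {
        for (Method m : Malicious.class.getDeclaredMethods()) {
            if (m.getName().equals(name)) {
                return m;
            }
        }
        throw new IllegalArgumentException("no such method: " + name);
    }
}
```

The point of this pattern is that the two direct calls become ordinary edges visible to a static analyser, while the fallback keeps the behaviour of the application unchanged for any method that was not observed during the dynamic analysis.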
#todo[androgard call graph]