
#import "../lib.typ": SDK, AOSP, DEX, ART, jm-note, todo

== Discussion <sec:cl-disc>

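This section discusses countermeasures against shadow attacks, their relation with other obfuscation techniques, the limitations of our study, and directions for future work.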

=== Countermeasures <sec:cl-countermeasures>

Countermeasures against shadow attacks depend on the tool and its objectives.
The first recommendation is to implement class selection according to the algorithm described in Listing @lst:cl-loading-alg.
This should resolve every case of self-shadowing, except for tools such as Apktool, which do not need to select a class to compute their result but instead display the whole content of the application.
For those tools, a clear warning should be displayed, indicating that multiple implementations have been found and showing the one that will be used at runtime.
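
The following Java sketch illustrates both recommendations under stated assumptions. It is not a transcription of the algorithm in @lst:cl-loading-alg. It assumes a simplified lookup order: platform classes from the boot class path (#SDK and hidden classes) take precedence, followed by the application's #DEX files in the order `classes.dex`, `classes2.dex`, and so on, with the scan stopping at the first missing index. The class `ShadowResolver` and its methods are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper: models which definition of a class an app would use at runtime,
// assuming platform classes first, then classes.dex, classes2.dex, ... (assumption).
final class ShadowResolver {
    static final String PLATFORM = "platform";

    // Returns N for "classesN.dex" (1 for "classes.dex"), or -1 for any other name.
    static int multidexIndex(String dexName) {
        if (dexName.equals("classes.dex")) return 1;
        if (!dexName.startsWith("classes") || !dexName.endsWith(".dex")) return -1;
        String digits = dexName.substring("classes".length(), dexName.length() - ".dex".length());
        if (!digits.matches("[1-9][0-9]{0,8}")) return -1;
        int index = Integer.parseInt(digits);
        return index >= 2 ? index : -1;
    }

    // Assumed load order: numeric (not lexicographic), stopping at the first missing index.
    static List<String> loadOrder(Collection<String> dexNames) {
        Set<Integer> present = new HashSet<>();
        for (String name : dexNames) {
            int index = multidexIndex(name);
            if (index > 0) present.add(index);
        }
        List<String> order = new ArrayList<>();
        for (int i = 1; present.contains(i); i++) {
            order.add(i == 1 ? "classes.dex" : "classes" + i + ".dex");
        }
        return order;
    }

    // Every reachable location defining className, in lookup order (platform first).
    static List<String> definitions(String className, Set<String> platformClasses,
                                    Map<String, Set<String>> dexContents) {
        List<String> found = new ArrayList<>();
        if (platformClasses.contains(className)) found.add(PLATFORM);
        for (String dex : loadOrder(dexContents.keySet())) {
            if (dexContents.get(dex).contains(className)) found.add(dex);
        }
        return found;
    }

    // The definition used at runtime under the assumed order, if any.
    static Optional<String> resolve(String className, Set<String> platformClasses,
                                    Map<String, Set<String>> dexContents) {
        return definitions(className, platformClasses, dexContents).stream().findFirst();
    }

    // Warning for tools such as Apktool that display every definition.
    static Optional<String> warning(String className, Set<String> platformClasses,
                                    Map<String, Set<String>> dexContents) {
        List<String> defs = definitions(className, platformClasses, dexContents);
        if (defs.size() < 2) return Optional.empty();
        return Optional.of(className + " is defined in " + defs + "; the runtime uses " + defs.get(0));
    }
}
```

Ordering the names numerically matters: a plain lexicographic sort would place `classes10.dex` before `classes2.dex`.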

Countermeasures against #SDK shadow and Hidden shadow attacks are more complex to handle: they require the list of platform classes on the target smartphone and, in some cases, their implementation.
The list of #SDK classes can easily be extracted from `android.jar`, but hidden classes must be obtained by other means.
They could be listed directly from the #AOSP source tree, obtained from the Android documentation, or extracted from the phone itself.
The first approach requires statically analysing the source code, which is difficult because the code base is large, fragmented, and written in several programming languages.
Ideally, the documentation would be the best source, but, as discussed earlier in this chapter, it may lack some classes.
For this solution to be viable, Google would need to keep the documentation more closely synchronised with released versions of Android than it currently is.
Moreover, smartphone manufacturers may add classes that do not appear in Google's documentation.
In fact, neither the documentation nor the source code can be generalised to all versions of Android, as the exact list depends on the targeted device, possibly modified by its manufacturer.
Thus, to counter shadow attacks, the static analysis tools that we evaluated need to embed multiple lists of platform classes, one for each Android version.
The best heuristic would then be to use the list of platform classes whose version is closest to the target #SDK of the analysed application.
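
A minimal Java sketch of this heuristic follows. It assumes that the tool ships one set of platform class names (#SDK and hidden classes) per API level, keyed in a `NavigableMap`. The class `PlatformLists` is hypothetical, and breaking ties towards the lower API level is an arbitrary choice on our part.

```java
import java.util.NavigableMap;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper: picks the embedded platform class list closest to an app's target SDK.
final class PlatformLists {
    // API level of the closest available list; ties go to the lower level (arbitrary choice).
    static int closestApiLevel(NavigableMap<Integer, Set<String>> listsByApi, int targetSdk) {
        if (listsByApi.isEmpty()) throw new IllegalArgumentException("no platform class list embedded");
        Integer below = listsByApi.floorKey(targetSdk);
        Integer above = listsByApi.ceilingKey(targetSdk);
        if (below == null) return above;
        if (above == null) return below;
        return (targetSdk - below <= above - targetSdk) ? below : above;
    }

    // Application classes whose names collide with a platform class (SDK or hidden shadow candidates).
    static SortedSet<String> shadowingClasses(Set<String> appClasses,
                                              NavigableMap<Integer, Set<String>> listsByApi,
                                              int targetSdk) {
        Set<String> platform = listsByApi.get(closestApiLevel(listsByApi, targetSdk));
        SortedSet<String> result = new TreeSet<>();
        for (String cls : appClasses) {
            if (platform.contains(cls)) result.add(cls);
        }
        return result;
    }
}
```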

Some tools, like Flowdroid, would require additional countermeasures: to compute the exact flow of data, Flowdroid also needs to analyse the code of platform classes.
The #SDK classes are already handled by Flowdroid, but the hidden classes are not.
In addition to the data flow inside hidden classes, Flowdroid would need the list of data sources and sinks that these classes contain.
Other analysis tools may require additional data from platform classes, which may be too difficult to obtain.

We believe that analysis tools can handle shadow attacks to some degree.
The solution, and the effort required to implement it, will differ depending on the nature of each tool.

=== Relation with Obfuscation Techniques <sec:cl-cross-obf>

As described in the state of the art, reverse engineers face other obfuscation techniques, such as packers or native code.
These techniques rely on custom class loaders that load new parts of the application from encrypted assets or from the network.
Reverse engineers have to study the application dynamically to recover the new classes, and then possibly return to a static phase to understand the behaviour of the application.
In this section, we compare shadow attacks with these techniques and discuss how they interact.

Advanced obfuscation techniques relying on packers make static analysis harder than shadow attacks do.
Most of the time, the reverse engineer cannot deobfuscate the application without performing a dynamic analysis.
For this reason, approaches have been designed to capture the dynamically loaded bytecode right after the deobfuscation methods have been executed~@zhang2015dexhunter @xue2017adaptive @wong2018tackling.
In contrast, a shadow attack can easily be defeated by implementing our algorithm in the static analysis tool, as discussed in @sec:cl-countermeasures.
Nevertheless, shadow attacks are stealthier than packers or native code.
Packers can easily be spotted through the artefacts they leave in the application or by detecting classes that implement a custom class loading mechanism.
By comparison, the extra class used by a shadow attack, which is never executed, can deliberately contain very little code compared to the Android class that is actually executed.
Such an attack is more discreet than a packer, which adds a large amount of code, possibly native, to the application.
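
As an illustration of the second kind of artefact, the following hypothetical Java check flags application classes that extend a known class loader, directly or transitively. It assumes that the direct superclass of each application class has already been extracted from the #DEX files. The list of platform loader classes is our own and is not exhaustive. A packer that merely instantiates `DexClassLoader` without subclassing it would not be caught by this check.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical heuristic: flag app classes that extend a class loader, directly or transitively.
final class CustomLoaderDetector {
    // Platform loader classes an app class may extend (our own, non-exhaustive list).
    static final Set<String> LOADER_CLASSES = Set.of(
            "java.lang.ClassLoader",
            "java.security.SecureClassLoader",
            "java.net.URLClassLoader",
            "dalvik.system.BaseDexClassLoader",
            "dalvik.system.PathClassLoader",
            "dalvik.system.DexClassLoader",
            "dalvik.system.InMemoryDexClassLoader",
            "dalvik.system.DelegateLastClassLoader");

    // superclassOf maps each application class to its direct superclass.
    static SortedSet<String> customLoaders(Map<String, String> superclassOf) {
        SortedSet<String> found = new TreeSet<>();
        for (String cls : superclassOf.keySet()) {
            Set<String> visited = new HashSet<>();     // guards against malformed cyclic hierarchies
            String ancestor = superclassOf.get(cls);
            while (ancestor != null && visited.add(ancestor)) {
                if (LOADER_CLASSES.contains(ancestor)) {
                    found.add(cls);
                    break;
                }
                ancestor = superclassOf.get(ancestor); // null once we leave the application's classes
            }
        }
        return found;
    }
}
```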

Combining regular obfuscation techniques with shadow attacks can be achieved in two ways.

First, the attacker could hide the code of a packer or a native call by using a shadow attack.
For example, by defining a class that collides with an #SDK class, the attacker could cause a control flow analysis to be computed incorrectly, so that the part of the code containing the packer is considered dead, which would mislead the reverse engineer about its purpose.
At runtime, however, this code would be executed and would unpack new code.

Second, the attacker could first use a packer to unpack code at runtime.
The reverse engineer would then have to perform a dynamic analysis, for example, using a tool such as Dexhunter~@zhang2015dexhunter, to recover the new #DEX files loaded by a custom class loader.
Back in a new static analysis phase, the reverse engineer could then face shadow attacks, for example, if a class is defined multiple times in the loaded #DEX files.

Because the interactions between shadow attacks and other obfuscation techniques often rely on a loading mechanism implemented by the developer, investigating these cases requires analysing the Java bytecode that handles the loading.
This problem is left as future work.

=== Limitations <sec:cl-ttv>

During the analysis of the #ART internals, we assumed that its different operating modes are equivalent: we analysed the loading process for classes stored in the non-optimised `.dex` format, and not in the pre-compiled `.oat` format.
We consider it reasonable to assume that both implementations derive from the same algorithm through two different compilation workflows.
Similarly, we assumed that the platform classes stored in `boot.art` are the same as the ones in `BOOTCLASSPATH`.
We confirmed these hypotheses empirically on an Android emulator, but we may have missed some edge cases.

Comparing Smali code can underestimate the similarity between two classes, for example, if the compilation process performs minor modifications such as instruction reordering.
The code similarity ratios reported in this study are thus lower bounds and could be higher with a more precise comparison.
In addition, platform classes are stored differently in older versions of Android and cannot easily be retrieved.
For this reason, we did not compare the classes found in applications with their platform counterparts from versions older than #SDK 32, to avoid producing unreliable statistics for those versions.


=== Future Work <sec:cl-futur>

As mentioned above, our Smali-based comparison of class implementations is rather naive and could be improved.
It would be insightful to detect precisely when two classes come from the same source file, or which version of a library a class belongs to.
More importantly, a better comparison technique would allow us to detect cases where malicious bytecode has been added to a shadowed library, which we may have missed during our manual inspection.
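
One possible direction is to compare method bodies as multisets of normalised instruction lines. This makes the score insensitive to instruction reordering. The Java sketch below illustrates this under our own assumptions. The class `SmaliSimilarity` and the list of skipped debug directives are illustrative choices. The trade-off is that two bodies containing the same instructions in a different order score 1.0, so this measure can overestimate similarity instead of underestimating it. Register renaming is also still not handled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical order-insensitive comparison of two Smali method bodies.
final class SmaliSimilarity {
    // Debug directives ignored during comparison (illustrative, not exhaustive).
    private static final String[] SKIPPED = {
        ".line", ".local", ".end local", ".restart local", ".prologue", ".epilogue"
    };

    // Matches a whole directive, so ".local" does not swallow ".locals".
    private static boolean isSkipped(String line) {
        for (String directive : SKIPPED) {
            if (line.equals(directive) || line.startsWith(directive + " ")) return true;
        }
        return false;
    }

    // Trimmed lines, without blank lines, comments or skipped directives.
    static List<String> normalise(String body) {
        List<String> lines = new ArrayList<>();
        for (String raw : body.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#") || isSkipped(line)) continue;
            lines.add(line);
        }
        return lines;
    }

    // Multiset overlap divided by the size of the larger body, in [0, 1].
    static double similarity(String bodyA, String bodyB) {
        Map<String, Integer> a = counts(normalise(bodyA));
        Map<String, Integer> b = counts(normalise(bodyB));
        int sizeA = total(a), sizeB = total(b);
        if (sizeA == 0 && sizeB == 0) return 1.0;
        int common = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            common += Math.min(e.getValue(), b.getOrDefault(e.getKey(), 0));
        }
        return (double) common / Math.max(sizeA, sizeB);
    }

    private static Map<String, Integer> counts(List<String> lines) {
        Map<String, Integer> m = new HashMap<>();
        for (String line : lines) m.merge(line, 1, Integer::sum);
        return m;
    }

    private static int total(Map<String, Integer> m) {
        int sum = 0;
        for (int v : m.values()) sum += v;
        return sum;
    }
}
```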

Additionally, class loaders instantiated explicitly by the application developer raise interesting questions.
These cases reach the limits of static analysis: they involve dynamically loaded bytecode, and in many cases, the classes loaded by these class loaders are not even available for analysis.
However, even with dynamic analysis, the behaviour of class loaders can still be an issue, especially when static and dynamic analyses are alternated, as is often the case when manually reverse engineering an application.
To handle these cases, it would be worth developing a method to model arbitrary class loaders, either by analysing their bytecode or by interacting dynamically with an instance of the class loader.
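
As a first step in that direction, the Java sketch below models a class loader as a node with the following elements:

- a parent;
- a set of classes it can define;
- a fixed delegation policy, either parent-first (the default delegation of `java.lang.ClassLoader`) or self-first (as some custom loaders do).

This is a simplification assumed for illustration. The names `LoaderModel` and `Policy` are hypothetical. An arbitrary loader can run any code in its loading methods, which is precisely what makes the general case hard.

```java
import java.util.Set;

// Hypothetical model of a class loader with a fixed delegation policy.
final class LoaderModel {
    enum Policy { PARENT_FIRST, SELF_FIRST }

    final String name;
    final LoaderModel parent;          // null for the root (boot) loader
    final Policy policy;
    final Set<String> definedClasses;  // classes this loader can define itself

    LoaderModel(String name, LoaderModel parent, Policy policy, Set<String> definedClasses) {
        this.name = name;
        this.parent = parent;
        this.policy = policy;
        this.definedClasses = definedClasses;
    }

    // Name of the loader whose definition of className would be used, or null if none defines it.
    String resolve(String className) {
        if (policy == Policy.SELF_FIRST && definedClasses.contains(className)) return name;
        if (parent != null) {
            String fromParent = parent.resolve(className);
            if (fromParent != null) return fromParent;
        }
        if (policy == Policy.PARENT_FIRST && definedClasses.contains(className)) return name;
        return null;
    }
}
```

With such a model, a shadowing question becomes a query on the loader chain. The same class name may resolve to different definitions depending on the loader through which it is requested.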

In September 2024 (just after we finished this work), Android 15 introduced support for the new version 41 of the #DEX format.
We can expect this version of #DEX to become the norm in a few years.
The most notable change in version 41 is the new container format: instead of being stored in separate files, several #DEX files can now be concatenated into a single file.
The concatenated files are also not fully independent: some data, such as strings, can be shared between them.
This significant change in bytecode storage is comparable to the introduction of the multi-dex format.
Since self-shadowing is only possible because of the multi-dex format, this change could introduce new issues of the same kind.
Thus, we believe that the implementation details of this new version should be studied and modelled properly to avoid introducing new issues when updating analysis tools to support it.
Based on the specification#footnote[https://source.android.com/docs/core/runtime/dex-format#container] alone, we believe that self-shadowing between concatenated #DEX files is possible, unless #ART enforces additional checks when loading the file.
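
To make this concern concrete, the Java sketch below splits a version 41 container into its logical #DEX files and reports class names defined in more than one of them. It relies on our reading of the specification:

- each logical file starts with a header whose magic is `dex\n041\0`;
- the `file_size` field (offset `0x20`) gives the distance to the next header or to the end of the container;
- two new fields, `container_size` (offset `0x70`) and `header_offset` (offset `0x74`), bound the container.

It also assumes little-endian data. Extracting class names from each logical file (through `class_defs`, `type_ids` and `string_ids`) is left out. `DexContainer` and its methods are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper for DEX version 41 containers (header layout as we read the specification).
final class DexContainer {
    static final String MAGIC_V41 = "dex\n041\0";
    static final int FILE_SIZE_OFF = 0x20;
    static final int CONTAINER_SIZE_OFF = 0x70;
    static final int HEADER_OFFSET_OFF = 0x74;
    static final int HEADER_SIZE_V41 = 0x78;

    // Start offsets of the logical DEX files stored in the container (little-endian assumed).
    static List<Integer> logicalFileOffsets(byte[] container) {
        ByteBuffer buf = ByteBuffer.wrap(container).order(ByteOrder.LITTLE_ENDIAN);
        List<Integer> offsets = new ArrayList<>();
        int off = 0;
        while (off < container.length) {
            if (container.length - off < HEADER_SIZE_V41) {
                throw new IllegalArgumentException("truncated header at " + off);
            }
            String magic = new String(container, off, 8, StandardCharsets.ISO_8859_1);
            if (!magic.equals(MAGIC_V41)) throw new IllegalArgumentException("not a v41 header at " + off);
            // Consistency checks: each header describes the whole container and its own position.
            if (Integer.toUnsignedLong(buf.getInt(off + CONTAINER_SIZE_OFF)) != container.length
                    || buf.getInt(off + HEADER_OFFSET_OFF) != off) {
                throw new IllegalArgumentException("inconsistent container fields at " + off);
            }
            long fileSize = Integer.toUnsignedLong(buf.getInt(off + FILE_SIZE_OFF));
            if (fileSize < HEADER_SIZE_V41 || fileSize > container.length - off) {
                throw new IllegalArgumentException("invalid file_size at " + off);
            }
            offsets.add(off);
            off += (int) fileSize; // jump to the next logical header
        }
        return offsets;
    }

    // Class names defined by more than one logical file, mapped to the indices of those files.
    static Map<String, List<Integer>> duplicatedClasses(List<Set<String>> classesPerLogicalFile) {
        Map<String, List<Integer>> definedIn = new TreeMap<>();
        for (int i = 0; i < classesPerLogicalFile.size(); i++) {
            for (String cls : classesPerLogicalFile.get(i)) {
                definedIn.computeIfAbsent(cls, k -> new ArrayList<>()).add(i);
            }
        }
        definedIn.values().removeIf(files -> files.size() < 2);
        return definedIn;
    }
}
```

The duplicate report only identifies candidates for self-shadowing. Which definition #ART actually uses in such a case remains the open question raised above.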

#jm-note[Maybe talk about v41 in RASTA? this will break a lot of things]