thesis/4_class_loader/5_discussion.typ

#import "../lib.typ": SDK, AOSP, DEX, ART, jm-note, todo

== Discussion <sec:cl-disc>

#todo[small intro]

=== Countermeasures <sec:cl-countermeasures>

Countermeasures against shadow attacks depend on each tool and its objectives.
The first important recommendation is to implement class selection according to the algorithm described in Listing @lst:cl-loading-alg.
Doing so should resolve every case of self-shadowing, except for tools like Apktool, which do not select a class to compute a result but instead show the whole content of the application.
For those tools, a clear warning should be added, pointing out that multiple implementations have been found and displaying the one that will be used at runtime.
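
As an illustration of such a warning, the following Python sketch reports every class defined in more than one #DEX file.
It is not the algorithm of the listing referenced above: it assumes that the caller already provides the #DEX files in the order in which they are searched at runtime, and that the first definition found is the one that is loaded.
The function name `find_self_shadowing` and the input format are hypothetical.

```python
from collections import defaultdict


def find_self_shadowing(
    dex_files: list[tuple[str, list[str]]],
) -> dict[str, list[str]]:
    """Map each class defined more than once to the files defining it.

    ASSUMPTION: `dex_files` is already sorted in runtime search order and
    the first definition wins, so the first file listed for a class is the
    one this sketch considers to be used at runtime.
    """
    definitions: dict[str, list[str]] = defaultdict(list)
    for dex_name, class_names in dex_files:
        for class_name in class_names:
            definitions[class_name].append(dex_name)
    # Keep only the classes with several competing definitions.
    return {cls: files for cls, files in definitions.items() if len(files) > 1}


def format_warnings(shadowed: dict[str, list[str]]) -> list[str]:
    """Build one human-readable warning per shadowed class."""
    return [
        f"{cls}: {len(files)} definitions found, using the one in {files[0]}"
        for cls, files in sorted(shadowed.items())
    ]
```

A tool that displays the whole application could print the output of `format_warnings` next to each duplicated class instead of silently picking one definition.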

Countermeasures against #SDK shadow and Hidden shadow attacks are more complex to implement: they require the list of platform classes on the target smartphone.
The list of #SDK classes can be extracted easily from android.jar, but hidden classes need to be obtained by another means.
They could be listed directly from the #AOSP tree of the Android source code, or obtained from Android documentation, or extracted from the phone itself.
The first approach requires statically analysing the source code, which can be difficult to achieve as several programming languages are used and the code base is large and fragmented.
As discussed earlier in the chapter, the documentation can lack some classes.
Consequently, the most reliable source is the smartphone itself.
It should be noted that none of these methods can be generalized for all possible versions of Android, as the exact list will depend on the exact targeted device, possibly modified by the manufacturer.
Thus, to counter shadow attacks, the static analysis tools that we evaluated need to embed multiple lists of platform classes, one for each Android version.
Then, the best heuristic would be to use the list of platform classes that is closest to the target #SDK of the analysed application.
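
This heuristic can be sketched as follows in Python.
The function names, the representation of the lists as a dictionary indexed by API level, and the tie-breaking rule (the newer list is preferred when two lists are equally close) are assumptions made for the illustration.

```python
def closest_platform_list(
    platform_lists: dict[int, set[str]], target_sdk: int
) -> set[str]:
    """Return the embedded platform class list closest to `target_sdk`.

    ASSUMPTION: when two API levels are equally close, the higher one is
    preferred. This tie-breaking rule is arbitrary.
    """
    if not platform_lists:
        raise ValueError("no platform class list available")
    closest = min(platform_lists, key=lambda api: (abs(api - target_sdk), -api))
    return platform_lists[closest]


def platform_collisions(
    app_classes: set[str], platform_lists: dict[int, set[str]], target_sdk: int
) -> set[str]:
    """Return the application classes whose name is also a platform class."""
    return app_classes & closest_platform_list(platform_lists, target_sdk)
```

Any class returned by `platform_collisions` is a candidate for an #SDK shadow or Hidden shadow attack and should be handled explicitly by the tool.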

Some tools like Flowdroid would require additional countermeasures: to compute the exact flow of data, Flowdroid also needs to analyse the code of platform classes.
Flowdroid has already analysed the #SDK classes, but not the hidden classes.
In addition to the data flow in hidden classes, Flowdroid needs a list of data sources and sinks from those classes.
Other analysis tools may require additional data from platform classes, which may be too difficult to obtain.

We believe that analysis tools can handle shadow attacks to some degree.
The implementation of the solution will differ depending on the nature of the tool and may not always require the same effort.

=== Relation with Obfuscation Techniques <sec:cl-cross-obf>

As described in the state of the art, reverse engineers face other techniques of obfuscation such as packers or native code.
These techniques rely on custom class loaders that load new parts of the application from ciphered assets or from the network.
The reverse engineers have to study the application dynamically, to recover new classes, and eventually go back to a static phase to understand the behavior of the application.
In this section, we compare shadow attacks with these techniques and we discuss how they interact with them.

Advanced obfuscation techniques relying on packers have a higher impact on the difficulty of performing a static analysis compared to shadow attacks.
Most of the time, the reverse engineer cannot deobfuscate the application without performing a dynamic analysis.
For this reason, approaches have been designed to assist the capture of the bytecode that is loaded dynamically, after the precise moment when the deobfuscation methods have been executed~@zhang2015dexhunter @xue2017adaptive @wong2018tackling.
On the contrary, a shadow attack can be easily defeated by implementing our algorithm in the static analysis tool, as discussed earlier in @sec:cl-countermeasures.
Nevertheless, shadow attacks are stealthier than packers or native code.
Packers can be easily spotted by artifacts left behind in the application or by detecting classes implementing a custom class loading mechanism.
On the contrary, an extra class implementing a shadow attack, which would not be executed, could deliberately contain very little code compared to the Android class that is actually executed.
Such an attack would be more discreet than a packer that adds a lot of code, possibly native, to the application.

Combining regular obfuscation techniques with shadow attacks can be achieved in two ways.

First, the attacker could hide the code of a packer or a native call by using a shadow attack.
For example, by colliding with a class of the #SDK, the attacker could cause the control flow analysis to be wrongly computed, leading the analyst to consider that part of the code is dead, and misleading the reverse engineer about the use of this part, which contains a packer.
At runtime, this code would be triggered, unpacking new code.

Second, the attacker could use a packer to unpack code at runtime in a first phase.
The reverse engineer would have to perform a dynamic analysis, for example using a tool such as Dexhunter~@zhang2015dexhunter, to recover new #DEX files that are loaded by a custom class loader.
Then, the reverse engineer would go back to a new static analysis and may have to resolve shadow attacks, for example, if a class is defined multiple times in the loaded #DEX files.

Because the interaction between shadow attacks and other obfuscation techniques often relies on a loading mechanism implemented by the developer, investigating these cases requires analysing the Java bytecode that handles the loading.
This problem is left as future work.

=== Limitations <sec:cl-ttv>

During the analysis of the #ART internals, we assumed that its different operating modes are equivalent: we analysed the loading process for classes stored in the non-optimized `.dex` format, and not for the pre-compiled `.oat` format.
It is reasonable to suppose that the two implementations have been produced from the same algorithm using two compilation workflows.
Similarly, we assumed that the platform classes stored in `boot.art` are the same as the ones in `BOOTCLASSPATH`.
We confirmed our hypothesis empirically on an Android Emulator, but we may have missed some edge cases.

The comparison of Smali code can lead to underestimated values, for example, if the compilation process performs minor modifications such as instruction reordering.
The ratios reported in this study for the comparison of code are thus a lower bound and would be higher with a more precise comparison.
In addition, platform classes are stored differently in older versions of Android and could not be easily retrieved.
For this reason, we did not compare the classes found in applications to platform versions older than #SDK 32, to avoid producing unreliable statistics for those versions.
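
The underestimation caused by instruction reordering can be illustrated with a small Python sketch.
It is not the comparison used in this study: both similarity functions below are hypothetical and only contrast an order-sensitive comparison with one that ignores the order of the instructions.

```python
from collections import Counter
from difflib import SequenceMatcher


def sequence_similarity(a: list[str], b: list[str]) -> float:
    """Order-sensitive similarity between two lists of Smali instructions."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def multiset_similarity(a: list[str], b: list[str]) -> float:
    """Order-insensitive similarity: share of instructions found in both lists."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    shared = sum((Counter(a) & Counter(b)).values())
    return 2 * shared / total


original = [
    "const/4 v0, 0x1",
    "const/4 v1, 0x2",
    "add-int v2, v0, v1",
    "return v2",
]
# Same method body with the two independent constant loads swapped.
reordered = [original[1], original[0], original[2], original[3]]
```

On this example, `sequence_similarity(original, reordered)` is below 1 although both bodies behave identically, whereas `multiset_similarity` returns 1.
Ignoring the order is not a solution in itself, since it can in turn overestimate the similarity of methods that use the same instructions differently.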


=== Future Work <sec:cl-futur>

#todo[Develop @sec:cl-futur]

As mentioned above, our comparison technique is quite naive and could be improved.
It could be insightful to detect exactly when two classes come from the same source file, or which version of a library a class belongs to.
More importantly, a better comparison technique would make it possible to detect cases where actual malicious bytecode has been added to the shadowed library, which we could have missed during our manual inspection.

Additionally, the question of dynamic class loaders instantiated manually by the application developer is worth investigating.
This reaches the limits of static analysis: those cases involve dynamically loading bytecode, and in many cases the classes loaded by those class loaders are not even available for analysis.
However, even with dynamic analysis, the behavior of class loaders can still be an issue, especially when the analysis is performed by alternating static and dynamic analysis, as is often the case when manually reversing an application.
To handle those cases, it could be interesting to develop a method to model arbitrary class loaders, either by analysing their bytecode or by interacting with an instance of the class loader dynamically.

In September 2024 (just after we finished this work), Android 15 introduced support for the new version 41 of the #DEX format.
We can expect this version of #DEX to become the norm in a few years.
The most notable change in version 41 is the new container format: instead of storing the bytecode in separate #DEX files, the different files can now be concatenated into a single file.
There is also some permeability between the concatenated files: some structures stored in one file can be used by the files concatenated after it.
This significant change in the bytecode storage is similar to the introduction of the multi-dex format.
Considering that self-shadowing is only possible because of the multi-dex format, we expect this change to have the potential to introduce new, similar issues.
Thus, we believe that the implementation details of this new version need to be studied and modelled properly to avoid introducing new issues when updating analysis tools to support it.
Just by reading the specification#footnote[https://source.android.com/docs/core/runtime/dex-format#container], we believe that self-shadowing between concatenated #DEX files is possible, unless additional checks are enforced by the #ART when loading the file.
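
As a first step, an analysis tool can at least detect when it is given a file in this new version.
The following Python sketch reads the version from the magic bytes of a #DEX file, which consist of `dex\n` followed by three ASCII digits and a null byte.
The function names are hypothetical, and the sketch does not parse the container itself: it only flags files for which the container format may apply.

```python
def dex_version(data: bytes) -> int:
    """Return the DEX format version encoded in the 8-byte magic."""
    if len(data) < 8 or data[:4] != b"dex\n" or data[7:8] != b"\x00":
        raise ValueError("not a DEX file: bad magic")
    version = data[4:7]
    if not version.isdigit():
        raise ValueError("not a DEX file: bad version field")
    return int(version.decode("ascii"))


def may_use_container_format(data: bytes) -> bool:
    """Tell whether the file declares a version that supports containers.

    ASSUMPTION: every version from 41 onwards is treated as potentially
    using the container format.
    """
    return dex_version(data) >= 41
```

A tool that does not yet model the container format could use `may_use_container_format` to emit a warning instead of silently analysing only part of the bytecode.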

#jm-note[Maybe talk about v41 in RASTA? this will break a lot of things]