
#import "../lib.typ": paragraph, ART, DEX, APK, eg
#import "../lib.typ": todo, jfl-note, jm-note

== Limitations and Future Work <sec:th-limits>

The method we presented in this chapter has a number of underdeveloped aspects.
In this section, we present these issues and potential avenues for improvement, covering the bytecode transformation, the dynamic analysis, and a comparison with DroidRA, a tool similar to ours.

=== Bytecode Transformation

#paragraph[Custom Class Loaders][
The first obvious limitation of our bytecode transformation is that we do not know what custom class loaders do, so we cannot accurately reproduce their behaviour statically.
We elected to fall back to the behaviour of the `BaseDexClassLoader`, which is the highest Android-specific class loader in the inheritance hierarchy, and whose behaviour is shared by all class loaders except `DelegateLastClassLoader`.
The current implementation of the #ART enforces some restrictions on the class loader's behaviour to optimise the runtime performance by caching classes.
This gives us some guarantees that custom class loaders will remain consistent with the standard class loaders.
For instance, a class loaded dynamically must have the same name as the one passed to `ClassLoader.loadClass()`.
This makes `BaseDexClassLoader` a good approximation for legitimate class loaders.
However, an obfuscated application could use the techniques discussed in @sec:cl-cross-obf, in which case our model would be entirely wrong.

It would be interesting to explore whether some form of static analysis, such as symbolic execution, could extract the behaviour of an ad hoc class loader and model the loaded classes accordingly.
A more reasonable approach would be to improve the dynamic analysis to intercept each call to `loadClass()` on each class loader, including implicit calls performed by the #ART.
This would allow us to collect a mapping $("class loader", "class name") -> "class"$ that can then be used when renaming colliding classes.
]
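
As an illustration of the mapping described above, the following minimal Java sketch stores the observed resolutions and reports which class names collide.
The class names `LoadKey` and `LoadRegistry`, as well as the string identifiers used for class loaders and class definitions, are hypothetical and not part of our implementation; in particular, the sketch assumes that a stable identifier is available for each class loader, which is not trivial to obtain.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical key: one (class loader, class name) pair seen by an intercepted loadClass() call.
final class LoadKey {
    final String loaderId;   // assumption: a stable identifier for the class loader
    final String className;  // the name passed to loadClass()

    LoadKey(String loaderId, String className) {
        this.loaderId = loaderId;
        this.className = className;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof LoadKey)) return false;
        LoadKey k = (LoadKey) o;
        return loaderId.equals(k.loaderId) && className.equals(k.className);
    }

    @Override
    public int hashCode() {
        return Objects.hash(loaderId, className);
    }
}

// Hypothetical registry holding the mapping (class loader, class name) to class definition.
final class LoadRegistry {
    private final Map<LoadKey, String> resolved = new HashMap<>();

    // Record that `loaderId` resolved `className` to the definition `definitionId`.
    void record(String loaderId, String className, String definitionId) {
        resolved.put(new LoadKey(loaderId, className), definitionId);
    }

    // Definition observed for this pair, or null if the pair was never observed.
    String lookup(String loaderId, String className) {
        return resolved.get(new LoadKey(loaderId, className));
    }

    // A name collides when at least two distinct definitions were observed for it.
    boolean collides(String className) {
        Set<String> definitions = new HashSet<>();
        for (Map.Entry<LoadKey, String> e : resolved.entrySet()) {
            if (e.getKey().className.equals(className)) {
                definitions.add(e.getValue());
            }
        }
        return definitions.size() > 1;
    }
}
```

Only the names for which `collides` returns true would need to be renamed before the classes are injected into the application.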

#paragraph[Multiple Class Loaders for one `Method.invoke()`][
Although we managed to handle calls to different methods from one `Method.invoke()` site, we do not handle calling methods from different class loaders with colliding class definitions.
The first reason is that it is quite challenging to compare class loaders statically.
At runtime, each object has a unique identifier that can be used to compare them over the course of the same execution, but this identifier is reset each time the application starts.
This means we cannot use this identifier in an `if` condition to differentiate the class loaders.
Ideally, we would combine the hash of the loaded #DEX files, the class loader class and parent to make a unique, static identifier, but the #DEX files loaded by a class loader cannot be accessed at runtime without accessing the process memory at arbitrary locations.
For some class loaders, the string representation returned by `Object.toString()` lists the location of the loaded #DEX file on the file system.
This is not the case for the commonly used `InMemoryClassLoader`.
In addition, the #DEX files are often located in the application's private folder, whose name is derived from the hash of the #APK itself.
Because we modify the application, the path of the private folder also changes, and so will the string representation of the class loaders.
Checking the class loader of a class can also have side effects on class loaders that delegate to the main application class loader.
Because we inject the classes into the #APK, the classes of such a class loader are already available in the main application class loader, which in most cases takes priority over the other class loaders, so the class ends up being loaded by the application class loader instead of the original one.
If we checked the class loader, we would need to consider such cases and rename each class of each class loader before reinjecting them into the application.
This would greatly increase the risk of breaking the application during its transformation.
Instead, we elected to ignore the class loaders when selecting the method to invoke.
This leads to potentially invalid runtime behaviour, as the first method that matches the class name will be called, but the alternative methods from other class loaders still appear in the new application, albeit in a block that might be flagged as dead code by a sufficiently advanced static analyser.
]
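
The consequence of ignoring class loaders can be seen in the following simplified Java sketch of a patched `Method.invoke()` site.
It is an illustration under assumptions, not the code generated by our tool: `Payload` and `InvokeSite` are hypothetical names, and the guard is written in Java rather than bytecode.
The guard compares the declaring class name and the method signature only, so two classes with the same name coming from different class loaders would both take the first branch.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

// Hypothetical stand-in for a class that was loaded dynamically and then injected into the APK.
final class Payload {
    static String run(String arg) {
        return "payload:" + arg;
    }
}

final class InvokeSite {
    // Sketch of a patched reflective call site.
    static Object call(Method m, Object receiver, Object... args)
            throws ReflectiveOperationException {
        Class<?> owner = m.getDeclaringClass();
        // Only names and parameter types are compared.
        // owner.getClassLoader() is deliberately not checked, so a class with the
        // same name defined by another class loader would also match this guard.
        boolean matches = owner.getName().equals(Payload.class.getName())
                && m.getName().equals("run")
                && Arrays.equals(m.getParameterTypes(), new Class<?>[] { String.class });
        if (matches) {
            // Direct call, visible to static analysers.
            return Payload.run((String) args[0]);
        }
        // Fallback: the original reflective call is kept for unknown targets.
        return m.invoke(receiver, args);
    }
}
```

Any additional branch for a colliding class from another class loader would come after the first match and would therefore never be reached at runtime.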

#paragraph[`ClassNotFoundException` may not be raised][
In the very specific situation where the original application tries to access a class from dynamically loaded bytecode without actually accessing this bytecode (#eg by using the wrong class loader), the patched application behaviour will differ.
The original application should raise a `ClassNotFoundException`, but in the patched application, the class will be accessible and the exception will not be raised.
In practice, there are few reasons to do such a thing.
One could be to check if the #APK has been tampered with, but there are easier ways to do this, like checking the application signature.
Another would be to check if the class is already available, and if not, load it dynamically, in which case the difference does not matter, as the dynamically loaded code is already present.
In any case, because we remove neither the calls to the functions that load the classes (like `ClassLoader.loadClass(..)`) nor the `try` / `catch` blocks, static analysis tools that can handle the original behaviour should still be able to recover it.
]
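
The difference in behaviour can be reproduced on a plain JVM with the following Java sketch, which is only an analogy for the Android case.
The names `DynamicOnly` and `Visibility` are hypothetical: the isolated class loader plays the role of the wrong class loader in the original application, while the class loader that defined `DynamicOnly` plays the role of the application class loader after the classes have been injected.

```java
// Hypothetical stand-in for a class that originally exists only in dynamically loaded bytecode.
final class DynamicOnly {
}

final class Visibility {
    // Returns true when `loader` can resolve `name`,
    // false when it raises ClassNotFoundException.
    static boolean canLoad(ClassLoader loader, String name) {
        try {
            loader.loadClass(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // A class loader with no parent and no code of its own:
    // it only sees the classes provided by the bootstrap class loader.
    static ClassLoader isolatedLoader() {
        return new ClassLoader(null) {
        };
    }
}
```

With these definitions, `Visibility.canLoad(Visibility.isolatedLoader(), DynamicOnly.class.getName())` returns false, whereas the same call with `DynamicOnly.class.getClassLoader()` returns true, which mirrors the original and the patched application respectively.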


=== Dynamic Analysis

#paragraph[Anti-Evasion][
  Our dynamic analysis does not implement any anti-evasion technique.
  Any application implementing even basic evasion will detect our environment and will probably not load malicious bytecode.
  Running the dynamic analysis in an appropriate sandbox such as DroidDungeon should improve the results significantly.
]

#paragraph[Code Coverage][
  In @sec:th-dyn-failure, we saw that our dynamic analysis performed poorly.
  This may be due to our experimental setup, and it is possible that a better sandbox would fix the issue.
  However, there is a larger code coverage issue.
  We tried to manually analyse a few applications marked as malware on MalwareBazaar to test our method.
  Although static analysis confirmed that the applications were using reflection and dynamic code loading, we did not manage to trigger this behaviour at runtime, and other obfuscation techniques make it very difficult to determine statically the conditions required to trigger it.
  Thus, we believe that techniques to improve code coverage are indeed needed when analysing applications.
  This could mean better exploration techniques, such as those implemented by Stoat and GroddDroid, or more intrusive approaches, such as forced execution.
]

=== Comparison with DroidRA

It would be very interesting to compare our tool to DroidRA.
DroidRA is a tool that computes reflection information using static analysis and patches the application to add the corresponding direct calls.
Beyond the classic comparison of static versus dynamic, DroidRA has a similar goal and strategy to ours.
Two notable comparison criteria would be the failure rate and the number of edges added to an application call graph.
The first criterion indicates how much the results can be used by other tools, while the second indicates how effective the approaches are.