thesis/5_theseus/5_limits.typ

#import "../lib.typ": paragraph, ART, DEX, APK, eg
#import "../lib.typ": todo, jfl-note, jm-note

== Limitations and Futur Works <sec:th-limits>

The method we presented in this section has a number of underdeveloped aspects.
In this section we will present those issues and potential avenues of improvement.

=== Bytecode Transformation

#paragraph[Custom Classloaders][
The first obvious limitation of our bytecode transformation is that we do not know what custom classloadrs do, so we cannot accuratly reproduce statically their behavior.
We elected to fallback to the behavior of the `BaseDexClassLoader`, which is the highest Android specific classloader in the inheritance hierarchy, and whose behavior is shared by all classloaders safe `DelegateLastClassLoader`.
The current implementation of the #ART enforce some restrictions on the classloaders behavior to optimize the runtime performance by caching classes.
This gives us some garanties that custom classesloaders will keep a some coherences will the classic classloaders.
For instance, a class loaded dynamically must have the same name as the name used in `ClassLoader.loadClass()`.
This make `BaseDexClassLoader` a good estimation for legitimate classloaders, however, an obfuscated application could use the techniques discussed in @sec:cl-cross-obf, in wich case our model would be entirelly wrong.

It would be interesting to expore if some form of static analysis like symbolic execution could be used to extract the behavior of an ad hoc class loader and be used to model the class used appropriately.
A more reasonable approach however would be to improve the static analysis to intercept each calls of `loadClass()` of each class loaders, including implicite calls performed by the #ART.
This would allow to collect a mapping $("class loader", "class name") -> "class"$ that can then be used when renaming colliding classes.
]

#paragraph[Multiple Classloaders for one `Method.invoke()`][
Although we managed to handle call to different methods from one `Method.invoke()` site, we do not handle calling methods from different classloaders with colliding classes definition.
The first reason is that it is quite challenging to compare classloaders statically.
At runtime, each object has an unique identifier that can be used to compare them over the course of the same execution, but this identifier is reset each time the application starts.
This means we cannot use this identifier in an `if` condition to differentiate the classloaders.
Ideally, we would combine the hash of the loaded #DEX files, the classloader class and parent to make an unique, static identifier, but the #DEX files loaded by a classloader cannot be accessed at runtime without accessing the process memory at arbitrary locations.
For some classloaders, the string representation returned by `Object.toString()` list the location of the loaded #DEX file on the file system.
This is not the case for the commonly used `InMemoryClassLoader`.
In addition, the #DEX files are often located in the application private folder, whose name is derived from the hash of the #APK itself.
Because we modify the application, the path of the private folder also change, and so will the string representation of the classloaders.
Checking the classloader of a classes can also have side-effect on classloaders that delegate to the main application classloader:
because we inject the classes in the #APK, the classes of the classloader are now already in the main application classloader, which in most case will have priority on the other classloaders, and lead to the class beeing loaded by the application classloader instead of the original classloader.
If we check for the classloader, we would need to considere such cases en rename each classes of each classloader before reinjecting them to the in the application.
This would greatly increase the risk of breaking the application during its transformation.
Instead, we elected to ignore the classloaders when selecting the method to invoque.
This leads to potential invalid runtime behaviore, as the first method that matching the class name will be called, but the alternative methods from other classloader still appears in the new application, albeit in a block that might be flagged as dead-code by a sufficiently advenced static analyser.
]

#paragraph[`ClassNotFoundException` may not be raised][
In the very specific situation where the original application tries to access a class from dynamically loaded bytecode without actually accessing this bytecode (#eg by using the wrong class loader), the patched application behavior will differ.
The original application should raise a `ClassNotFoundException`, but in the patched application, the class will be accible and the exception will not be raised.
In pactice, their is not a lot of reason to do such thing.
One could be to check if the #APK as been tempered with, but their are easier ways to do thins, like checking the application signature.
Another would be to check if the class is already available, and if not, load it dynamically, in wich case it does not matter as code loaded dynamically is already present.
In any cases, statically, because we remove neither the calls to the function that load the classes (like `ClassLoader.loadClass(..)`) nor the `try` / `catch` blocks, static analysis tools those can handle the original behavior should still be hable to access the old behavior.
]


=== Dynamic Analysis

#paragraph[Anti Evasion][
  Our dynamic analysis does not permform any kind of anti-evasive technique.
  Any application implementing even basic evasion will detect our environment and will probably not load malicious bytecode.
  Running the dynamic analysis in a appropriate sandbox such as DroidDungeon should improve the results significantly.
]

#paragraph[Code Coverage][
  In @sec:th-dyn-failure, we saw that our dynamic analysis performed poorly.
  It may be due to our experimental setup, and it is possible that a better sandbox will fix the issue.
  However their is a larger code coverage issue.
  We tried to manually analysed a few applications marked as malware on MalwareBazaar to test our method.
  Although we did confirm statically that the applications where using reflection and dynamic code loading, we did not managed to trigger this behavior at runtime, and other obfuscation technique make it verry difficult to determine statically the required condition to trigger them.
  Thus, we believe that techniques to improve code coverage are indeed needed when analysing application.
  This could mean better exploration techniques such as the one implemented by Stoat and GroddDroid, or more intrusive approched such as forced excecution.
]

=== Comparision to DroidRA

It would be very interesting to compare our tool to DroidRA.
DroidRA is a tool that compute reflection information using static analysis and patch the application to add those calls to the application?
Beyond the classic comparison static vs dynamic, DroidRA has a similar goal and strategy to ours.
Two notable comparison criteria would be the failure rate and the number of edges added to an application call graph.
The first criterion indicate how much the results can be used by other tools, while the second indicate how effective the approaches are.