fix typos up to ch 3
All checks were successful
test_checkout (push): successful in 50s

Jean-Marie 'Histausse' Mineau 2025-12-21 13:37:29 +01:00
parent d34e403ca5
commit 3b5df50248
Signed by: histausse
GPG key ID: B66AEEDA9B645AD2
16 changed files with 1629 additions and 248 deletions

@@ -3,9 +3,10 @@
== Introduction
In order to understand the challenges of reverse engineering Android applications, we first need to understand some key concepts and specificities of Android.
-In particular, the format in which applications are distributed, as well as the runtime environment that runs those applications, is very specific to Android.
+In particular, the format in which applications are distributed, as well as the runtime environment that runs those applications, are very specific to Android.
To handle those specificities, a reverse engineer must have appropriate tools.
Some of those tools are used recurrently, either by the reverse engineer themself, or as a basis for other more complex tools that implement more advanced analysis techniques.
+// NOTE: "reverse engineer themself": both themself and themselves are correct; I prefer themself here because it refers to one specific, non-gendered engineer.
Among those techniques, the ones that do not require running the application are called static analysis.
Over time, many of those tools have been released.

@@ -11,7 +11,7 @@ Those changes make Android a unique operating system.
==== Android Applications <sec:bg-android-apk>
Applications in the Android ecosystem are distributed in the #APK format.
-#APK files are #JAR files with additional features, which are themself #ZIP files with additional features.
+#APK files are #JAR files with additional features, which are themselves #ZIP files with additional features.
A minimal #APK file contains a file `AndroidManifest.xml`, the `META-INF/` folder containing the #JAR manifest and signature files, and an #APK Signing Block at the end of the #ZIP file.
The code of the application is then stored in a custom format, the Dalvik bytecode, or in the binary ELF format, called native code in the Android ecosystem, or both.
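Since an APK is ultimately a ZIP archive, generic ZIP tooling is enough to look at its layout. The following Python sketch uses only the standard `zipfile` module. `check_minimal_apk` is a hypothetical helper that checks for the entries named above. It also looks for the usual locations of Dalvik bytecode (`classes*.dex`) and native code (`lib/<abi>/*.so`). It assumes a well-formed archive. It does not inspect the APK Signing Block, which is not a ZIP entry and therefore is not listed by `zipfile`.

```python
import io
import zipfile


def check_minimal_apk(data: bytes) -> dict:
    """Report which entries of a minimal APK are present in `data`.

    Hypothetical helper: it only looks at ZIP entries. The APK Signing Block
    is not a ZIP entry, so it is invisible to `zipfile` and is not checked.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as apk:
        names = apk.namelist()
    return {
        "manifest": "AndroidManifest.xml" in names,
        # META-INF/ holds the JAR manifest and the signature files.
        "meta_inf": any(n.startswith("META-INF/") for n in names),
        # Dalvik bytecode: classes.dex, classes2.dex, ... at the archive root.
        "dex": [n for n in names
                if "/" not in n and n.startswith("classes") and n.endswith(".dex")],
        # Native code: ELF shared libraries stored under lib/<abi>/.
        "native": [n for n in names if n.startswith("lib/") and n.endswith(".so")],
    }


if __name__ == "__main__":
    # Build a fake, unsigned APK in memory (placeholder bytes; real
    # manifests are binary XML and real .dex/.so files are much larger).
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("AndroidManifest.xml", b"\x03\x00\x08\x00")
        z.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
        z.writestr("classes.dex", b"dex\n035\x00")
        z.writestr("lib/arm64-v8a/libnative.so", b"\x7fELF")
    print(check_minimal_apk(buf.getvalue()))
```

Checking names only is intentionally shallow. Validating the signature or parsing the binary manifest would require dedicated tools such as the ones discussed in this chapter.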

@@ -119,7 +119,7 @@ This time, instead of methods, the nodes represent instructions, and the edges i
@fig:bg-fizzbuzz-cg-cfg c) represents the control-flow graph of @fig:bg-fizzbuzz-cg-cfg a), with code statements instead of bytecode instructions.
Once the control-flow graph is computed, it can be used to compute data-flows.
-Data-flow analysis, also called taint-tracking, is used to follow the flow of information in the application.
+Data-flow analysis/*, also called taint-tracking, reviewer note: not really, taint-tracking \in data flow analysis*/ is used to follow the flow of information in the application.
By defining a list of methods and fields that can generate critical information (taint sources) and a list of methods that can consume information (taint sinks), taint-tracking detects potential data leaks (if a data flow links a taint source and a taint sink).
For example, `TelephonyManager.getImei()` returns a unique, persistent device identifier.
This can be used to identify the user, and it cannot be changed if compromised.
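A deliberately tiny forward taint analysis over a control-flow graph can illustrate the source/sink idea. This Python sketch is a toy built on assumptions. It is not how Flowdroid or any cited tool works.

- The `Stmt` node representation is hypothetical.
- The source and sink sets are illustrative strings.
- Fields, aliasing, inter-procedural flows and the Android lifecycle are all ignored.

The sketch propagates sets of tainted variables along CFG edges until it reaches a fixpoint. It then reports sink calls that receive a tainted variable.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

# Illustrative source and sink lists (real tools ship curated lists).
SOURCES = {"TelephonyManager.getImei"}
SINKS = {"SmsManager.sendTextMessage", "Log.d"}


@dataclass
class Stmt:
    """One CFG node: `target = call(args)`, or `target = args` if call is None."""
    target: Optional[str]
    call: Optional[str]
    args: Tuple[str, ...] = ()
    succs: List[int] = field(default_factory=list)


def taint_leaks(cfg: List[Stmt]) -> List[Tuple[int, str]]:
    """Forward may-taint analysis; returns (node index, sink) for each leak."""
    tainted_in: List[Set[str]] = [set() for _ in cfg]  # facts on node entry
    worklist = list(range(len(cfg)))  # visit every node at least once
    while worklist:
        i = worklist.pop()
        s = cfg[i]
        out = set(tainted_in[i])
        if s.target is not None:
            # Transfer function: a source call or a tainted argument taints the
            # target; any other assignment overwrites (kills) its taint.
            if s.call in SOURCES or any(a in out for a in s.args):
                out.add(s.target)
            else:
                out.discard(s.target)
        for j in s.succs:
            if not out <= tainted_in[j]:  # join at merge points is set union
                tainted_in[j] |= out
                worklist.append(j)
    return [(i, s.call) for i, s in enumerate(cfg)
            if s.call in SINKS and any(a in tainted_in[i] for a in s.args)]


if __name__ == "__main__":
    cfg = [
        Stmt("imei", "TelephonyManager.getImei", (), [1]),       # 0: source
        Stmt(None, None, (), [2, 3]),                              # 1: if (...)
        Stmt("msg", None, ("imei",), [4]),                         # 2: msg = imei
        Stmt("msg", None, (), [4]),                                # 3: msg = "hello"
        Stmt(None, "SmsManager.sendTextMessage", ("msg",), []),   # 4: sink
    ]
    print(taint_leaks(cfg))  # [(4, 'SmsManager.sendTextMessage')]
```

Because joins use set union, this is a may analysis. A value tainted on only one branch of the `if` is still reported at the sink.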

@@ -14,7 +14,7 @@ They analysed 92 publications and classified them by goal, method used to solve
In particular, they listed 27 approaches with an open-source implementation available.
Interestingly, a lot of the tools listed rely on common tools to interact with Android applications/#DEX bytecode.
-Reccuring examples of such support tools are Apktool (#eg Amandroid~@weiAmandroidPreciseGeneral2014, Blueseal~@shenInformationFlowsPermission2014, SAAF~@hoffmannSlicingDroidsProgram2013), Androguard (#eg Adagio~@gasconStructuralDetectionAndroid2013, Appareciumn~@titzeAppareciumRevealingData2015, Mallodroid~@fahlWhyEveMallory2012) or Soot (#eg Blueseal~@shenInformationFlowsPermission2014, DroidSafe~@DBLPconfndssGordonKPGNR15, Flowdroid~@Arzt2014a): those tools are built incrementally, on top of each other.
+Recurring examples of such support tools are Apktool (#eg Amandroid~@weiAmandroidPreciseGeneral2014, Blueseal~@shenInformationFlowsPermission2014, SAAF~@hoffmannSlicingDroidsProgram2013), Androguard (#eg Adagio~@gasconStructuralDetectionAndroid2013, Apparecium~@titzeAppareciumRevealingData2015, Mallodroid~@fahlWhyEveMallory2012) or Soot (#eg Blueseal~@shenInformationFlowsPermission2014, DroidSafe~@DBLPconfndssGordonKPGNR15, Flowdroid~@Arzt2014a): those tools are built incrementally, on top of each other.
This strengthens our idea that being able to reuse previous tools is important.
Nevertheless, Li #etal focus more on the techniques and features described in the reviewed publications, and did not perform experiments to evaluate whether the tools they point out are still usable.
@@ -23,7 +23,7 @@ Nevertheless, Li #etal focus more on the techniques and features described in th
//Data-flow analysis is the subject of many contributions~@weiAmandroidPreciseGeneral2014 @titzeAppareciumRevealingData2015 @bosuCollusiveDataLeak2017 @klieberAndroidTaintFlow2014 @DBLPconfndssGordonKPGNR15 @octeauCompositeConstantPropagation2015 @liIccTADetectingInterComponent2015, the most notable tool being Flowdroid~@Arzt2014a.
We will now explore this direction further by looking at other works that have been done to evaluate different analysis tools.
-Those evaluations often take the form of benchmarks and follow a similar method (we will look at the different contributions in more detail in @sec:bg-bench).
+Those evaluations often take the form of benchmarks and follow a similar method (we will look at the different contributions in more detail in @sec:bg-bench).
They start by selecting a set of tools with similar goals to compare.
Usually, those contributions compare existing tools to their own, but some contributions do not introduce a new tool and focus on surveying the state of the art for some technique.
They then select a dataset of applications to analyse.
@@ -57,7 +57,7 @@ In addition to those datasets, AndroZoo~@allixAndroZooCollectingMillions2016 col
Currently, Androzoo contains more than 25 million applications that researchers can download using the SHA256 hash of the application.
Androzoo also provides additional information about the applications, like the date the application was detected for the first time by Androzoo or the number of antivirus engines from VirusTotal that flagged the application as malicious.
This will allow us to sample a dataset of applications evenly distributed over the years.
-In addition to providing researchers with easy access to real-world applications, Androzoo make it a lot easier to share datasets for reproducibility: instead of sharing hundreds of #APK files, the list of SHA256 is enough.
+In addition to providing researchers with easy access to real-world applications, Androzoo makes it a lot easier to share datasets for reproducibility: instead of sharing hundreds of #APK files, the list of SHA256 hashes is enough.
==== Benchmarking <sec:bg-bench>
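Sharing an Androzoo-based dataset as a list of SHA256 hashes also makes it easy to check that a local copy matches that list. The Python sketch below assumes the APKs were already downloaded into one directory with an `.apk` extension. It does not call the Androzoo API, and `compare_dataset` is a hypothetical helper. Hashes are compared in lowercase so that lists written in either case match.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Iterable, Set, Tuple


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Lowercase hex SHA256 of a file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_dataset(apk_dir: Path, shared: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (missing, unexpected) between a shared hash list and local APKs."""
    expected = {h.strip().lower() for h in shared if h.strip()}
    local = {sha256_of(p) for p in apk_dir.glob("*.apk")}
    return expected - local, local - expected


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        apk = Path(d) / "app.apk"
        apk.write_bytes(b"PK\x03\x04 not a real apk")
        shared = [sha256_of(apk).upper(), "00" * 32]  # case is normalised
        missing, unexpected = compare_dataset(Path(d), shared)
        print(len(missing), len(unexpected))  # 1 0
```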

@@ -29,7 +29,7 @@ Additionally, our problem statement does not focus on spoofing classes at runtim
Contributions about Android class loading focus on using the capabilities of class loading to extend Android features or to prevent reverse engineering of Android applications.
For instance, Zhou #etal~@zhou_dynamic_2022 extend the class loading mechanism of Android to support regular Java bytecode, and Kriz and Maly~@kriz_provisioning_2015 propose a new class loader to automatically load modules of an application without user interaction.
-Regarding reverse engineering, class loading mechanisms are frequently used by packers for hiding all or parts of the code of an application~@Duan2018.
+Regarding reverse engineering, class loading mechanisms are frequently used by packers (applications that load their actual code at runtime) to hide all or part of the code of an application~@Duan2018.
For example, packers exploit the class loading capability of Android to load new code.
They also combine the loading with code generation from encrypted assets or code modification from native code calls~@liao2016automated to make the code harder to recover.
Because parts of the original code will only be available at runtime, deobfuscation approaches propose techniques that track #DEX structures when manipulated by the application~@zhang2015dexhunter @xue2017adaptive @wong2018tackling.
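A much cruder approach than the cited tools can give an idea of what locating DEX structures involves: scanning a raw memory dump for DEX headers. This Python sketch is not the technique of DexHunter or the other cited works. It assumes the dump is a plain byte string and relies on only two fields of the DEX header:

- the 8-byte magic (`dex\n`, a three-digit version, a NUL byte);
- the little-endian `file_size` field at offset 0x20.

A packer that erases or alters headers after loading would not be found this way.

```python
import re
import struct
from typing import List, Tuple

# DEX magic: b"dex\n", a three-digit format version, then a NUL byte.
DEX_MAGIC = re.compile(rb"dex\n[0-9]{3}\x00")
HEADER_SIZE = 0x70       # size of a standard DEX header
FILE_SIZE_OFFSET = 0x20  # uint32 file_size, after magic, checksum and signature


def find_dex(dump: bytes) -> List[Tuple[int, str, bytes]]:
    """Return (offset, version, dex bytes) for each plausible DEX image in dump."""
    found = []
    for m in DEX_MAGIC.finditer(dump):
        start = m.start()
        if start + HEADER_SIZE > len(dump):
            continue  # header truncated by the end of the dump
        (file_size,) = struct.unpack_from("<I", dump, start + FILE_SIZE_OFFSET)
        if file_size < HEADER_SIZE or start + file_size > len(dump):
            continue  # implausible size: most likely a false positive
        version = dump[start + 4:start + 7].decode("ascii")
        found.append((start, version, dump[start:start + file_size]))
    return found


if __name__ == "__main__":
    # Fake header: magic, zeroed checksum/signature, file_size, zero padding.
    header = b"dex\n035\x00" + bytes(24) + struct.pack("<I", HEADER_SIZE)
    header += bytes(HEADER_SIZE - len(header))
    dump = b"junk" + header + b"dex\n039\x00 truncated"
    print([(off, ver, len(img)) for off, ver, img in find_dex(dump)])  # [(4, '035', 112)]
```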

@@ -46,7 +46,7 @@ It resulted that dynamic code loading was mostly related to mobile advertisement
Similarly, StaDynA~@zhauniarovichStaDynAAddressingProblem2015 is a framework that generates a call graph statically, then uses dynamic analysis to analyse dynamic code loading and reflection calls to complete this call graph.
The issue with those approaches is that they are only compatible with their own subsequent analysis.
-For instance, StaDynA only provide the call graph, and cannot be used as is to improve the capacity of Flowdroid.
+For instance, StaDynA only provides the call graph, and cannot be used as is to improve the capacity of Flowdroid.
This is unfortunate: the reverse engineer's next step will depend on the context.
Not being able to reuse the result of a previous analysis with any ad hoc tool greatly limits their options.
AppSpear has an interesting solution to this issue: the code it intercepts is repackaged inside a new #APK file that Android analysis tools should be able to analyse.
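StaDynA's idea of completing a static call graph with dynamically observed edges can be sketched as a set union followed by a reachability query. In this Python sketch, the graph encoding and the method names (`Main.onCreate`, `Payload.run`) are hypothetical. The observed edges are assumed to have been collected by some instrumentation that is not shown. The sketch also makes the limitation discussed above visible: the output is a call graph in the tool's own format, not an APK that another analysis tool could load.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

CallGraph = Dict[str, Set[str]]


def complete_call_graph(static_cg: CallGraph,
                        observed: Iterable[Tuple[str, str]]) -> CallGraph:
    """Union of static edges and (caller, callee) edges observed at runtime."""
    merged: Dict[str, Set[str]] = defaultdict(set)
    for caller, callees in static_cg.items():
        merged[caller] |= callees  # copies into a new set: static_cg is untouched
    for caller, callee in observed:
        merged[caller].add(callee)
    return dict(merged)


def reachable(cg: CallGraph, entry: str) -> Set[str]:
    """Methods reachable from `entry` (iterative depth-first search)."""
    seen, stack = {entry}, [entry]
    while stack:
        for callee in cg.get(stack.pop(), ()):
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen


if __name__ == "__main__":
    static_cg = {"Main.onCreate": {"Method.invoke", "Main.helper"}}
    # Edges observed at runtime, e.g. a reflective call into dynamically loaded code.
    observed = [("Main.onCreate", "Payload.run"), ("Payload.run", "Payload.leak")]
    print(sorted(reachable(static_cg, "Main.onCreate")))
    print(sorted(reachable(complete_call_graph(static_cg, observed), "Main.onCreate")))
```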
@@ -74,7 +74,7 @@ Samhi #etal~@samhi_jucify_2022 followed this direction to unify the analysis of
Their tool, JuCify, uses Angr~@angrPeople to generate the call graph of the native code, and uses heuristics to encode this call graph into Jimple that can then be added to the Jimple generated by Soot from the bytecode of the application.
Like IccTa, they use Flowdroid to analyse this new augmented representation of the application, but it should be usable by any analysis tool relying on Soot.
-Finally, DroidRA~@li_droidra_2016 use the COAL~@octeauCompositeConstantPropagation2015 solver to statically compute the reflection information.
+Finally, DroidRA~@li_droidra_2016 uses the COAL~@octeauCompositeConstantPropagation2015 solver to statically compute the reflection information.
The reflection calls are transformed into direct calls inside the application using Soot.
Using COAL makes DroidRA quite good at solving the simpler cases, where the names of classes and methods targeted by reflection are already present in the application.
Those cases are quite common; being able to solve those without resorting to dynamic analysis is quite useful.
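The simple reflection cases described above, where class and method names are string constants, can be illustrated by constant propagation followed by a rewrite. This Python sketch is not the COAL-based algorithm used by DroidRA:

- it uses a hypothetical straight-line IR instead of Jimple;
- it ignores the receiver and arguments of the reflective call.

It tracks which variables hold constant strings, `Class.forName` results and `getMethod` results. It then replaces an `invoke` on a fully known method with a direct `call`.

```python
from typing import Dict, List, Tuple

Stmt = Tuple[str, ...]  # (op, dst, *operands), hypothetical toy IR


def resolve_reflection(stmts: List[Stmt]) -> List[Stmt]:
    """Rewrite `invoke` on statically known methods into direct `call`s.

    `const` takes a string literal; the other ops take variable names:
      ("const", v, "lit")  ("forName", v, s)  ("getMethod", v, cls, s)
      ("invoke", v, m)
    """
    consts: Dict[str, str] = {}   # variable -> string constant
    classes: Dict[str, str] = {}  # variable -> class name (Class.forName)
    methods: Dict[str, str] = {}  # variable -> "Class.method" (getMethod)
    out: List[Stmt] = []
    for op, dst, *args in stmts:
        new: Stmt = (op, dst, *args)
        const = cls = meth = None
        if op == "const":
            const = args[0]
        elif op == "forName":
            cls = consts.get(args[0])
        elif op == "getMethod" and args[0] in classes and args[1] in consts:
            meth = f"{classes[args[0]]}.{consts[args[1]]}"
        elif op == "invoke" and args[0] in methods:
            new = ("call", dst, methods[args[0]])  # reflective call made direct
        # dst is (re)assigned: drop stale facts, then record the new one.
        for facts in (consts, classes, methods):
            facts.pop(dst, None)
        if const is not None:
            consts[dst] = const
        if cls is not None:
            classes[dst] = cls
        if meth is not None:
            methods[dst] = meth
        out.append(new)
    return out


if __name__ == "__main__":
    prog = [
        ("const", "c", "com.example.Payload"),
        ("forName", "k", "c"),
        ("const", "n", "run"),
        ("getMethod", "m", "k", "n"),
        ("invoke", "r", "m"),
        ("getMethod", "m2", "k", "unknown"),  # name is not a constant: unresolved
        ("invoke", "r2", "m2"),
    ]
    for stmt in resolve_reflection(prog):
        print(stmt)
```

When the method name does not come from a constant, the `invoke` is left untouched. These are the cases where static resolution stops and dynamic analysis becomes necessary.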