diff --git a/3_rasta/3_experiments.typ b/3_rasta/3_experiments.typ
index 771ae81..447a05a 100644
--- a/3_rasta/3_experiments.typ
+++ b/3_rasta/3_experiments.typ
@@ -253,13 +253,26 @@ We can observe that all Java-based tools have a finishing rate that decreases ov
 50% of non-Java-based tools have the same behaviour.
 ]
 
-#todo[Alt text for fig rasta-decorelation-size]
 #figure(stack(dir: ltr,
   [#figure(
     image(
       "figs/decorelation/finishing-rate-of-java-based-tool-by-discovery-year-of-apks-with-a-bytecode-size-between-4-08-mb-and-5-2-mb.svg",
       width: 50%,
-      alt: ""
+      alt: (
+        "A graph showing the finishing rate (from 0 to 100) on the y-axis ",
+        "and the year the applications were first seen (from 2013 to ",
+        "2023) on the x-axis. There is a line for each of the following ",
+        "tools: anadroid, blueseal, dialdroid, didfail, droidsafe, ",
+        "flowdroid, gator, ic3, ic3_fork, iccta, perfchecker and saaf. ",
+        "saaf starts at 95% in 2013, ic3 and ic3_fork at 75%, gator at 70%, ",
+        "dialdroid at 45%, flowdroid, the others between 10% and 0%. ",
+        "gator finishes in 2023 at 90%, ic3 at 70%, flowdroid at 40%, ",
+        "perfchecker at 15%, the rest between 10% and 0%. ",
+        "Except for saaf and ic3_fork, which drop between 2014 and 2016 for ",
+        "saaf and starting from 2018 for ic3_fork, the lines look stable, with ",
+        "some increasing between 2013 and 2018 then decreasing back to ",
+        "levels similar to the ones in 2013."
+      ).join()
     ),
     caption: [a) Java-based tools],
     supplement: none,
@@ -269,7 +282,22 @@ We can observe that all Java-based tools have a finishing rate that decreases ov
     image(
       "figs/decorelation/finishing-rate-of-non-java-based-tool-by-discovery-year-of-apks-with-a-bytecode-size-between-4-08-mb-and-5-2-mb.svg",
       width: 50%,
-      alt: "",
+      alt: (
+        "A graph showing the finishing rate (from 0 to 100) on the y-axis ",
+        "and the year the applications were first seen (from 2013 to ",
+        "2023) on the x-axis. There is a line for each of the following ",
+        "tools: adagio, amandroid, androguard, androguard_dad, apparecium, ",
+        "mallodroid, redexer and wognsen_et_al. ",
+        "androguard_dad starts at 50% in 2013, amandroid at 75%, adagio, ",
+        "androguard, apparecium, mallodroid and redexer start between 90% and ",
+        "100%, and wognsen_et_al starts at 3% in 2017. ",
+        "wognsen_et_al finishes at 5% in 2023, androguard_dad at 60%, redexer ",
+        "at 70%, apparecium at 90%, the others between 95% and 100%. ",
+        "androguard_dad drops from 50% in 2013 to 15% in 2015, stays there until 2017, ",
+        "then starts rising to 60% in 2023. amandroid rises from 75% in 2013 to ",
+        "90% in 2015 then stays stable. All the other lines are stable, except ",
+        "for redexer, which drops just at the end, in 2022-2023.",
+      ).join()
     ),
     caption: [b) Non-Java-based tools],
     supplement: none,
@@ -284,13 +312,20 @@ We selected the sixth decile (between 4.08 and 5.20 MB), which is well represent
 We observe that 9 tools out of 12 have a finishing rate dropping below 20% for Java-based tools, which is not the case for non-Java-based tools.
 ]
 
-#todo[Alt text for fig rasta-decorelation-min-sdk]
 #figure(stack(dir: ltr,
   [#figure(
     image(
       "figs/decorelation/finishing-rate-of-java-based-tool-by-min-sdk-of-apks-with-a-bytecode-size-between-4-08-mb-and-5-2-mb.svg",
       width: 50%,
-      alt: ""
+      alt: (
+        "A graph showing the finishing rate (from 0 to 100) on the y-axis ",
+        "and the min SDK of the applications (from 0 to 28) on the x-axis. ",
+        "There is a line for each of the following tools: anadroid, blueseal, ",
+        "dialdroid, didfail, droidsafe, flowdroid, gator, ic3, ic3_fork, ",
+        "iccta, perfchecker and saaf. They all start at 100% for SDK 0. After ",
+        "that, the figure becomes quite unreadable: the lines jump up and down, ",
+        "but on average they appear to go down as the min SDK increases. "
+      ).join()
     ),
     caption: [a) Java-based tools],
     kind: "sub-rasta-decorelation-size-decile-min-sdk",
@@ -300,7 +335,15 @@ We observe that 9 tools out of 12 have a finishing rate dropping below 20% for J
     image(
       "figs/decorelation/finishing-rate-of-non-java-based-tool-by-min-sdk-of-apks-with-a-bytecode-size-between-4-08-mb-and-5-2-mb.svg",
       width: 50%,
-      alt: "",
+      alt: (
+        "A graph showing the finishing rate (from 0 to 100) on the y-axis ",
+        "and the min SDK of the applications (from 0 to 28) on the x-axis. ",
+        "There is a line for each of the following tools: adagio, amandroid, ",
+        "androguard, androguard_dad, apparecium, mallodroid, redexer and wognsen_et_al. ",
+        "Apart from androguard_dad, which goes down then back up again, they ",
+        "all appear to average around the same value all along, but with some ",
+        "noise, going up and down from one version to the next."
+      ).join()
     ),
     caption: [b) Non-Java-based tools],
     kind: "sub-rasta-decorelation-size-decile-min-sdk",
diff --git a/5_theseus/3_static_transformation.typ b/5_theseus/3_static_transformation.typ
index a85dc82..67cb2e4 100644
--- a/5_theseus/3_static_transformation.typ
+++ b/5_theseus/3_static_transformation.typ
@@ -270,7 +270,7 @@ We took special care to process the least possible files in the #APKs, and only
 Unfortunately, we did not have time to compare the robustness of our solution to existing tools like Apktool and Soot, but we did a quick performance comparison, summarised in @sec:th-lib-perf.
 In hindsight, we probably should have taken the time to find a way to use smali/backsmali (the backend of Apktool) as a library or use SootUp to do the instrumentation, but neither option has documentation to instrument applications this way.
 At the time of writing, the feature is still being developed, but in the future, Androguard might also become an option to modify #DEX files.
-Nevertheless, we published our instrumentation library, Androscalpel, for anyone who wants to use it (see @sec:soft). #todo[Update is CS says no]
+Nevertheless, we published our instrumentation library, Androscalpel, for anyone who wants to use it (see @sec:soft).
 
 #midskip
 
diff --git a/5_theseus/5_results.typ b/5_theseus/5_results.typ
index 9a2139b..5b7e0df 100644
--- a/5_theseus/5_results.typ
+++ b/5_theseus/5_results.typ
@@ -307,14 +307,19 @@ Although self-explanatory, verifying the code of those methods indeed confirms t
   caption: [Code of `Main.main()`, as shown by Jadx, after patching],
 )
 
-#todo[alt text for @fig:th-cg-before and @fig:th-cg-after]
 #figure([
   #figure(
     render(
       read("figs/demo_main_main.dot"),
       width: 100%,
       alt: (
-        "",
+        "A tree diagram. At the top, a node is labelled `Main->main()V`. ",
+        "Arrows go from this node down to four other nodes: ",
+        "`Main->decrypt(String)String`, `Method->invoke(Object [Object)Object`, ",
+        "`ClassLoader->loadClass(String)Class` and `Class->getMethod(String [Class)Method`. ",
+        "Arrows go down from `Main->decrypt(String)String` to 5 other nodes: ",
+        "`Base64->decode(String I)[B`, `Cipher->init(I Key)V`, `Cipher->doFinal([B)[B`, ",
+        "`Cipher->getInstance(String)Cipher` and `String->([)V`."
       ).join(),
     ),
     caption: [Call Graph of `Main.main()` generated by Androguard before patching],
@@ -325,7 +330,13 @@ Although self-explanatory, verifying the code of those methods indeed confirms t
       read("figs/patched_main_main.dot"),
       width: 100%,
       alt: (
-        "",
+        "The same tree diagram as in the previous figure, but this time, there ",
+        "are 4 additional nodes under `Main->main()V`: ",
+        "`T->check_is_Malicious_send_data(Method)Z` and `T->check_is_Malicious_get_data(Method)Z`, ",
+        "both with a grey background, and `Malicious->send_data(String Activity)String` and ",
+        "`Malicious->get_data(String Activity)String`, both with a red background. ",
+        "An arrow goes from `Malicious->get_data` to a `Utils->sink(Activity String)V` ",
+        "node, and an arrow goes from `Malicious->get_data` to a `Utils->source(String)String` node."
       ).join(),
     ),
     caption: [Call Graph of `Main.main()` generated by Androguard after patching],
diff --git a/X_appendices/french_summary.typ b/X_appendices/french_summary.typ
index c85ba6a..c444c72 100644
--- a/X_appendices/french_summary.typ
+++ b/X_appendices/french_summary.typ
@@ -312,7 +312,23 @@ Par exemple, nous avons tracé l'évolution du taux de finition en fonction de l
     image(
       "../3_rasta/figs/finishing-rate-by-year-of-java-based-tools.svg",
       width: 90%,
-      alt: ""
+      alt: (
+        "Graphe montrant le taux de finition en ordonnées (de 0 à 100%) ",
+        "et l'année où les applications ont été découvertes pour la première ",
+        "fois en abscisse (de 2010 à 2023). ",
+        "Il y a une courbe pour chacun des outils suivants: anadroid, blueseal, ",
+        "dialdroid, didfail, droidsafe, flowdroid, gator, ic3, ic3_fork, ",
+        "iccta, perfchecker, et saaf. ",
+        "saaf, ic3, ic3_fork et gator commencent en 2010 entre 95% et 100%. ",
+        "blueseal est autour de 90%, flowdroid 70%, didfail 60%, ",
+        "perfchecker, iccta et dialdroid entre 45% et 55%, ",
+        "gator 40%, et anadroid 15%. ",
+        "Ils chutent tous au cours du temps, finissant autour de 75% pour ",
+        "gator, 60% pour ic3, 40% pour perfchecker et entre 0% et 20% pour ",
+        "les autres. ",
+        "On peut remarquer que saaf chute soudainement entre 2014 et 2017, ",
+        "et ic3_fork commence à chuter après 2017."
+      ).join()
     ),
     caption: [Taux de finition des outils basé sur Java au cours des ans],
   )
@@ -399,28 +415,28 @@ Aussi, le code contenu fichier `classes100.dex` peut être utilisé par Android,
 Plus surprenant, de code contenu dans un fichier `classes1.dex` ou `classes02.dex` ne serra pas utilisé.
 Lors de l'analyse statique d'applications, ces deux points peuvent mener à des complications que nous allons maintenant explorer.
 
-#todo[traduire en francais @lst:algo-cl]
 #figure(
   ```python
-  def get_mutli_dex_classses_dex_name(index: int):
-      if index == 0:
+  def obtenir_multi_dex_classes_nom_dex(indice: int):
+      if indice == 0:
           return "classes.dex"
       else:
-          return f"classes{index+1}.dex"
+          return f"classes{indice+1}.dex"
 
-  def load_class(class_name: str):
-      if is_platforn_class(class_nane):
-          return load_from_boot_class_loader(class_name)
+  def charge_classe(nom_classe: str):
+      if est_class_platforme(nom_classe):
+          return charge_depuis_chargeur_class_boot(nom_classe)
       else:
-          index = 0
-          dex_file = get_nutli_dex_classses_dex_name(index)
-          while file_exists_in_apk(dex_file) and \
-              not class_found in_dex_file(class_name, dex_file):
-              index += 1
-          if file_exists_in apk(dex_file):
-              return load_from_file(dex_file, class_name)
+          indice = 0
+          fichier_dex = obtenir_multi_dex_classes_nom_dex(indice)
+          while fichier_existe_dans_apk(fichier_dex) and \
+              not classe_trouvee_dans_fichier_dex(nom_classe, fichier_dex):
+              indice += 1
+              fichier_dex = obtenir_multi_dex_classes_nom_dex(indice)
+          if fichier_existe_dans_apk(fichier_dex):
+              return charge_depuis_fichier(fichier_dex, nom_classe)
           else:
-              raise ClassNotFoundrror()
+              raise ErreurClasseNonTrouvee()
   ```,
   caption: [Algorithme de chargement de classe par défaut pour les applications Android],
 )
diff --git a/X_appendices/released_software.typ b/X_appendices/released_software.typ
index 5c82a2a..fe2195e 100644
--- a/X_appendices/released_software.typ
+++ b/X_appendices/released_software.typ
@@ -46,8 +46,6 @@ The container images used to run the different tools are available on Zenodo at
 
 The list of applications we scanned in @sec:cl, as well as the lists of platform classes, fields and, methods we extracted from the emulators for Android #SDKs 32, 33, and 34, are stored on Zenodo at https://doi.org/10.5281/zenodo.15846481.
 
-#jfl-note[Et le dataset utilsé pour évaluer les outils?]
-
 == Theseus
 
 The scripts we used for dynamic analysis and the code implementing the transformations described in @sec:th are available at the following locations: