#import "../lib.typ": etal, jfl-note, jm-note
#import "X_var.typ": *
== Introduction
In this chapter, we study the reusability of open source static analysis tools that appeared between 2011 and 2017, on a recent Android dataset.
The goal of our study is *not* to quantify whether the output results are accurate in order to ensure reproducibility, because the studied static analysis tools ultimately pursue different goals.
Instead, we work under the hypothesis that the provided tools compute the intended result, but may crash or fail to produce one because of the evolution of the internals of Android applications, which raises unexpected bugs during an analysis.
This chapter aims to show that sharing the software artefacts of a paper may not be sufficient to ensure that the provided software remains reusable.
Thus, our contributions are the following.
We carefully retrieved the Android static analysis tools published between 2011 and 2017 that were selected in the systematic literature review of Li #etal~@Li2017.
We contacted the authors whenever possible to select the best candidate versions and to confirm the correct usage of the tools.
We rebuilt the tools in their original environments and share our Docker images.#footnote[on Docker Hub as `histausse/rasta-<toolname>:icsr2024`]
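As an illustration of the naming scheme given in the footnote, the sketch below pulls one of the prebuilt images from Docker Hub. This is a minimal Python sketch using only the standard library; the tool name `androguard` is a hypothetical example, and the entrypoint and arguments needed to actually run an analysis depend on the tool.

```python
import subprocess

# Hypothetical example: substitute the name of any studied tool.
tool = "androguard"
image = f"histausse/rasta-{tool}:icsr2024"

# Pull the prebuilt image from Docker Hub (requires a running Docker daemon).
subprocess.run(["docker", "pull", image], check=True)
```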
We evaluated the reusability of the tools by measuring the number of successful analyses of applications taken from the Drebin dataset~@Arp2014 and from a custom dataset that contains more recent applications (#NBTOTALSTRING applications in total).
The observation of the success or failure of these analyses enables us to answer the following research questions:
/ RQ1: Which Android static analysis tools that are more than 5 years old are still available and can, with reasonable effort, be reused without crashing? <rq-1>
/ RQ2: How has the reusability of tools evolved over time, especially when analysing applications that are more than 5 years away from the publication of the tool? <rq-2>
/ RQ3: Does the reusability of tools change when analysing goodware compared to malware? <rq-3>
/*
As a summary, the contributions of this chapter are the following:
- We provide containers with a compiled version of all studied analysis tools, which ensures the reproducibility of our experiments and gives other researchers an easy way to analyse applications. Additionally, recipes for rebuilding such containers are provided.
- We provide a recent dataset of #NBTOTALSTRING applications balanced over the time interval 2010-2023.
- We point out which static analysis tools of the SLR of Li #etal~@Li2017 can safely be used, and we show that #resultunusable of the evaluated tools are unusable (considering that a tool that fails more than 50% of the time is unusable). In total, the success rate of the tools we could run is #resultratio on our dataset.
- We discuss the effect of application features (date, size, SDK version, goodware/malware) on static analysis tools and the nature of the issues we found by studying statistics on the errors captured during our experiments.
*/
The chapter is structured as follows.
@sec:rasta-methodology presents the methodology employed to build our evaluation process, and @sec:rasta-xp gives the associated experimental results.
@sec:rasta-failure-analysis investigates the reasons behind the observed failures of some of the tools.
We then compare in @sec:rasta-soa-comp our results with the contributions presented in @sec:bg.
In @sec:rasta-reco, we give recommendations for tool development drawn from our experience running these experiments.
Finally, @sec:rasta-limit lists the limits of our approach, @sec:rasta-futur presents further avenues that we did not have time to pursue, and @sec:rasta-conclusion concludes the chapter.