Bachelor's Thesis BCLR-2020-36

Hilbig, Aaron: A Benchmark of WebAssembly Programs.
Universität Stuttgart, Fakultät Informatik, Elektrotechnik und Informationstechnik, Bachelor's Thesis No. 36 (2020).
71 pages, English.

WebAssembly is a new low-level byte code for the web, offering an extension to JavaScript for compute-intensive applications that require near-native performance. Announced in 2015, WebAssembly is already supported in all modern browsers and will become an important web technology in the future. As WebAssembly matures, two issues become more pressing: Firstly, WebAssembly's current uses in the "real world" wide web have not yet been thoroughly examined. A previous study by Musch et al. was conducted when WebAssembly was still in the early stages of its deployment, and is not representative of WebAssembly usage today. Knowing which language features, frameworks, and programming languages are commonly used, and what the popular use cases for WebAssembly are, can be important for evolving the language and the surrounding ecosystem. Secondly, no good benchmarking set of WebAssembly programs is publicly available. Such a set may benefit WebAssembly tooling developers and researchers alike. Developers can use it to benchmark and test their applications with realistic data. Researchers, who previously relied on desktop applications compiled to WebAssembly or on a few manually selected WebAssembly applications, can use it as training data for machine-learning-based research, to test static analysis tools, or to analyze the binaries, e.g., to search for vulnerabilities. We aim to help both these causes by gathering a large set of WebAssembly binaries and analyzing the collected data. We collect WebAssembly modules using a specially designed web crawler and multiple other methods and sources, including crawling the top one million websites with varying depth, querying data provided by HTTP Archive, querying a package manager, and more.
For the analysis of the collected files, we implement a static analysis tool to analyze instruction and extension usage, along with heuristic methods to infer the programming languages, compilers, and frameworks used. We validate our tool's results and further investigate the use cases for WebAssembly by manually examining a random sample. We collect 3431 WebAssembly modules, 709 of which are unique, which is 4.7 times as many as reported in previous work. This dataset is openly available. We find that WebAssembly is used for a wide range of use cases, most of which are benign, for example as part of games, libraries, and a diverse set of custom applications. In particular, we find only very few modules used for the malicious practice of cryptojacking. While Musch et al. report that 32.0% of their dataset consists of cryptominers, we identify only 4 suspicious files (<0.5%). We further find the WebAssembly ecosystem to be diverse, with many different source languages and compilers targeting WebAssembly.
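
As a rough illustration of how collected files could be filtered and reduced to unique modules, the following Python sketch checks each file for the WebAssembly binary header (the four magic bytes `\0asm` followed by version 1) and keeps one copy per distinct content. It is not the thesis's tool: the function names `is_wasm_module` and `unique_modules` are hypothetical, and deduplicating by SHA-256 digest is an assumption, since the abstract does not state how uniqueness was determined.

```python
import hashlib
import struct

# Every WebAssembly binary starts with these four bytes, followed by a
# 32-bit little-endian version number (currently 1).
WASM_MAGIC = b"\x00asm"


def is_wasm_module(data: bytes) -> bool:
    """Return True if data begins with the WebAssembly magic number and version 1."""
    if len(data) < 8 or data[:4] != WASM_MAGIC:
        return False
    (version,) = struct.unpack("<I", data[4:8])
    return version == 1


def unique_modules(blobs):
    """Map SHA-256 digest -> first blob seen with that content (assumed dedup key).

    Blobs that do not look like WebAssembly binaries are skipped.
    """
    seen = {}
    for blob in blobs:
        if is_wasm_module(blob):
            seen.setdefault(hashlib.sha256(blob).hexdigest(), blob)
    return seen
```

Under this assumption, the ratio between collected and unique modules reported above would correspond to `len(blobs)` versus `len(unique_modules(blobs))`.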

Department(s): Universität Stuttgart, Institut für Softwaretechnologie, Programmiersprachen und Übersetzerbau
Supervisor(s): Pradel, Prof. Michael; Lehmann, Daniel
Entry date: November 12, 2020