Master's Thesis MSTR-2021-38

Bibliographic data
Reich, Kevin: Optimizing neural fake speech detection using post hoc analysis.
Universität Stuttgart, Faculty of Computer Science, Electrical Engineering and Information Technology, Master's Thesis No. 38 (2021).
76 pages, English.
Abstract

With the technological advances in speech synthesis, it has become apparent that attackers can abuse this technology to launch fake speech attacks in a number of ways: impersonating a supervisor's voice to instruct an employee to transfer money, spreading fake news and propaganda, or spoofing automatic speaker verification (ASV) systems. It has therefore become important to detect whether speech is genuine or artificially generated. A small-scale study contained in this thesis indicates that humans cannot reliably solve this problem and will therefore need the help of automatic countermeasure (CM) systems. The most successful automatic approaches are based on neural networks. In our work, we analyzed the decision-making process of neural CM systems and used that insight to improve the performance of the best network we observed. We used the ASVspoof 2019 dataset because it was the only widely used fake speech dataset when we began this work. First, we showed that using spectrogram images as input is a legitimate way to approach fake speech detection. This allowed us to use image classification models and the post hoc analysis method Score-CAM. Among the image classification models we tested, EfficientNet-B3 achieved the best scores. Our post hoc analysis of EfficientNet-B3 revealed that it uses background noise and features in the lower frequencies to distinguish between real and fake speech samples. We used this insight in two follow-up experiments, which improved the model's performance by 28.7% and 30.25%, respectively. The model from the second follow-up experiment is, to date, the fifth-best non-ensemble model on the ASVspoof 2019 LA dataset. This highlights the importance of understanding what neural networks are actually doing, since that understanding can be used to optimize their performance significantly.
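The abstract names Score-CAM as the post hoc method but does not describe how it works. The following is a minimal NumPy sketch of the commonly used Score-CAM formulation, not the thesis's implementation: each feature map of a chosen convolutional layer is upsampled to the input size, normalized to [0, 1], used as a mask on the input (here, a spectrogram treated as an image), and weighted by the target-class softmax probability the model assigns to the masked input; the weighted sum is passed through a ReLU. The names `score_cam`, `upsample_nearest`, and `toy_predict` are hypothetical, and the two-class toy model merely stands in for a real network such as EfficientNet-B3. Batching, GPU execution, and any baseline-subtraction term are left out.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


def upsample_nearest(fmap: np.ndarray, out_shape: tuple) -> np.ndarray:
    """Nearest-neighbour upsampling of an (h, w) map to out_shape = (H, W)."""
    H, W = out_shape
    h, w = fmap.shape
    rows = np.arange(H) * h // H
    cols = np.arange(W) * w // W
    return fmap[np.ix_(rows, cols)]


def score_cam(image, activations, predict, target_class):
    """Score-CAM saliency map for a single-channel (H, W) input.

    image:        (H, W) input, e.g. a spectrogram treated as an image
    activations:  (K, h, w) feature maps of one conv layer for `image`
    predict:      callable mapping an (H, W) input to class probabilities
    target_class: index of the class to explain (e.g. "spoof")
    """
    K = activations.shape[0]
    maps = np.stack([upsample_nearest(a, image.shape) for a in activations])
    weights = np.zeros(K)
    for k in range(K):
        lo, hi = maps[k].min(), maps[k].max()
        # Normalize each upsampled map to [0, 1] so it can act as a soft mask.
        mask = (maps[k] - lo) / (hi - lo) if hi > lo else np.zeros_like(maps[k])
        # Channel weight = target-class probability for the masked input.
        weights[k] = predict(image * mask)[target_class]
    # Weighted sum of the upsampled maps, followed by ReLU.
    cam = np.maximum(np.tensordot(weights, maps, axes=1), 0.0)
    # Rescale to [0, 1] for visualization.
    return cam / cam.max() if cam.max() > 0 else cam


def toy_predict(x):
    """Hypothetical 2-class model: class 0 responds to energy in rows 0-3."""
    return softmax(np.array([x[:4].sum(), x[4:].sum()]))


if __name__ == "__main__":
    image = np.ones((8, 8))
    activations = np.array([[[1.0, 1.0], [0.0, 0.0]],   # map 0 covers rows 0-3
                            [[0.0, 0.0], [1.0, 1.0]]])  # map 1 covers rows 4-7
    cam = score_cam(image, activations, toy_predict, target_class=0)
    print(np.round(cam[:, 0], 3))
```

In the toy run, only the map covering rows 0-3 keeps the class-0 probability high when used as a mask, so the saliency concentrates there. For a spectrogram input, rows correspond to frequency bins, so a map like this is the kind of evidence one would inspect when asking which frequency ranges or background regions a CM model relies on.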

Department(s): Universität Stuttgart, Institut für Maschinelle Sprachverarbeitung (Institute for Natural Language Processing)
Supervisors: Vu, Prof. Ngoc Thang; Tilli, Pascal
Entry date: 4 November 2021