Master Thesis MSTR-2018-126

Bibliography
Cozma, Adriana-Eliza: Understanding deep neural networks' behavior via local investigation along paths.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Master Thesis No. 126 (2018).
98 pages, English.

Deep Neural Networks (DNNs) define highly expressive models that have recently achieved state-of-the-art performance, matching or even exceeding human capabilities, on various tasks. They are even deployed in real-world applications, such as surveillance systems and autonomous driving, where failures carry high risk. The depth characterizing such networks is closely related to their success, but it also leads to complex models that are hard to understand. The inherent structure of DNNs is the immediate result of composing many non-linear parts. DNNs are often perceived as black boxes, their decision process being opaque to humans. Shedding light upon these networks and making them more transparent have recently become active research areas in the deep learning field. New concepts have emerged within this context, such as understanding, explainability, and interpretability. This thesis aims to gather explanatory insights regarding how DNNs behave internally. This understanding objective is pursued by performing a local investigation along paths. These paths are defined in input space, but tracked across all layers of the DNN. They either define standard, intuitive image operations, such as blur or brightness changes, or describe linear interpolations between two points (images) in input space, e.g. connecting a clean image and its adversarial counterpart. Each considered path is discretized and the sampled points are passed as a batch to DNN classifiers. The sampling strategy is known and implies intuitive properties of the path's structure (e.g. the sampled points are evenly distributed along a path). The investigation continues in feature space, where the path's intrinsic structure gets modified layer by layer. We propose metrics to evaluate how a path gets twisted and rescaled layer-wise. This investigation is conducted in different experimental scenarios.
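The path construction described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the even sampling of a linear interpolation path, and simple per-segment length and turning-angle measurements of the kind one could track layer by layer (the thesis's exact metric definitions are not given here).

```python
import numpy as np

def interpolation_path(x_start, x_end, n_points=32):
    """Evenly spaced points on the straight line between two inputs,
    returned as one batch suitable for a single forward pass."""
    alphas = np.linspace(0.0, 1.0, n_points)[:, None]
    return (1 - alphas) * x_start[None] + alphas * x_end[None]

def segment_lengths(points):
    """Euclidean lengths of the consecutive segments of a discretized path."""
    return np.linalg.norm(np.diff(points, axis=0), axis=1)

def turning_angles(points):
    """Angles (radians) between consecutive segments; all zero on a straight path."""
    d = np.diff(points, axis=0)
    u, v = d[:-1], d[1:]
    cos = (u * v).sum(axis=1) / (np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1))
    return np.arccos(np.clip(cos, -1.0, 1.0))

# toy path between two flattened 4x4 "images"
x0, x1 = np.zeros(16), np.ones(16)
path = interpolation_path(x0, x1, n_points=5)
```

In input space the sampled points are equidistant and the path is straight; applying the network's non-linear layers to `path` and re-measuring with the same two functions would reveal the layer-wise rescaling and twisting the abstract refers to.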
The most interesting one in terms of gained insights explores how paths starting from clean images and from adversarial examples, respectively, behave. The results show that these paths act differently. We build upon this discovery to create a binary classifier that distinguishes between clean or noisy inputs and adversarial inputs. The mean value of the Area Under the Curve (AUC) performance measure over three attacks is 0.862. According to related literature, this value corresponds to a "good" classifier. Further tackling adversarial attacks, we design a method that hardens DNNs against adversarial examples. This technique consists of adding a blur operation before a model that is trained to be robust under blur. Such a preprocessing step blurs all inputs before they are passed to the DNN. The adversarial perturbation is destroyed for 83.92% of the inputs (mean across the three considered attacks).
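The blur-preprocessing defense amounts to a fixed input transformation placed in front of a blur-robust classifier. A minimal sketch, assuming a simple box blur as a stand-in (the thesis does not specify the exact filter here) and a hypothetical `model` callable:

```python
import numpy as np

def box_blur(x, k=3):
    """k x k box blur of a 2-D image, edge-padded so the shape is preserved.
    A stand-in for the thesis's blur step; the exact filter is an assumption."""
    pad = k // 2
    padded = np.pad(x, pad, mode="edge")
    out = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

def defended_predict(model, x, k=3):
    """Blur every input before the (blur-robust) classifier sees it,
    so small adversarial perturbations are smoothed away first."""
    return model(box_blur(x, k))
```

Because the blur is applied unconditionally, clean and adversarial inputs are treated identically; the defense relies on the downstream model having been trained to remain accurate under the same blur.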

Department(s): University of Stuttgart, Institute of Parallel and Distributed Systems, Machine Learning and Robotics
Supervisor(s): Hennes, Daniel, Ph.D.; Fischer, Volker, Dr.
Entry date: April 6, 2022