Master's Thesis MSTR-2024-06

Bibliographic data
Ma, Yingpeng: Analysing human vs. neural attention in VQA.
Universität Stuttgart, Faculty of Computer Science, Electrical Engineering and Information Technology, Master's Thesis No. 6 (2024).
55 pages, English.
Abstract

Visual Question Answering (VQA) has drawn substantial interest in both academic and industrial research in recent years. Driven by Vision Transformers (ViT) and the vision-text co-attention mechanism, VQA models have shown notable performance improvements. Yet the black-box nature of neural attention keeps people from understanding how it functions and from establishing trust in these models. Drawing inspiration from various scholars and their contributions, this thesis demystifies these mechanisms. We aim to 1) extract the neural attention weights of VQA models, 2) remap the weights to machine attention maps, 3) compare machine attention with human gaze heatmaps, and 4) compute the related metrics to provide deeper insights into the attention patterns. First, we attempt to reproduce the MCAN model implementation and the machine attention extraction on the VQA-MHUG dataset within the MULAN framework. A comparison with the official implementations verifies the accuracy and correctness of the reimplementation. Then, using the toolkit of the MULAN framework, the 1D attention weights are remapped to 2D neural attention maps. Next, these attention maps are compared to the human gaze heatmaps of VQA-MHUG using explainable AI (XAI) metrics. Following the same pipeline, another experiment is conducted on the AiR-D dataset, reporting the Area Under the ROC Curve (AUC), Spearman's rank correlation coefficient (rho), and Jensen-Shannon Divergence (JSD) to compare the neural attention with the human gaze heatmaps. Finally, the differences between the official and reproduced implementations are discussed, alongside insights on the interpretability of neural attention in VQA models.
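The remapping and comparison steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the thesis code and does not use the MULAN toolkit: all function names are hypothetical, the attention weights are assumed to lie on a regular grid (one weight per cell), and the AUC step assumes that the top 20% of human heatmap cells count as fixated, which is only one of several common conventions.

```python
import numpy as np


def remap_to_grid(weights, grid_h, grid_w):
    """Reshape 1D attention weights (assumed: one per grid cell) into a 2D map summing to 1."""
    w = np.asarray(weights, dtype=float)
    if w.size != grid_h * grid_w:
        raise ValueError("expected exactly one weight per grid cell")
    amap = w.reshape(grid_h, grid_w)
    return amap / amap.sum()


def average_ranks(x):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    x = np.asarray(x, dtype=float).ravel()
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(x.size, dtype=float)
    ranks[order] = np.arange(1, x.size + 1)
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return (rank_sums / counts)[inverse]


def spearman_rho(machine_map, human_map):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    return np.corrcoef(average_ranks(machine_map), average_ranks(human_map))[0, 1]


def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logarithm, so values lie in [0, 1]."""
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def auc_score(machine_map, human_map, quantile=0.8):
    """Rank-based AUC; cells above the given quantile of the human map are positives (assumption)."""
    scores = np.asarray(machine_map, dtype=float).ravel()
    human = np.asarray(human_map, dtype=float).ravel()
    positives = human > np.quantile(human, quantile)
    n_pos = int(positives.sum())
    n_neg = positives.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both fixated and non-fixated cells")
    ranks = average_ranks(scores)
    # Mann-Whitney U statistic divided by the number of positive/negative pairs
    return (ranks[positives].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy example: 12 weights on a 3x4 grid, compared with itself and with its reverse.
machine = remap_to_grid(np.arange(1, 13), 3, 4)
human_same = machine.copy()
human_reversed = remap_to_grid(np.arange(12, 0, -1), 3, 4)
```

In this toy setup a map compared with itself gives the best possible score on every metric, while the reversed map flips rho and AUC to their worst values. Real human heatmaps would first have to be resampled to the same grid resolution as the machine map, a step this sketch leaves out.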

Department(s): Universität Stuttgart, Institute for Visualization and Interactive Systems, Visualization and Interactive Systems
Supervisor(s): Bulling, Prof. Andreas; Wang, Yao; Hindennach, Susanne
Entry date: May 21, 2024