Bachelor's Thesis BCLR-2024-59

Bibliographic Data
Feng, Xiwen: Survey and Evaluation of Explainable AI Methods for VQA Models that Learn from Disagreement.
Universität Stuttgart, Fakultät Informatik, Elektrotechnik und Informationstechnik, Bachelor's Thesis No. 59 (2024).
36 pages, in English.
Abstract

Visual Question Answering (VQA) systems traditionally focus on identifying a single "correct" answer for each image-question pair, neglecting the diversity of human interpretations. This study explores the phenomenon of disagreement in VQA, defined as variations in annotator-provided answers, and proposes a binary classification model to predict answer agreement. Using the VQA-MHUG dataset, visual features extracted with CLIP and textual features derived from DistilBERT are combined to train a Random Forest classifier. The model achieves a classification accuracy of 72.81%, demonstrating robust performance in identifying complete agreement but facing challenges in handling partial agreement cases. Additionally, a Local Interpretable Model-Agnostic Explanations (LIME)-based XAI framework is employed to interpret the model's predictions, providing insights into the significance of image regions and question components. This work highlights the complexity of handling VQA disagreements and underscores the need for further advancements in feature integration and explanation methods to better capture human-centric variations in visual and textual understanding.
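The feature-fusion step described in the abstract (CLIP image embeddings concatenated with DistilBERT question embeddings, fed to a Random Forest agreement classifier) could be sketched with scikit-learn as below. This is a minimal illustration, not the thesis's implementation: the random arrays are placeholders for real CLIP and DistilBERT embeddings, and the dimensionalities (512 and 768) are common defaults for those models, assumed here rather than taken from the thesis.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder embeddings standing in for the real features:
# 512-d vectors approximate CLIP image embeddings,
# 768-d vectors approximate DistilBERT question embeddings.
n_samples = 200
img_feats = rng.normal(size=(n_samples, 512))
txt_feats = rng.normal(size=(n_samples, 768))

# Concatenate both modalities into one vector per image-question pair.
X = np.concatenate([img_feats, txt_feats], axis=1)

# Binary target: 1 = annotators fully agree on the answer, 0 = they do not.
# Random labels here, since the real VQA-MHUG labels are not reproduced.
y = rng.integers(0, 2, size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print(f"held-out accuracy: {clf.score(X_test, y_test):.2f}")
```

With real embeddings, the same concatenated vector can be passed to a LIME tabular explainer to attribute the agreement prediction to individual image- and question-derived features.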

Department(s): Universität Stuttgart, Institut für Visualisierung und Interaktive Systeme, Visualisierung und Interaktive Systeme
Supervisors: Bulling, Prof. Andreas; Hindennach, Susanne
Submission date: 21 February 2025