Master's Thesis MSTR-2024-92

Bibliographic Data
Bickici, Deniz: Multi-modal Graphormer for Action Recognition in Egocentric Videos.
Universität Stuttgart, Faculty of Computer Science, Electrical Engineering and Information Technology, Master's Thesis No. 92 (2024).
87 pages, English.
Abstract

The recognition of actions in egocentric videos is an increasingly important topic due to the continuous rise of wearable augmented reality devices. Most current methods focus primarily on single-modality approaches, such as applying vision models to RGB images. However, these approaches often lack relational information, which can be critical for understanding action scenes. More recent approaches incorporate multi-modal data, such as audio or gaze, but often use all modalities to predict the action classes directly. This work takes a different approach: instead of predicting actions as a whole, we split the task into sub-tasks by predicting verbs and nouns separately, and our method selectively employs each modality in the contexts where it is most effective. To this end, we propose a hierarchical multi-modal action recognition model that combines diverse visual modalities, including hand-object interactions, gaze data, scene semantics, motion dynamics, and RGB images. The model incorporates transformer-based graph and vision models to integrate visual and relational information. This design allows the model to capture the distinct contribution of each modality to identifying the correct action. While the proposed model did not achieve state-of-the-art accuracy, experimental results demonstrate its effectiveness in integrating multi-modal information hierarchically and its potential for improving action recognition. This research highlights the promise of graph-based architectures in multi-modal learning and lays a foundation for more holistic modality integration and more efficient action recognition systems.
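
The abstract does not state how the separate verb and noun predictions are turned back into a single action label. A minimal sketch, assuming an action is a (verb, noun) pair and that the two sub-task heads output independent class logits, scores every pair by the product of its verb and noun probabilities. The function names and the independence assumption are illustrative only and are not taken from the thesis.

```python
import math


def softmax(logits):
    # Subtract the maximum logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_verb_noun(verb_logits, noun_logits):
    """Hypothetical helper: pick the best (verb, noun) pair.

    Assumes verb and noun predictions are independent, so
    P(action = (v, n)) = P(v) * P(n).
    Returns the (verb_index, noun_index) of the best action and the joint matrix.
    """
    p_verb = softmax(verb_logits)
    p_noun = softmax(noun_logits)
    # Joint score matrix with shape (num_verbs, num_nouns).
    joint = [[pv * pn for pn in p_noun] for pv in p_verb]
    pairs = ((v, n) for v in range(len(p_verb)) for n in range(len(p_noun)))
    best = max(pairs, key=lambda vn: joint[vn[0]][vn[1]])
    return best, joint


if __name__ == "__main__":
    verbs = ["take", "open", "cut"]  # hypothetical label sets
    nouns = ["knife", "fridge"]
    (v, n), _ = combine_verb_noun([0.2, 2.0, 0.1], [0.5, 1.5])
    print(verbs[v], nouns[n])
```

Under the independence assumption the best pair is simply the top verb combined with the top noun; the full joint matrix only becomes useful when a model reweights or filters pairs, for example by discarding verb-noun combinations that never occur.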

Department(s): Universität Stuttgart, Institute for Visualization and Interactive Systems, Visualization and Interactive Systems
Supervisors: Bulling, Prof. Andreas; Shi, Dr. Lei
Entry date: 13 March 2025