Master Thesis MSTR-2018-120

Bibliography: Balla, Irdi: Visual question answering for intuitive human-robot collaborations using compositional neural networks.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Master Thesis No. 120 (2018).
45 pages, English.

Visual and language cues are very useful for understanding the environment around us. AI systems likewise need to be able to use text and images to reason about the situation they are in. Building conversational AI that can share a common linguistic and perceptual understanding with us is both highly desirable and challenging. Recent advances in deep learning and computer vision provide a host of effective neural machines capable of solving end-to-end visual question-answering tasks. In this thesis we aim to explore, extend, and adapt these methods to the domain of intuitive human-robot collaborations. Such domains require robots to collaborate smoothly with humans and assist them, through a natural language interface, in vision-based manipulation tasks. They are thus inherently symbolic, relational, and compositional. We therefore plan to investigate methods capable of i) interpreting and grounding natural language queries, in particular referential expressions, in visual descriptions, and ii) composing neural machines to answer compositional questions and understand commands in the domain of interest. To evaluate this work we will use the Total Difficulty Test.

Department(s): University of Stuttgart, Institute of Parallel and Distributed Systems, Machine Learning and Robotics
Supervisor(s): Hennes, Ph.D. Daniel; Ngo, Hung
Entry date: February 15, 2022