Master Thesis MSTR-2024-141

Bibliography
Mendonca, Daisy Sheetal: Enhancing User Interaction in Autonomous Vehicles through Voice-Driven Generative AI.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Master Thesis No. 141 (2024).
79 pages, English.
Abstract

The introduction of automated vehicles has opened up the possibility of hybrid working from the car. A vehicle operating at automation level 3 or 4, also known as a partially autonomous vehicle, requires minimal human intervention while driving. Hence, the commute time could be used for mobile working. As these vehicles will be integrated with advanced technologies, the cockpit could be used as a digital workplace for productivity tasks. However, because human intervention is still needed in emergency situations, providing a seamless mobile working experience poses some challenges. In particular, as partially autonomous vehicles require some human intervention, it is not possible to perform productivity tasks such as reading reports or lengthy documents. This thesis addresses these issues of mobile working with a generative AI application and provides a solution to enhance the overall mobile working experience. The application accepts voice commands, giving users a hands-free way to fetch relevant information from the intended documents. It scans long documents and large text corpora, provides quick summaries, and can answer questions based on the uploaded documents. The risk of opening a confidential document in the presence of unauthorized passengers is handled by raising alerts, ensuring that data privacy is not compromised. The pre-trained checkpoint is evaluated using BERTScore and, based on the evaluation results, Facebook's BART was chosen as the suitable model for text handling tasks. To further evaluate the performance of the model, a systematic human evaluation is carried out and the inter-evaluator agreement is calculated as percentage agreement. The application achieved a precision of 0.91, a recall of 0.88, and an F1 of 0.89 when evaluated on generated summaries.
The answers generated by the AI application for the given documents and questions achieved a precision, recall, and F1 of 0.93, 0.92, and 0.93, respectively. In addition, a Word Error Rate (WER) of 9.08% was achieved for automatic speech recognition. These results suggest that the quality of the generated summaries and answers is satisfactory, but that the voice-to-text conversion model has scope for improvement.
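
For readers unfamiliar with two of the metrics named in the abstract, the following is a minimal sketch of how Word Error Rate and percentage agreement are commonly computed. It is not taken from the thesis: the function names, the lowercasing and whitespace tokenization, and the example inputs are assumptions made for illustration only, and the thesis may normalize text or aggregate ratings differently.

```python
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Assumption: text is lowercased and split on whitespace; the thesis may
    apply a different normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    # prev[j] is the edit distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def percentage_agreement(ratings_a: Sequence, ratings_b: Sequence) -> float:
    """Share of items (in percent) on which two evaluators gave the same rating."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both evaluators must rate the same non-empty item set")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100.0 * matches / len(ratings_a)


if __name__ == "__main__":
    # Hypothetical voice command and its transcription (one substituted word).
    print(word_error_rate("open the quarterly report", "open a quarterly report"))
    # Hypothetical ratings from two evaluators on four summaries.
    print(percentage_agreement([1, 2, 3, 4], [1, 2, 0, 4]))
```

Note that percentage agreement does not correct for agreement expected by chance, which is why it is often reported alongside chance-corrected statistics.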

Department(s): University of Stuttgart, Institute for Natural Language Processing
Supervisor(s): Vu, Prof. Thang; Mallick, Arijit
Entry date: December 19, 2025