Master Thesis MSTR-2023-97

Bibliography	Xu, Zhenhao: Contrastive representation learning for eye contact detection. University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Master Thesis No. 97 (2023). 75 pages, English.
Abstract	Gaze estimation has been studied extensively, with numerous methods and datasets available. Eye contact detection has received comparatively little attention and has few dedicated datasets, yet it has significant practical applications. In remote learning, for instance, eye contact detection can be used to determine whether students are paying attention to the screen, which can help improve virtual engagement and educational efficacy. Generalizing between gaze estimation datasets and eye contact detection datasets is also difficult, mainly because the two are labeled differently. The scarcity of dedicated datasets and the difficulty of applying gaze estimation methods directly to eye contact detection call for a new approach. In response, this thesis introduces an approach to model construction for eye contact detection that employs unsupervised contrastive learning. This method was chosen because it can use the large amounts of unlabeled data in gaze estimation datasets, which is particularly advantageous given the scarcity of dedicated eye contact detection datasets. In our study, we employed the SimCLR contrastive learning model, optimized specifically for eye contact detection. This optimization improved the Matthews Correlation Coefficient (MCC) for eye contact detection from 0.46, as achieved by Zhang et al.’s method, to 0.63 with our approach. Notably, our method achieves this performance without datasets manually labeled with gaze direction or eye contact labels. This is the first application of contrastive learning to eye contact detection, and it demonstrates the efficacy of contrastive learning in improving key performance metrics.
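For readers unfamiliar with the metric, the MCC cited above is computed from the binary confusion matrix. The following is a minimal Python sketch, not code from the thesis; the function name `matthews_corrcoef_binary` and the encoding 1 = eye contact, 0 = no eye contact are illustrative assumptions.

```python
import math


def matthews_corrcoef_binary(y_true, y_pred):
    """Matthews Correlation Coefficient for binary labels.

    Assumed encoding (not from the thesis): 1 = eye contact, 0 = no eye contact.
    Returns a value in [-1, 1]; 0.0 is returned when the denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Common convention when one class is absent from labels or predictions.
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

Unlike accuracy, the MCC takes all four confusion-matrix cells into account, which makes it a common choice when the two classes are imbalanced.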
Additionally, fine-tuning our contrastive learning model still required a small dataset with eye contact labels, and we sought to eliminate the dependency on manually annotated eye contact labels entirely. To achieve this, we used state-of-the-art gaze estimation models, not as the primary method but as an auxiliary tool, to automatically generate pseudo-labels for eye contact detection. This strategy leverages the outputs of the gaze estimation models to produce reliable pseudo-labels, allowing our eye contact detection model to operate independently of manual labeling.
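The abstract does not say how gaze estimates are converted into pseudo-labels. One plausible scheme, shown here purely as an assumption, thresholds the angle between the estimated gaze direction and the camera axis. The function names, the pitch/yaw convention (both zero when looking straight at the camera), and the 10-degree threshold are all hypothetical and not taken from the thesis.

```python
import math


def gaze_angle_deg(pitch_rad, yaw_rad):
    """Angle in degrees between the estimated gaze and the camera axis.

    Assumes pitch = yaw = 0 means looking straight at the camera, so the
    cosine of the deviation angle is cos(pitch) * cos(yaw).
    """
    cos_angle = math.cos(pitch_rad) * math.cos(yaw_rad)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    return math.degrees(math.acos(cos_angle))


def pseudo_label(pitch_rad, yaw_rad, threshold_deg=10.0):
    """Return 1 (eye contact) if the gaze deviates from the camera axis by
    at most threshold_deg, else 0. The threshold is a hypothetical value."""
    return 1 if gaze_angle_deg(pitch_rad, yaw_rad) <= threshold_deg else 0


# Example: label a few (pitch, yaw) estimates from a gaze estimation model.
estimates = [(0.0, 0.0), (math.radians(5), math.radians(5)), (0.0, math.radians(30))]
labels = [pseudo_label(pitch, yaw) for pitch, yaw in estimates]
```

In practice the threshold would need to be tuned to the size and position of the target (for example, a screen), and noisy gaze estimates near the threshold produce noisy labels.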
Full text and other links	Full text
Department(s)	University of Stuttgart, Institute of Visualisation and Interactive Systems, Visualisation and Interactive Systems
Supervisor(s)	Bulling, Prof. Andreas; Bace, Dr. Mihai
Entry date	April 8, 2024
