Master Thesis MSTR-2022-60

Bibliography: Ott, Stefan: Long-tailed visual object detection using grouped staged training.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Master Thesis No. 60 (2022).
103 pages, English.

State-of-the-art object detectors are dominated by data-driven methods, which rely on data quantity and quality for training. However, data collected from the real world can exhibit a long-tailed distribution, on which object detectors tend to be overconfident on head classes while generalizing poorly to tail classes. In this thesis, techniques from few-shot learning are transferred to long-tailed learning by applying and extending fine-tuning for object detection. We show the effectiveness of two-stage fine-tuning on a challenging long-tailed, large-vocabulary dataset and extend the first stage of training by including tail class data. We propose grouped staged training as an extension of the first (representation learning) stage to reduce the overall class imbalance and exploit similarities between classes. Our experiments show that grouped staged training can increase the overall performance by improving detection and classification of rare classes while maintaining the performance on frequent and common objects. Additionally, we apply self-supervised pre-training to long-tailed object detection and show the positive effect of combining it with two-stage training. Finally, we critically evaluate cross-category rankings and the limitations of our methods regarding detection confidence scores, where we observe that the performance improvements are redistributed towards tail classes, with a decreasing number of predictions for head classes.

Department(s): University of Stuttgart, Institute for Natural Language Processing
Supervisor(s): Vu, Prof. Ngoc Thang; Schweitzer, Dr. Antje; Zhang, Dr. Dan; Friedrich, Dr. Annemarie
Entry date: November 29, 2022