Master Thesis MSTR-2020-73

Bibliography	Vetter, Jan: Analyzing the Effects of Elastic Weight Consolidation on Continual Learning in the Domain of Visual Question Answering. University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Master Thesis No. 73 (2020). 105 pages, English.
Abstract	Artificial neural networks, the basis for deep learning, suffer from catastrophic forgetting, which occurs when a network is trained consecutively on various tasks. They gain new knowledge, but at the cost of forgetting previously learned tasks, in the worst case completely. Various approaches try to find a trade-off between acquiring new knowledge and reducing catastrophic forgetting of existing knowledge. These approaches are usually created for and tested only on toy task databases. This thesis analyzes the impact of using one of these approaches, the regularization-based continual learning approach Elastic Weight Consolidation (EWC), in a more complex and near-realistic task, visual question answering (VQA). A network designed for VQA needs to provide a precise real-world answer to a given input question by taking a corresponding input image into account. Further, this thesis studies the preconditions required for a successful application of EWC and the possibility of making predictions in a given VQA scenario. The analyzed continual learning scenarios lead to the conclusion that EWC is able to remember previously learned knowledge while at the same time gaining new knowledge in a complex and near-realistic task such as VQA. By making use of EWC in certain scenarios, a network is able to boost its performance to an accuracy close to that achieved with traditional training. EWC performs better when learning new instances of already seen classes than when a network is trained on new classes in every new task. All scenarios with boosted performance have in common that the training database is of similar or larger size than the database the network was originally created for when trained in the traditional manner. This thesis shows that EWC is able to boost performance in a complex and near-realistic real-world scenario like VQA.
The analyzed approach helps a network to remember previously learned knowledge while at the same time gaining new knowledge, resulting in a final performance that is near to ideal when trained on new instances. For incremental class learning, EWC still boosts a network's performance remarkably, but catastrophic forgetting remains present. If a network is trained on a database of the same size as, or larger than, the database it was originally created for, EWC is able to boost performance; the more training data is available, the better EWC is able to increase the final network performance.
Department(s)	University of Stuttgart, Institute for Natural Language Processing
Supervisor(s)	Vu, Prof. Thang; Väth, Dirk
Entry date	April 22, 2021

Publ. Computer Science