Master Thesis MSTR-2018-55

Bibliography
Luo, Zheren: Master Annotator: enhancing distant supervision through visual interfaces.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Master Thesis No. 55 (2018).
65 pages, English.
Abstract

The rise of social networking sites and applications such as Twitter, Facebook, Instagram, and Weibo in recent years has provided new ways of sharing information on the Internet. Twitter is one of the most popular micro-blogging services, and tweet classification has recently received much attention. Tweet classification requires large numbers of labeled training examples. However, labeling is costly because time and domain-expert annotators are often limited. Distant supervision is a feasible solution but has shortcomings such as noisy data, skewed annotations, and the difficulty of generating negative training examples. This thesis tries to overcome these shortcomings by combining distant supervision with visual analytics. Following the idea of distant supervision, information found in tweets, such as keywords, hashtags, user_mentions, and URLs, can serve as annotators to generate training examples. Through an interactive visual interface, the generated training examples can be checked and modified, which enhances the distant supervision. The interactive visual interface and the enhanced distant supervision together form an iterative analytical loop that lets users explore tweet data in depth: they can retrieve tweets according to their needs, find potential topics, label or relabel tweets, and train classifiers. A software tool is implemented to realize the analytical loop, and a case study demonstrates the performance and usage of the approach and the software.
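The idea of using tweet features as annotators can be illustrated with a minimal sketch. It is not taken from the thesis: the rule table, the labels, the example domain, and the function names `extract_features` and `distant_label` are all hypothetical, and the tie handling (leaving conflicting tweets unlabeled for later manual review) is only one plausible choice.

```python
import re
from collections import Counter

# Hypothetical labeling rules (assumption, not from the thesis):
# each feature type maps concrete tokens to a topic label.
RULES = {
    "hashtag": {"#flood": "disaster", "#worldcup": "sports"},
    "keyword": {"earthquake": "disaster", "goal": "sports"},
    "mention": {"@fifa": "sports"},
    "url_domain": {"weather.example.org": "disaster"},
}

URL_PATTERN = r"https?://\S+"


def extract_features(text):
    """Split a tweet into hashtags, user mentions, URL domains, and keywords."""
    lowered = text.lower()
    urls = re.findall(URL_PATTERN, lowered)
    # Remove URLs first so that URL fragments are not mistaken for hashtags.
    no_urls = re.sub(URL_PATTERN, " ", lowered)
    # Remove hashtags and mentions so that only plain words remain as keywords.
    plain = re.sub(r"[#@]\w+", " ", no_urls)
    return {
        "hashtag": set(re.findall(r"#\w+", no_urls)),
        "mention": set(re.findall(r"@\w+", no_urls)),
        "url_domain": {re.sub(r"^https?://", "", u).split("/")[0] for u in urls},
        "keyword": set(re.findall(r"[a-z]+", plain)),
    }


def distant_label(text):
    """Return the label most rules vote for, or None if no rule fires or votes tie."""
    features = extract_features(text)
    votes = Counter()
    for kind, mapping in RULES.items():
        for token, label in mapping.items():
            if token in features[kind]:
                votes[label] += 1
    if not votes:
        return None  # no rule matched: the tweet stays unlabeled
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # conflicting rules: leave the tweet for manual review
    return ranked[0][0]
```

Tweets for which `distant_label` returns `None` are the kind of cases a user might inspect and label by hand in an interactive interface, while the automatically labeled ones could be checked and corrected before training a classifier.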

Department(s): University of Stuttgart, Institute of Visualisation and Interactive Systems, Visualisation and Interactive Systems
Supervisor(s): Ertl, Prof. Thomas; Thom, Dr. Dennis; Han, Qi
Entry date: June 4, 2019