Master Thesis MSTR-2021-108

Bibliography
Satkunarajan, Jena: Visual analysis of news stories using neural language models.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Master Thesis No. 108 (2021).
115 pages, English.
Abstract

With the introduction of computers of varying sizes into the everyday life of the majority of the world population, we have seen a rapid increase in the amount of textual content produced and distributed across the digitised globe. Among the immense amounts of text found on the internet, news articles are of particular interest to journalists, scientists and other groups interested in the events captivating the public. As hundreds and thousands of online news providers report on more and less important topics, gathering the valuable knowledge provided by all these sources becomes an almost impossible challenge. Gaining an overview of general happenings, or learning about specific topics, thus becomes the task of identifying the novel information in an ocean of recurring, duplicate and rewritten stories. This thesis presents a combined approach to interactively visualise the novel content and the evolution of topics in news story corpora. A prototype framework is developed that uses GPT-2, a transformer-based neural language model, to assess the novelty of textual content. Building on the resulting novelty scores, the text of each article is visually highlighted to emphasise its novel content. The novel article content is presented in multiple views, providing increasing levels of aggregation as the underlying article data grows in size. A term weighting scheme that incorporates the novelty scores produces document vectors, which are used to model the topics of the article corpus over time. The resulting time-dependent topic clusters are presented in a multi-layered visualisation, providing multiple perspectives on the evolution of topics over time. The different visualisations and functionalities are combined into an interactive framework with multiple coordinated views.
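
The abstract does not spell out the term weighting scheme, so the following is only an illustrative sketch of the general idea of folding per-token novelty scores into document vectors. It assumes that each token already carries a novelty score between 0 and 1 (in the thesis these would be derived from GPT-2; here they are hard-coded), and it simply sums the scores per term. The function names `novelty_weighted_vector` and `cosine` are hypothetical and are not taken from the thesis.

```python
import math
from collections import defaultdict


def novelty_weighted_vector(tokens, novelty):
    """Build a sparse document vector: term -> sum of its tokens' novelty scores.

    Assumption: `novelty` holds one score per token; how the scores are
    obtained (for example from a language model) is outside this sketch.
    """
    if len(tokens) != len(novelty):
        raise ValueError("tokens and novelty must have the same length")
    vector = defaultdict(float)
    for term, score in zip(tokens, novelty):
        vector[term.lower()] += score
    return dict(vector)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hard-coded example scores, purely for illustration.
doc_a = novelty_weighted_vector(["Flood", "hits", "flood"], [0.5, 0.25, 0.25])
doc_b = novelty_weighted_vector(["election", "results"], [0.5, 0.5])
similarity = cosine(doc_a, doc_b)
```

Vectors of this kind could then be grouped by a clustering method within time windows to obtain time-dependent topic clusters; which method the thesis uses is not stated in the abstract.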

Department(s): University of Stuttgart, Institute of Visualisation and Interactive Systems, Visualisation and Interactive Systems
Supervisor(s): Ertl, Prof. Thomas; Knittel, Johannes
Entry date: October 28, 2022