Bachelor's Thesis BCLR-2018-54

Milovanovic, Milan: Investigating different levels of joining entity and relation classification.
Universität Stuttgart, Faculty of Computer Science, Electrical Engineering and Information Technology, Bachelor's Thesis No. 54 (2018).
76 pages, English.

Named entities, such as persons or locations, are crucial bearers of information in unstructured text. Recognizing and classifying these entities is an essential part of information extraction. Relation classification, the task of categorizing the semantic relation between two entities in a text, is closely linked to it. The two tasks, entity classification and relation classification, have commonly been treated as a pipeline of two separate models. While this separation simplifies the problem, it disregards the underlying dependencies between the two subtasks. Merging both subtasks into one joint model for entity and relation classification is therefore the next logical step. The goal of this thesis is a thorough investigation and comparison of different levels of joining the two tasks. To this end, the thesis defines different levels of joint entity and relation classification and implements, evaluates, and analyzes machine learning models for each level. The levels investigated are:

(L1) a pipeline of independent models for entity classification and relation classification
(L2) using the entity class predictions as features for relation classification
(L3) global features for both entity and relation classification
(L4) explicit utilization of a single joint model for entity and relation classification

The best results are achieved by the level 3 model, with an F1 score of 0.830 for entity classification and an F1 score of 0.52 for relation classification.
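The difference between levels 1 and 2 can be illustrated with a small Python sketch. It is not the thesis's implementation: the toy gazetteer classifier, the feature names, and the function `relation_features` are all hypothetical and only show how predicted entity classes could be passed on as extra features to a relation classifier.

```python
from typing import Callable, Dict, List, Optional


def toy_entity_classifier(token: str) -> str:
    """Hypothetical stand-in for an entity classification model."""
    gazetteer = {"Milan": "PER", "Stuttgart": "LOC"}
    return gazetteer.get(token, "O")


def relation_features(
    tokens: List[str],
    e1: int,
    e2: int,
    entity_classifier: Optional[Callable[[str], str]] = None,
) -> Dict[str, float]:
    """Build a feature dictionary for the entity pair at positions e1 and e2.

    Without an entity classifier, only surface features are used (as in a
    level 1 style pipeline, where the relation model ignores entity
    predictions). With one, the predicted entity classes are added as
    features (the idea behind level 2).
    """
    features = {
        "e1_token=" + tokens[e1]: 1.0,
        "e2_token=" + tokens[e2]: 1.0,
        "distance": float(abs(e2 - e1)),
    }
    if entity_classifier is not None:
        # Assumed feature encoding: one indicator feature per predicted class.
        features["e1_class=" + entity_classifier(tokens[e1])] = 1.0
        features["e2_class=" + entity_classifier(tokens[e2])] = 1.0
    return features


tokens = ["Milan", "studies", "in", "Stuttgart"]
level1_features = relation_features(tokens, 0, 3)
level2_features = relation_features(tokens, 0, 3, toy_entity_classifier)
```

In this sketch the level 2 feature set is the level 1 set plus two class indicators. Errors made by the entity classifier would therefore propagate into relation classification, which is one motivation for the more tightly coupled levels 3 and 4.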

Department(s): Universität Stuttgart, Institut für Maschinelle Sprachverarbeitung
Supervisor(s): Padó, Prof. Sebastian; Klinger, Dr. Roman; Adel, Heike
Entry date: 8 January 2019