|Bräuninger, Maximilian: Improving SMT-based synonym extraction across word classes by distributional reranking of synonyms and hypernyms. |
Universität Stuttgart, Fakultät Informatik, Elektrotechnik und Informationstechnik, Bachelor's thesis (2017).
40 pages, in English.
|CR classification||I.2.7 (Natural Language Processing)|
Automatic synonym extraction is a promising field of research. It can be useful, for example, in the creation of thesauri as well as in the creation and analysis of machine translation systems. This thesis extracts synonym candidates using statistical machine translation (SMT) methods combined with multilingual parallel corpora, based on word alignments created within the parallel corpus. In a first step, these alignments are used to translate the German target words (nouns, verbs and adjectives) into English pivots. Using the same techniques, the pivots are then translated back into German words. These back-translations are regarded as synonym candidates and are ranked according to their "synonym probability". In a second step, two different distributional semantic measures are introduced to re-rank the synonym candidates. The first measure tries to identify the semantic relation between the words, in particular hypernymy, and ranks hypernyms lower in the candidate list. The second measure relies on the semantic similarity of the words and ranks semantically equivalent words higher in the list. In a final step, the results are compared with regard to word class as well as re-ranking strategy using a gold standard.
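The two-step pipeline described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes that the "synonym probability" is the sum over English pivots of p(pivot | target) times p(candidate | pivot), which is a common pivot formulation, and that the similarity-based re-ranking uses cosine similarity between word vectors. The function names, the translation probabilities and the vectors are all invented for illustration.

```python
import math
from collections import defaultdict


def pivot_synonym_scores(target, de_to_en, en_to_de):
    """Rank German candidates for `target` via English pivots.

    Assumed scoring (not taken from the thesis):
    score(candidate) = sum over pivots of p(pivot | target) * p(candidate | pivot).
    """
    scores = defaultdict(float)
    for pivot, p_pivot in de_to_en.get(target, {}).items():
        for candidate, p_back in en_to_de.get(pivot, {}).items():
            if candidate != target:  # the target is not its own synonym
                scores[candidate] += p_pivot * p_back
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rerank_by_similarity(target, ranked, vectors):
    """Re-rank candidates by distributional similarity to the target (assumed measure)."""
    return sorted(
        ranked,
        key=lambda kv: cosine(vectors[target], vectors[kv[0]]),
        reverse=True,
    )


# Toy translation probabilities, invented for illustration only.
de_to_en = {"Auto": {"car": 0.7, "vehicle": 0.3}}
en_to_de = {
    "car": {"Auto": 0.6, "Wagen": 0.3, "Fahrzeug": 0.1},
    "vehicle": {"Fahrzeug": 0.8, "Auto": 0.2},
}
# Toy distributional vectors, also invented.
vectors = {
    "Auto": [1.0, 0.9, 0.1],
    "Wagen": [0.9, 1.0, 0.2],
    "Fahrzeug": [0.5, 0.4, 0.9],
}

ranked = pivot_synonym_scores("Auto", de_to_en, en_to_de)
reranked = rerank_by_similarity("Auto", ranked, vectors)
print(ranked)
print(reranked)
```

In this toy data the more general word "Fahrzeug" outranks "Wagen" after the pivot step, because it is reachable through both pivots. The similarity-based re-ranking then moves "Wagen" to the top, which is the kind of correction the second step is meant to achieve.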
|PDF (539479 Bytes)|
|Department(s)||Universität Stuttgart, Institut für Maschinelle Sprachverarbeitung|
|Supervisor(s)||Schulte im Walde, Dr. Sabine; Di Marco, Marion|
|Entry date||28 September 2018|