Bachelor Thesis BCLR-2017-16

Bibliography: Bräuninger, Maximilian: Improving SMT-based synonym extraction across word classes by distributional reranking of synonyms and hypernyms.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Bachelor Thesis (2017).
40 pages, English.
CR-Schema: I.2.7 (Natural Language Processing)
Abstract

Automatic synonym extraction is a promising field of research. For example, it can be useful in the creation of thesauri, as well as in the creation and examination of automatic machine translation. This thesis tries to extract synonym candidates using "statistical machine translation" (SMT) methods combined with multilingual parallel corpora. This is done by creating "word alignments" within the parallel corpus. In a first step, these alignments are used to translate the German target words, consisting of nouns, verbs and adjectives, into English pivots. Using the same techniques, these pivots are then translated back into German words. These back-translations are regarded as synonym candidates and are ranked according to their "synonym probability". In a second step, two different distributional semantic measures are introduced in order to re-rank the synonym candidates. The first measure tries to identify the semantic relation between the words, especially hypernymy, and ranks hypernyms lower in the candidate list. The second measure relies on the semantic similarity of the words, ranking semantically equivalent words higher in the list. In a last step, the results are compared with regard to word class as well as re-ranking strategy using a gold standard.
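
The pivot step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes that translation probabilities are estimated by relative frequency from aligned word pairs, and that the "synonym probability" of a candidate is the sum, over all English pivots, of the product of the two translation probabilities (a common formulation in pivot-based paraphrase extraction). The function names and the toy alignment data are hypothetical.

```python
from collections import defaultdict


def translation_probs(aligned_pairs):
    """Estimate p(target | source) by relative frequency from (source, target) pairs."""
    counts = defaultdict(lambda: defaultdict(int))
    for src, tgt in aligned_pairs:
        counts[src][tgt] += 1
    probs = {}
    for src, tgts in counts.items():
        total = sum(tgts.values())
        probs[src] = {tgt: c / total for tgt, c in tgts.items()}
    return probs


def synonym_candidates(word, de_en, en_de):
    """Rank German candidates reached from `word` via English pivots.

    Assumed score: sum over pivots of p(pivot | word) * p(candidate | pivot).
    """
    scores = defaultdict(float)
    for pivot, p_pivot in de_en.get(word, {}).items():
        for cand, p_cand in en_de.get(pivot, {}).items():
            if cand != word:  # the target word is not its own synonym
                scores[cand] += p_pivot * p_cand
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy alignments (German word, English word), not corpus data.
alignments = [
    ("Auto", "car"), ("Auto", "car"), ("Auto", "vehicle"),
    ("Wagen", "car"), ("Wagen", "car"),
    ("Fahrzeug", "vehicle"), ("Fahrzeug", "vehicle"),
]
de_en = translation_probs(alignments)
en_de = translation_probs([(en, de) for de, en in alignments])
ranked = synonym_candidates("Auto", de_en, en_de)
```

In this toy example "Wagen" is ranked above "Fahrzeug" for the target word "Auto". A re-ranking step of the kind the abstract describes would then reorder such a list using distributional measures, which this sketch does not cover.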

Full text and other links: PDF (539479 Bytes)
Department(s): University of Stuttgart, Institute for Natural Language Processing
Supervisor(s): Schulte im Walde, Dr. Sabine; Di Marco, Marion
Entry date: September 28, 2018