Bachelor's Thesis BCLR-2023-83

Bibliographic data
Marx Larre, Miguel: Effects of paraphrasing and demographic metadata on NLI classification performance.
Universität Stuttgart, Faculty of Computer Science, Electrical Engineering and Information Technology, Bachelor's Thesis No. 83 (2023).
83 pages, English.
Abstract

Native language identification (NLI) is the task of automatically inferring the native language (L1) of a document's author when the document is written in a second language (L2). Documents stem from different sources, and increasingly they are altered before publication by paraphrasing methods. Such alteration changes the content, grammar, and style of a document, which inherently obfuscates the author's L1. In addition, demographic metadata of the author, such as age and gender, may influence how well the author's L1 can be detected. In this thesis, two corpora that provide the necessary demographic metadata, the International Corpus of Learner English (ICLE) and the Trustpilot corpus, are used to analyze the impact of paraphrasing and demographic factors on NLI tasks. To analyze the effect of paraphrasing, new versions of both corpora are created that contain paraphrased versions of the original documents. The effect is examined using two state-of-the-art NLI systems, and the results are analyzed using regression analysis in combination with dominance analysis (DA). Paraphrasing was found to have a substantial influence on the performance of NLI tasks, regardless of corpus, classifier, or paraphrasing method. The influence of demographic factors usually observed in NLI tasks could not be confirmed in this thesis. Regression analysis and DA enabled a more in-depth analysis of the results, which yielded findings on the influence of specific L1s on the performance of NLI tasks.

Full text and other links
Full text
Department(s): Universität Stuttgart, Institut für Maschinelle Sprachverarbeitung
Supervisor: Padó, Prof. Sebastian
Entry date: 5 April 2024