Bibliography | Müller, Thomas: The Morphological Component of a Joint Morphological-Distributional Class Language Model. University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Diploma Thesis No. 42 (2011). 70 pages, English.
|
Abstract | Modeling out-of-vocabulary (OOV) words, i.e. words that do not occur in the training corpus but do occur in the data of the natural language processing (NLP) task at hand, is a challenging problem in statistical language modeling. We empirically investigate the relation between word context, word class and word morphology in English and present a class-based language model that groups rare words of similar morphology together. The model improves the prediction of words after histories containing OOV words. The morphological features are obtained without labeled data, yet they produce a number of syntactically and even semantically related clusters. Our model achieves an overall perplexity improvement of 4% over a state-of-the-art Kneser-Ney model and of 81% on histories containing unknown words. We conclude that using morphological features in English language modeling is worthwhile.
|
Department(s) | University of Stuttgart, Institute of Parallel and Distributed Systems, Applications of Parallel and Distributed Systems
|
Supervisor(s) | Mitschang, Prof. Bernhard; Schütze, Prof. Hinrich |
Entry date | May 4, 2020 |
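The abstract does not state the model's equations. As a rough illustration only, a class-based bigram model factors P(w_i | w_{i-1}) ≈ P(c(w_i) | c(w_{i-1})) · P(w_i | c(w_i)). If rare and unseen words are routed to a shared class keyed by their morphology, an OOV history can still select informative statistics. The sketch below makes several assumptions that do not come from the thesis: a hand-written suffix and capitalization signature, add-alpha smoothing, and the hypothetical names `signature`, `MorphClassBigramLM`, `rare_threshold` and `alpha`. The thesis obtains its morphological features without labeled data and compares against Kneser-Ney. Neither of those is reproduced here.

```python
import math
from collections import Counter, defaultdict

SUFFIXES = ("ing", "ed", "ly", "tion", "s")  # hypothetical, hand-picked suffix list


def signature(word):
    """Hypothetical morphological signature: capitalization, digits, and one known suffix."""
    shape = "Cap" if word[:1].isupper() else "low"
    if any(ch.isdigit() for ch in word):
        shape += "-num"
    for suffix in SUFFIXES:
        if word.lower().endswith(suffix):
            return f"{shape}-{suffix}"
    return shape


class MorphClassBigramLM:
    """Class bigram LM: frequent words are singleton classes, rare/unseen words share a signature class."""

    def __init__(self, sentences, rare_threshold=2, alpha=0.1):
        counts = Counter(w for s in sentences for w in s)
        self.frequent = {w for w, c in counts.items() if c >= rare_threshold}
        self.alpha = alpha
        self.class_bigrams = defaultdict(Counter)  # previous class -> counts of next class
        self.word_in_class = defaultdict(Counter)  # class -> counts of its member words
        for s in sentences:
            prev = "<s>"
            for w in s:
                c = self.word_class(w)
                self.class_bigrams[prev][c] += 1
                self.word_in_class[c][w] += 1
                prev = c
        self.n_classes = len(self.word_in_class) + 1  # +1 reserves mass for unseen classes

    def word_class(self, word):
        return word if word in self.frequent else "SIG:" + signature(word)

    def prob(self, word, prev_word="<s>"):
        """P(w | prev) = P(c(w) | c(prev)) * P(w | c(w)), both add-alpha smoothed."""
        prev = "<s>" if prev_word == "<s>" else self.word_class(prev_word)
        c = self.word_class(word)
        nexts = self.class_bigrams.get(prev, Counter())
        p_class = (nexts[c] + self.alpha) / (sum(nexts.values()) + self.alpha * self.n_classes)
        members = self.word_in_class.get(c, Counter())
        # +1 in the denominator leaves some mass for words never seen in class c
        p_word = (members[word] + self.alpha) / (sum(members.values()) + self.alpha * (len(members) + 1))
        return p_class * p_word

    def perplexity(self, sentences):
        log_sum, n = 0.0, 0
        for s in sentences:
            prev = "<s>"
            for w in s:
                log_sum += math.log(self.prob(w, prev))
                prev = w
                n += 1
        return math.exp(-log_sum / n)


train = [
    ["the", "cat", "is", "walking", "home"],
    ["the", "dog", "is", "running", "home"],
    ["the", "cat", "was", "sleeping", "home"],
]
lm = MorphClassBigramLM(train)
print(lm.word_class("jumping"))  # SIG:low-ing
# The OOV history "jumping" reuses the statistics of rare -ing words seen in training.
print(lm.prob("home", "jumping"), lm.prob("home", "Zorp"))
```

Here the unseen history "jumping" shares the `SIG:low-ing` class with "walking", "running" and "sleeping", so "home" gets a much higher probability after it than after "Zorp", whose signature class never occurred in training. The fixed suffix list is only a stand-in for the unsupervised features described in the abstract. A different clustering would change only `word_class`.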