Bachelor Thesis BCLR-2021-110

Bibliography: Schmid, Philipp: BioNer: Named Entity Recognition in the Biomedical Domain.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Bachelor Thesis No. 110 (2021).
67 pages, English.
Abstract

Named Entity Recognition (NER) forms a critical basis for many Natural Language Processing (NLP) tasks. In the biomedical domain, NER is difficult because the language used is idiosyncratic and new terms for diseases or drugs are coined daily. Most state-of-the-art NER models are BERT-based models, which are computationally expensive to pre-train. In this thesis, we propose BioNER, a model based on Long Short-Term Memory (LSTM) networks. BioNER is a system built for biomedical NER: it takes a sentence as input and predicts the text spans that correspond to entities, which other systems can then use for further NLP tasks such as entity linking. BioNER extends the existing LSTM-based model DATEXIS-NER. Its main enhancements are word embeddings pre-trained with fastText on a large collection of biomedical text and a customized architecture. We compare BioNER with DATEXIS-NER and with the two BERT-based models BioBERT and SciBERT on the MedMentions and JNLPBA datasets. We show that word embeddings pre-trained on a large text collection can improve an existing model. Furthermore, we show that BioNER makes it possible to have a biomedical NER model that is less computationally expensive to train than the BERT-based models while offering competitive predictive performance.
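The abstract says BioNER predicts the text spans that correspond to entities, but this record does not state how those spans are encoded. As a hedged illustration only, assuming a token-level BIO tagging scheme (an assumption not confirmed here), the Python sketch below shows how per-token tags could be collected into entity spans. The function name `bio_to_spans`, the example sentence and the JNLPBA-style labels are hypothetical and chosen purely for illustration.

```python
from typing import List, Optional, Tuple

# (start token index, end token index exclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Hypothetical helper: collect contiguous B-/I- tag runs into spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            # A new entity begins; an I- tag with a different type is
            # treated leniently as the start of a new entity.
            if label is not None:
                spans.append((start, i, label))
            start, label = i, tag[2:]
        elif tag == "O":
            # Outside any entity: close the currently open span, if any.
            if label is not None:
                spans.append((start, i, label))
            start, label = None, None
        # Otherwise the tag is an I- tag continuing the current entity.
    if label is not None:
        spans.append((start, len(tags), label))
    return spans


if __name__ == "__main__":
    # Illustrative sentence with made-up JNLPBA-style tags.
    tokens = ["IL-2", "gene", "expression", "in", "T", "cells"]
    tags = ["B-DNA", "I-DNA", "O", "O", "B-cell_type", "I-cell_type"]
    spans = bio_to_spans(tags)
    print(spans)  # → [(0, 2, 'DNA'), (4, 6, 'cell_type')]
    print([" ".join(tokens[s:e]) for s, e, _ in spans])  # → ['IL-2 gene', 'T cells']
```

Span-level output of this kind is what a downstream component such as an entity linker would consume; treating a stray `I-` tag as a new entity is one common lenient convention, not necessarily the one used in the thesis.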

Department(s): University of Stuttgart, Institute of Artificial Intelligence, Analytic Computing
Supervisor(s): Staab, Prof. Steffen; Dima, Dr. Corina
Entry date: November 11, 2024