Diploma Thesis DIP-3464

Bibliography: Zou, Fan: Internetgestützte Textanalyse zur Extraktion von Produktentwicklungswissen anhand von semi-strukturierten Dokumenten.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Diploma Thesis No. 3464 (2013).
85 pages, German.
CR-Schema: I.2.7 (Natural Language Processing)
I.5.2 (Pattern Recognition Design Methodology)
J.6 (Computer-Aided Engineering)
Abstract

With the spread and development of the Internet over the past few decades, more and more electronic documents have become available online. Numerous product specifications are accessible via the Internet, e.g. in the form of web pages or PDFs. This thesis helps companies automatically extract products, product specifications, and product restrictions from websites. We investigate the definition of product named entities, the construction of the corpus, and the recognition techniques. The work covers the following aspects:

1. After studying many product names in web pages, we define the various compositions of a product named entity. Based on this definition, we develop a rule for corpus annotation. We then create a product named entity corpus using a semi-supervised method.

2. Based on the features of product names, we divide their recognition into two phases. The first phase detects the brand name, the series name, and the type of a product. Based on these results, the product name is recognized in the second phase. Many methods can be used for recognition in these two phases. In this work we discuss the hidden Markov model, the maximum entropy model, and the conditional random field model. After comparing these three models, we decide to use the conditional random field model for the recognition.

3. Once the product names have been detected, the products, the product features, and the restrictions between products are extracted.
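
The two-phase recognition described in item 2 can be illustrated with a minimal sketch. This is not the method from the thesis: simple dictionary lookups and a character heuristic stand in for the trained conditional random field model, and all names (BRANDS, SERIES, tag_components, find_product_names, the example product) are hypothetical and chosen only for illustration.

```python
# Hypothetical gazetteers. In the thesis a trained CRF makes these
# decisions; dictionary lookup is used here only as a stand-in.
BRANDS = {"acme"}
SERIES = {"proline"}


def tag_components(tokens):
    """Phase 1: label each token as BRAND, SERIES, TYPE, or O (other)."""
    labels = []
    for tok in tokens:
        low = tok.lower()
        if low in BRANDS:
            labels.append("BRAND")
        elif low in SERIES:
            labels.append("SERIES")
        elif any(c.isdigit() for c in tok) and any(c.isalpha() for c in tok):
            # Assumption: type designations mix letters and digits.
            labels.append("TYPE")
        else:
            labels.append("O")
    return labels


def find_product_names(tokens, labels):
    """Phase 2: join contiguous component tokens into product names."""
    names, current = [], []
    for tok, lab in zip(tokens, labels):
        if lab != "O":
            current.append(tok)
        elif current:
            names.append(" ".join(current))
            current = []
    if current:
        names.append(" ".join(current))
    return names


tokens = "The Acme ProLine PX200 drill works with many chargers".split()
labels = tag_components(tokens)
print(find_product_names(tokens, labels))  # ['Acme ProLine PX200']
```

Splitting the task this way means the second phase only has to reason over component labels, not raw tokens, which is presumably why the component detection is done first.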

Full text and other links: PDF (3038613 Bytes)
Department(s): University of Stuttgart, Institute of Computer-aided Product Development Systems, Computer-aided Product Development Systems
Supervisor(s): Julian Eichhoff
Entry date: November 11, 2013