Master's Thesis MSTR-2008-12

Bibliographic data
Ortiz, Maria Mera: Correlation Measures for Text Analysis Results.
Universität Stuttgart, Fakultät Informatik, Elektrotechnik und Informationstechnik, Master's Thesis No. 12 (2008).
112 pages, English.
Abstract

Harnessing unstructured, textual data is becoming increasingly important for scenarios such as quality early warning, company reputation management, or proactive customer churn detection. Text analysis technologies such as Information Extraction can be used to extract information from emails, call center transcripts, technician comments, or customer forums, which can then be used within business intelligence applications.

This thesis presents and implements an approach that combines text analysis with association rule mining in the context of a quality early warning scenario. The text analysis focuses on extracting syntactic entities such as noun phrases, which requires less manual effort than domain-tailored text analysis. Association rule mining is then used to identify the relevant entities found during text analysis and to relate them to structured information, for example, specific car models. As part of the thesis, several correlation measures that gauge the interestingness of the association rules are evaluated.
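The abstract does not name the correlation measures that were evaluated. Purely as an illustration, the sketch below computes four standard association rule measures (support, confidence, lift, and leverage) for rules of the form "noun phrase -> car model", treating each complaint as a transaction of items. The toy complaints, the item names, and the helper functions `support` and `rule_measures` are hypothetical; they are not taken from the thesis, from IBM InfoSphere Warehouse, or from the NHTSA data.

```python
from itertools import product

# Hypothetical toy data (NOT real NHTSA records): each complaint is reduced to
# a set of items, i.e. one structured attribute (the car model) plus the noun
# phrases that a text analysis step might have extracted from its description.
complaints = [
    {"model:A", "brake pedal", "noise"},
    {"model:A", "brake pedal"},
    {"model:A", "airbag light"},
    {"model:B", "airbag light"},
    {"model:B", "airbag light", "noise"},
    {"model:B", "brake pedal"},
    {"model:C", "noise"},
    {"model:C", "airbag light"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_measures(antecedent, consequent, transactions):
    """Standard interestingness measures for the rule antecedent -> consequent."""
    s_a = support(antecedent, transactions)
    s_c = support(consequent, transactions)
    if s_a == 0 or s_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    s_ac = support(antecedent | consequent, transactions)
    confidence = s_ac / s_a
    return {
        "support": s_ac,
        "confidence": confidence,
        "lift": confidence / s_c,       # > 1 means positive correlation
        "leverage": s_ac - s_a * s_c,   # > 0 means co-occurring more than expected
    }


entities = {"brake pedal", "airbag light", "noise"}
models = {"model:A", "model:B", "model:C"}

# Evaluate every entity -> model rule that occurs at least once, ranked by lift.
rules = []
for entity, model in product(sorted(entities), sorted(models)):
    measures = rule_measures({entity}, {model}, complaints)
    if measures["support"] > 0:
        rules.append((entity, model, measures))
rules.sort(key=lambda rule: rule[2]["lift"], reverse=True)

for entity, model, m in rules:
    print(f"{entity} -> {model}: conf={m['confidence']:.2f} lift={m['lift']:.2f}")
```

In this toy data, "brake pedal -> model:A" ranks first: two thirds of the complaints mentioning the brake pedal concern model A, while only three eighths of all complaints do, giving a lift of about 1.78. Confidence alone can favor entities that are frequent everywhere, which is why measures such as lift or leverage that compare against the expected co-occurrence are typically used to judge interestingness.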

The quality early warning scenario is based on more than 500,000 publicly available vehicle complaints from the US National Highway Traffic Safety Administration. The scenario shows how IBM InfoSphere Warehouse can extract relevant information from complaint descriptions about car models, which allows business analysts to investigate potential causes of problems with specific vehicles. In this example, the Quality Early Warning application is built as a set of reports within IBM Cognos 8 BI Server. Because the results of both text analysis and subsequent data mining are stored in relational tables, the application could also be implemented in any other BI tool or in a custom application.

To evaluate the quality of the approach, the thesis benchmarks it against IBM Content Analyzer, an IBM product that attempts to cover the same usage scenarios as the approach taken in this thesis.

Department(s): Universität Stuttgart, Institut für Parallele und Verteilte Systeme, Anwendersoftware
Supervisors: Mitschang, Prof. Bernhard; Lang, Alexander
Entry date: 20 April 2023