Since considerable progress has been made over the last two years in part-of-speech tagging for German, with taggers reaching an accuracy of more than 96%, it is worth considering the use of a tagger as a tool for a priori disambiguation before syntactic analysis.
This paper examines the influence of tagging on subsequent partial parsing of German texts with respect to the number of analysis results. The underlying question is to what extent a tagger can serve as a preprocessor for partial syntactic analysis, reducing the number of ambiguities in the parse results a priori.
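To make the idea of a priori disambiguation concrete, the following Python sketch counts how many combinations of lexical readings a parser would have to consider for one sentence. It does this once with the full morphological analysis and once after discarding every reading whose part of speech disagrees with the tagger's output. The sentence, its readings, and the STTS-style labels are hypothetical illustrations. The product of reading counts is only an upper bound on lexical ambiguity, not the parse-tree count used in the study.

```python
from math import prod

# Hypothetical morphological analysis: each token maps to its list of
# (part-of-speech, features) readings. Labels are STTS-style and illustrative.
SENTENCE = [
    ("Die",   [("ART", "Nom.Sg.Fem"), ("ART", "Acc.Sg.Fem"), ("ART", "Nom.Pl"),
               ("ART", "Acc.Pl"), ("PRELS", "Nom.Sg.Fem"), ("PDS", "Nom.Sg.Fem")]),
    ("Frau",  [("NN", "Nom.Sg.Fem"), ("NN", "Acc.Sg.Fem"),
               ("NN", "Dat.Sg.Fem"), ("NN", "Gen.Sg.Fem")]),
    ("sieht", [("VVFIN", "3.Sg.Pres")]),
    ("den",   [("ART", "Acc.Sg.Masc"), ("ART", "Dat.Pl")]),
    ("Mann",  [("NN", "Nom.Sg.Masc"), ("NN", "Acc.Sg.Masc"), ("NN", "Dat.Sg.Masc")]),
]


def filter_by_tags(sentence, tags):
    """Keep only the readings whose POS equals the tag assigned to that token."""
    return [(word, [r for r in readings if r[0] == tag])
            for (word, readings), tag in zip(sentence, tags)]


def reading_combinations(sentence):
    """Number of reading combinations: an upper bound on lexical ambiguity."""
    return prod(len(readings) for _, readings in sentence)


untagged = reading_combinations(SENTENCE)
tagged = reading_combinations(
    filter_by_tags(SENTENCE, ["ART", "NN", "VVFIN", "ART", "NN"]))
# Assumed tagging error: "sieht" tagged as an infinitive.
mistagged = reading_combinations(
    filter_by_tags(SENTENCE, ["ART", "NN", "VVINF", "ART", "NN"]))

print(untagged, tagged, mistagged)  # 144 96 0
```

The third count assumes a tagging error on "sieht". Because no reading survives the filter, the count drops to 0, which mirrors how a wrong tag can leave the parser with no analysis at all.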
Two German newspaper texts were used: one hand-tagged (1097 sentences), the other statistically tagged with the Xerox tagger adapted to German (3776 sentences). Both texts were partially parsed in two forms each: tagged and untagged, i.e. fully morphologically analyzed. The results are compared with respect to the number of parse trees.
The partial parsing process applied here is divided into two successive steps. The first step detects the individual clauses of a possibly complex input sentence, i.e. main clauses, subordinate clauses, and infinitive constructions. In the second step, each clause is analyzed with respect to minimal NPs and PPs, without considering their hierarchical relations (coordination, subordination).
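The two-step organization can be sketched as a toy pipeline over (word, tag) pairs. The boundary rules and the chunk pattern are simplifying assumptions for illustration. Clauses are cut at commas and sentence-final punctuation and labelled by a leading subordinating conjunction (KOUS), a relative pronoun (PRELS), or an infinitival "zu" (PTKZU). A chunk is an optional preposition, an optional article, any number of attributive adjectives, and a noun. These rules are not the grammar used in the study, and unlike a real partial parser this sketch returns a single analysis rather than all alternatives.

```python
# Toy two-step partial parser over (word, tag) pairs with STTS-style tags.
# The rules below are illustrative assumptions, not the study's grammar.

def split_clauses(tokens):
    """Step 1: cut at commas/sentence-final punctuation and label each clause."""
    clauses, current = [], []
    for word, tag in tokens:
        if tag in ("$,", "$."):
            if current:
                clauses.append(current)
            current = []
        else:
            current.append((word, tag))
    if current:
        clauses.append(current)

    labelled = []
    for clause in clauses:
        tags = [tag for _, tag in clause]
        if tags[0] in ("KOUS", "PRELS"):
            kind = "subordinate"
        elif "PTKZU" in tags:
            kind = "infinitive"
        else:
            kind = "main"
        labelled.append((kind, clause))
    return labelled


def chunk(clause):
    """Step 2: find flat minimal NPs (ART? ADJA* NN|NE) and PPs (APPR + NP)."""
    chunks, i = [], 0
    while i < len(clause):
        j = i
        if clause[j][1] == "APPR":                              # optional preposition
            j += 1
        np_start = j
        if j < len(clause) and clause[j][1] == "ART":           # optional article
            j += 1
        while j < len(clause) and clause[j][1] == "ADJA":       # attributive adjectives
            j += 1
        if j < len(clause) and clause[j][1] in ("NN", "NE"):    # head noun
            label = "PP" if np_start > i else "NP"
            chunks.append((label, " ".join(word for word, _ in clause[i:j + 1])))
            i = j + 1
        else:
            i += 1                                              # no chunk starts here
    return chunks


tagged_sentence = [
    ("Der", "ART"), ("Minister", "NN"), ("sagte", "VVFIN"), (",", "$,"),
    ("dass", "KOUS"), ("die", "ART"), ("neue", "ADJA"), ("Regelung", "NN"),
    ("in", "APPR"), ("den", "ART"), ("Ländern", "NN"), ("gilt", "VVFIN"),
    (".", "$."),
]

result = [(kind, chunk(clause)) for kind, clause in split_clauses(tagged_sentence)]
for kind, chunks in result:
    print(kind, chunks)
```

Chunks are kept flat, so a PP following an NP is not attached to it, in line with the second step's disregard for hierarchical relations.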
The results of this examination reflect the expected disambiguating effect of tagging on subsequent parsing. The percentage of unambiguously parsed input sequences (i.e. those with exactly one parse tree) is higher for the tagged text forms than for the untagged ones: the difference is 29% and 17% for the first analysis step (hand-tagged and statistically tagged text, respectively) and as much as 34% for the second step. On the other hand, an examination of the unsuccessfully parsed input sequences shows that a much higher percentage of the tagged input sequences than of their untagged counterparts yield zero parse trees. Preliminary examinations indicate that, for the first parsing step, this difference results mainly from tagging errors in the statistically driven annotation.
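The measures reported above can be computed from the number of parse trees obtained for each input sequence. The sketch below assumes one such count per sequence and per text form, and groups the sequences into those with zero, exactly one, and more than one parse tree. The counts are invented for illustration and are not the figures from this study.

```python
from collections import Counter


def ambiguity_profile(tree_counts):
    """Percentage of input sequences with 0, exactly 1, and more than 1 parse tree."""
    n = len(tree_counts)
    buckets = Counter(
        "zero" if k == 0 else "unique" if k == 1 else "ambiguous" for k in tree_counts
    )
    return {key: 100 * buckets[key] / n for key in ("zero", "unique", "ambiguous")}


# Invented parse-tree counts for the same eight input sequences,
# analyzed once in untagged and once in tagged form (not the study's data).
untagged_counts = [1, 3, 2, 1, 6, 4, 2, 1]
tagged_counts = [1, 1, 1, 1, 2, 0, 1, 1]

untagged = ambiguity_profile(untagged_counts)
tagged = ambiguity_profile(tagged_counts)
gain = tagged["unique"] - untagged["unique"]

print(untagged)  # {'zero': 0.0, 'unique': 37.5, 'ambiguous': 62.5}
print(tagged)    # {'zero': 12.5, 'unique': 75.0, 'ambiguous': 12.5}
print(gain)      # 37.5
```

In this invented example the tagged form gains 37.5 percentage points of unambiguous sequences but also acquires a sequence with zero parse trees, the same trade-off observed for the real texts.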