Article in Proceedings INPROC-2010-81

Bibliography: Lange, Ralph; Dürr, Frank; Rothermel, Kurt: Indexing Source Descriptions based on Defined Classes.
In: Proceedings of the 14th International Database Engineering and Applications Symposium (IDEAS '10). Montreal, QC, Canada. August 2010.
University of Stuttgart : Collaborative Research Center SFB 627 (Nexus: World Models for Mobile Context-Based Systems).
pp. 245-256, English.
ACM, August 2010.
Article in Proceedings (Conference Paper).
CR-Schema: H.2.5 (Heterogeneous Databases)
H.3.3 (Information Search and Retrieval)
Keywords: heterogeneous information systems; source descriptions; indexing of source descriptions; defined classes; tree-based index structure

Scaling heterogeneous information systems (HIS) to thousands of sources poses particular challenges to source discovery. It requires a powerful formalism for describing the contents of the sources in a concise manner and for formulating compatible queries as well as a suitable structure for indexing and retrieving the source descriptions efficiently.

We propose an extended logic-based description formalism for large-scale HIS with structured sources and a shared ontology. The formalism refines, in several ways, existing approaches that describe the sources by constraints on the attribute value ranges: It allows for complex, nested descriptions based on defined classes. It supports alternative descriptions to express that a source may be discovered by different combinations of constraints. Finally, it allows adjusting between positive matching, similar to keyword-based discovery, and negative matching, as used in existing logic-based approaches.

We further propose the SDC-Tree for indexing such source descriptions. To allow for efficient discovery, the SDC-Tree features multidimensional indexing capabilities for the different attributes and the IS-A hierarchy of the shared ontology, but also incorporates the existence or absence of constraints. For this purpose, it supports three different types of node split operations which exploit the expressiveness of the description formalism. Therefore, we also propose a generic split algorithm which can be used with arbitrary ontologies.

Full text and other links: PDF (241821 Bytes)
The original publication is available at ACM Digital Library
Copyright: © ACM, 2010. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Proceedings of the 14th Int'l Database Engineering and Applications Symposium (IDEAS '10), pp. 245-256. Montreal, QC, Canada. August 2010.
Department(s): University of Stuttgart, Institute of Parallel and Distributed Systems, Distributed Systems
Project(s): SFB-627, B5 (University of Stuttgart, Institute of Parallel and Distributed Systems, Distributed Systems)
Entry date: August 25, 2010