XML is becoming one of the main technological ingredients of the Internet and is now accepted as the standard for information exchange. XML-based data integration systems, which enable sharing of and cooperation with legacy data sources, are emerging as increasingly important data service providers on the web. These services give users a uniform interface to a multitude of data sources such as relational databases, XML files, text files, delimited files, and Excel files. Users can thus focus on what they want rather than on how to obtain the answers. They no longer have to carry out tedious tasks such as finding the relevant data sources, interacting with each data source in isolation through its local interface, and combining data from multiple data sources.
Users always expect better query performance and data consistency from data integration systems. This work proposes an approach to supporting constraints and triggers in an XML-based data integration system in order to optimize queries and enforce data consistency. Constraints and triggers have long been recognized as useful for semantic query optimization and data consistency enforcement in relational databases. This work first presents an approach that uses constraints from the heterogeneous data sources to semantically optimize queries submitted to the XML-based data integration system. The different constraints from the data sources are first integrated into a uniform constraint model, and the constraints in this model are then stored in a constraint repository. Traditional semantic query optimization techniques from relational databases are analyzed, and three of them are reused by the semantic query optimizer for the XML-based data integration system: detection of empty results, join elimination, and predicate elimination. Performance is analyzed with respect to data source type and data volume. The semantic query optimizer works best when the data sources are non-relational, the data volume is large, and the execution cost is expected to be high.
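To make the three reused techniques concrete, the following Python sketch shows how detection of empty results, predicate elimination, and join elimination might work over a very small constraint model. It is an illustration only, not the optimizer described in this work: the classes `RangeConstraint`, `Predicate`, and `ForeignKey`, and the functions `classify`, `optimize`, and `join_is_removable`, are hypothetical names, and the real uniform constraint model is assumed to be considerably richer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeConstraint:
    """Hypothetical uniform-model constraint: lo <= field <= hi always holds."""
    field: str
    lo: float
    hi: float


@dataclass(frozen=True)
class Predicate:
    """A query predicate of the form: field <op> value."""
    field: str
    op: str  # one of "<", "<=", ">", ">="
    value: float


@dataclass(frozen=True)
class ForeignKey:
    """Hypothetical constraint: each child row references exactly one parent row."""
    child: str
    parent: str
    not_null: bool = True


def classify(pred, c):
    """Return 'empty' if pred contradicts c, 'redundant' if c implies pred, else 'keep'."""
    if pred.op == ">":
        if c.hi <= pred.value:
            return "empty"
        if c.lo > pred.value:
            return "redundant"
    elif pred.op == ">=":
        if c.hi < pred.value:
            return "empty"
        if c.lo >= pred.value:
            return "redundant"
    elif pred.op == "<":
        if c.lo >= pred.value:
            return "empty"
        if c.hi < pred.value:
            return "redundant"
    elif pred.op == "<=":
        if c.lo > pred.value:
            return "empty"
        if c.hi <= pred.value:
            return "redundant"
    return "keep"


def optimize(predicates, constraints):
    """Return None if the result is provably empty, else the predicates still needed."""
    by_field = {c.field: c for c in constraints}
    kept = []
    for p in predicates:
        c = by_field.get(p.field)
        verdict = classify(p, c) if c is not None else "keep"
        if verdict == "empty":
            return None          # detection of empty results: skip execution
        if verdict == "keep":
            kept.append(p)       # 'redundant' predicates are eliminated
    return kept


def join_is_removable(child, parent, referenced_sources, foreign_keys):
    """A child-parent join is redundant when a non-null foreign key guarantees
    a match and the query reads nothing from the parent source."""
    if parent in referenced_sources:
        return False
    return any(
        fk.child == child and fk.parent == parent and fk.not_null
        for fk in foreign_keys
    )
```

In this sketch a query whose predicate contradicts a known constraint is never sent to the data sources at all, which is consistent with the observation above that the benefit grows with the expected execution cost.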
To equip the XML-based data integration system with full data manipulation capabilities, programming frameworks that support updates at the integration level are being developed. This work discusses how to realize updates in the XML-based data integration system under the Service Data Objects programming framework. When users are permitted to submit updates, the data integration system must guarantee data integrity and enforce active business logic. This work presents an approach in which active rules, including integrity constraints, are enforced by XQuery triggers. An XQuery trigger model that conforms to the XQuery update model proposed by the W3C is defined, and the definition of active rules and integrity constraints as XQuery triggers is discussed. Triggers and constraints are stored in a trigger repository. An architecture supporting the XQuery trigger service in the XML-based data integration system is proposed, and its important components, including event detection, trigger scheduling, condition evaluation, action firing, and trigger termination, are discussed. The complete XQuery trigger service architecture on top of a data integration system is implemented in BEA AquaLogic DataService Platform under the Service Data Objects programming framework. Experiments show that active rules and integrity constraints are enforced easily and efficiently at the global level.
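The five components named above can be pictured with a small event-condition-action loop. The Python sketch below is only an illustration under stated assumptions: it does not use XQuery, Service Data Objects, or any BEA AquaLogic API, and the names `UpdateEvent`, `Trigger`, `TriggerService`, `IntegrityViolation`, and the `max_depth` cascade limit are all hypothetical. Conditions and actions are plain Python callables standing in for XQuery expressions.

```python
from dataclasses import dataclass
from typing import Callable, List


class IntegrityViolation(Exception):
    """Raised by a trigger action to reject an update (integrity constraint)."""


@dataclass
class UpdateEvent:
    kind: str     # "insert", "delete", or "replace"
    target: str   # hypothetical name of the updated node or data service
    data: dict


@dataclass
class Trigger:
    name: str
    event_kind: str
    target: str
    condition: Callable[[UpdateEvent], bool]
    action: Callable[[UpdateEvent], List[UpdateEvent]]  # returns follow-up events
    priority: int = 0


class TriggerService:
    def __init__(self, triggers, max_depth=8):
        self.triggers = list(triggers)   # stands in for the trigger repository
        self.max_depth = max_depth       # assumed termination policy
        self.log = []                    # names of fired triggers, in order

    def detect(self, event):
        """Event detection: find triggers registered for this update."""
        return [
            t for t in self.triggers
            if t.event_kind == event.kind and t.target == event.target
        ]

    def schedule(self, matched):
        """Trigger scheduling: higher priority first (sort is stable)."""
        return sorted(matched, key=lambda t: t.priority, reverse=True)

    def submit(self, event, depth=0):
        """Process one update and any updates cascaded by fired actions."""
        if depth > self.max_depth:
            # Trigger termination: stop runaway cascades.
            raise RuntimeError("trigger cascade exceeded max_depth")
        for t in self.schedule(self.detect(event)):
            if t.condition(event):               # condition evaluation
                self.log.append(t.name)
                for follow_up in t.action(event):  # action firing
                    self.submit(follow_up, depth + 1)
```

Under this sketch an integrity constraint is simply a high-priority trigger whose action raises `IntegrityViolation`, so the offending update is rejected before lower-priority active rules run. A fixed depth limit is the simplest termination policy; the architecture described here may use a different one.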
Constraints and triggers play an important role in XML-based data integration systems. By using constraints and triggers in an XML-based data integration system, we can improve query performance and enforce data consistency efficiently.