Bibliographic data | Karlysheva, Anna: Ranking Metrics for Conditional Inclusion Dependencies. Universität Stuttgart, Fakultät Informatik, Elektrotechnik und Informationstechnik, Master's Thesis No. 23 (2025). 69 pages, English.
|
| Abstract | Conditional inclusion dependencies (CINDs) can express relationships between different tables and are widely used in data cleaning, data integration, and schema matching. Existing discovery algorithms generate a large number of CINDs. Previous findings on ranking conditional functional dependencies and association rules suggest that a subset of the most relevant CINDs may achieve the same results as the full set of CINDs produced by a discovery algorithm. In this thesis, we propose using a ranking algorithm to optimize the selection of CINDs from those produced by a discovery algorithm. We investigate and evaluate ranking metrics and algorithms for CINDs. We mainly review ranking metrics used for association rules, along with several ranking algorithms of interest, and adapt them for use with CINDs. We conducted three experiments evaluating the effectiveness, quality, and efficiency of the selected CINDs. The first experiment evaluates the results based on the number of tuples matched using the selected CINDs. In the second experiment, we use the selected CINDs to identify matches and evaluate the quality metrics precision, recall, and F-measure. The final experiment evaluates the efficiency of the different ranking approaches in terms of execution time. The experiments show that the top-20 ranked CINDs cover approximately the same number of matches as the full CIND set. As expected, we also observed a correlation between the number of matched tuples and execution time. Overall, our findings support the hypothesis that ranking CINDs is an effective strategy that significantly improves data cleaning performance.
|
| Department(s) | Universität Stuttgart, Institut für Parallele und Verteilte Systeme, Data Engineering
|
| Supervisor(s) | Herschel, Prof. Melanie; Capobianco Shimomura, Dr. Larissa |
| Entry date | August 13, 2025 |
|---|