| Bibliography | Kurtyigit, Sinan Cem: Image-based compositionality prediction for English noun compounds. University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Master Thesis No. 137 (2024). 72 pages, English.
|
| Abstract | Predicting the compositionality of English noun-noun compounds, such as lion tooth and climate change, has traditionally relied on text-based approaches. This thesis explores the potential of using a purely image-based approach instead. The proposed image-based approach encodes images of both the compounds and their constituents into vector representations using a Vision Transformer and assesses their similarity through cosine similarity. The effectiveness of this approach is evaluated against human compositionality ratings from a widely used dataset and compared with a conventional text-based approach to highlight their respective strengths and weaknesses. Additionally, various image acquisition techniques are explored to determine the most effective way to obtain images that accurately represent the meanings of the corresponding words, as this greatly impacts performance.
My results reveal that, with sufficiently representative images, the image-based approach achieves promising results but still falls slightly short overall compared to the baseline text-based approach. Notably, for specific categories of compounds, namely concrete and literal ones, the image-based approach can even outperform the baseline. However, considerable challenges in image acquisition impact overall performance, as obtaining high-quality and contextually accurate images remains difficult. Moreover, the image-based approach encounters limitations in cases where visual similarity does not align well with semantic relatedness, suggesting that image-only methods may struggle with accurate predictions for these compounds. This thesis offers a viable alternative to traditional text-based compositionality prediction approaches and provides insights that could drive the development of multi-modal approaches, potentially enhancing prediction accuracy in this domain.
|
| Full text and other links | Full text
|
| Department(s) | University of Stuttgart, Institute for Natural Language Processing
|
| Supervisor(s) | Schulte im Walde, Prof. Sabine; Silberer, Jun.-Prof. Carina; Frassinelli, Jun.-Prof. Diego |
| Entry date | December 19, 2025 |
|---|
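
The abstract describes comparing the vector representation of a compound's images with those of its constituents via cosine similarity. The following is a minimal sketch of that comparison step only, not the thesis's implementation. It assumes the image embeddings have already been computed and are available as plain lists of floats. The function names, the `modifier`/`head` labels for the two constituents, and the toy vectors are hypothetical and chosen for illustration.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def compound_constituent_similarities(
    compound_vec: Sequence[float],
    modifier_vec: Sequence[float],
    head_vec: Sequence[float],
) -> Dict[str, float]:
    """Score a compound against each of its two constituents.

    Assumption: each argument is an image embedding that was produced
    elsewhere (for example by a Vision Transformer); this sketch does
    not compute embeddings itself.
    """
    return {
        "modifier": cosine_similarity(compound_vec, modifier_vec),
        "head": cosine_similarity(compound_vec, head_vec),
    }


# Toy 2-dimensional vectors standing in for real image embeddings.
scores = compound_constituent_similarities(
    compound_vec=[1.0, 1.0],
    modifier_vec=[1.0, 0.0],
    head_vec=[0.0, 1.0],
)
print(scores)
```

Under this reading, a higher score would suggest that the compound's images resemble the constituent's images more closely. How such scores are mapped to a compositionality prediction and compared with the human ratings is not specified in the abstract.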