Abstract | This thesis investigates the design of an automated evaluation system for reviewing documents submitted by applicants to university programs, with a particular focus on the University of Stuttgart. The goal is to streamline the manual review process, which has become more challenging as the number of applications from around the world grows. The thesis tackles key challenges such as handling scanned documents, managing document formats, and retrieving documents efficiently. It examines existing tools such as uni-assist, GradCAS, and CAMPUSonline, and proposes a centralized selection system that uses Optical Character Recognition (OCR), Natural Language Processing (NLP), Named Entity Recognition (NER), and a document management system built on Firebase. The system integrates an event-driven architecture, SQL, a large language model (LLM), and automated workflows to improve response time. The study explores Term Frequency-Inverse Document Frequency (TF-IDF) and Retrieval-Augmented Generation (RAG) methods to enhance document analysis and retrieval. By combining a rule-based approach with machine learning, the thesis proposes a hybrid system intended to improve document evaluation accuracy. The implementation covers transforming unstructured PDF data into structured formats, parsing the data, and applying lemmatization and stemming to refine the results. The effectiveness of the proposed system is demonstrated through an analysis of the generated data, including OCR performance and RAG-based document retrieval. The work concludes by highlighting the potential for scaling the system to larger datasets and real-world applications through the development of a SaaS platform for universities, while also addressing ethical and compliance considerations.
|