Master Thesis MSTR-2025-90

Bibliography: Thamaraiselvan, Vishnuvarthini: Prompt Tuning Vision Foundation Models for Image to Video Transfer.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Master Thesis No. 90 (2025).
55 pages, English.
Abstract

Because video datasets are large and, being difficult to curate, lack variety, training models on them can be resource-intensive and yield limited generalization. An efficient way to approach this challenge is to take pretrained Vision Foundation Models that were trained on images and adapt them for video recognition. Parameter-Efficient Fine-Tuning (PEFT) methods, which use fewer parameters while also improving performance, are gaining traction compared to full fine-tuning approaches that involve a huge number of parameters. This work aims to benchmark Prompt Tuning, one of the PEFT methods, on foundation models like DinoV2 and the small domain-specific dataset IKEA-ASM, to understand the relation between the effectiveness of the methods and the size of the dataset. The performance of four different prompt tuning methods on DinoV2 for image-to-video transfer has been analysed. It is observed that image prompting methods like VPT can outperform video prompting methods like Vita-CLIP.

Department(s): University of Stuttgart, Institute of Artificial Intelligence, Intelligent Sensing and Perception
Supervisor(s): Roitberg, Jun.-Prof. Alina; Thiyakesan Ponbagavathi, Thinesh
Entry date: March 16, 2026