| Bibliography | Thamaraiselvan, Vishnuvarthini: Prompt Tuning Vision Foundation Models for Image to Video Transfer. University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Master Thesis No. 90 (2025). 55 pages, English.
|
| Abstract | Because video datasets are large and, owing to the difficulty of curating them, lack variety, training models on such datasets can be resource-intensive and can yield limited generalization capabilities. An efficient way to approach this challenge is to take pretrained Vision Foundation models that were trained on images and adapt them for video recognition. Parameter-Efficient Fine-Tuning (PEFT) methods, which use fewer parameters while also improving performance, are gaining traction compared to full fine-tuning approaches that update a huge number of parameters. This work aims to benchmark Prompt Tuning, one of the PEFT methods, on foundation models like DinoV2 and the small domain-specific dataset IKEA-ASM, to understand the relation between the effectiveness of the methods and the size of the dataset. The performance of four different prompt tuning methods on DinoV2 for Image to Video Transfer has been analysed. It is observed that image prompting methods like VPT can outperform video prompting methods like Vita-CLIP.
|
| Department(s) | University of Stuttgart, Institute of Artificial Intelligence, Intelligent Sensing and Perception
|
| Supervisor(s) | Roitberg, Jun.-Prof. Alina; Thiyakesan Ponbagavathi, Thinesh |
| Entry date | March 16, 2026 |
|---|