| Bibliography | Sulaiman, Yasser: Sparse Adaptation for Fine-Tuning Large Language Models: Efficient Parameter Selection and Mitigation of Catastrophic Forgetting. University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Master Thesis No. 134 (2024). 42 pages, English.
|
| Abstract | This thesis explores the application of Sparse Adaptation for Fine-Tuning (SAFT) in large language models (LLMs), focusing on its ability to adapt pre-trained models to new tasks while mitigating catastrophic forgetting. SAFT updates only a small subset of parameters, chosen by gradient magnitude, which allows models to retain previously learned knowledge while adapting to specific tasks. The study introduces a modified version of SAFT to address the computational and memory inefficiencies of the original algorithm, incorporating techniques such as gradient masking and selecting trainable parameters while considering only a small subset of gradient magnitudes. These enhancements significantly reduce memory overhead, making SAFT more practical for fine-tuning large models. Experimental results demonstrate that SAFT delivers competitive performance on in-distribution (ID) tasks. More importantly, it consistently outperforms both Full Fine-Tuning (FFT) and Low-Rank Adaptation (LoRA) in out-of-distribution (OOD) evaluations, highlighting its superior generalization capabilities. SAFT’s ability to prevent catastrophic forgetting is particularly effective in multitask learning, where adapting a unique set of parameters for each task ensures that task-specific fine-tuning does not degrade performance on previously learned tasks. The study also shows that models fine-tuned with SAFT can be successfully combined, further underscoring its flexibility in multitask settings. While SAFT shows promise, it still has limitations, particularly regarding memory requirements and slightly lower performance on ID tasks compared to FFT. The thesis suggests directions to improve SAFT’s scalability and overall efficiency.
|
| Department(s) | University of Stuttgart, Institute for Natural Language Processing
|
| Supervisor(s) | Vu, Prof. Thang; Schweitzer, Dr. Antje |
| Entry date | November 14, 2025 |
|---|