Computer Science > Machine Learning

arXiv:2502.00311 (cs)
[Submitted on 1 Feb 2025]

Title: Sparse Gradient Compression for Fine-Tuning Large Language Models

Authors:David H. Yang, Mohammad Mohammadi Amiri, Tejaswini Pedapati, Subhajit Chaudhury, Pin-Yu Chen
Abstract: Fine-tuning large language models (LLMs) for downstream tasks has become increasingly crucial due to their widespread use and the growing availability of open-source models. However, the high memory costs associated with fine-tuning remain a significant challenge, especially as models increase in size. To address this, parameter-efficient fine-tuning (PEFT) methods have been proposed to minimize the number of parameters required for fine-tuning LLMs. However, these approaches often tie the number of optimizer states to the dimensions of the model parameters, limiting flexibility and control during fine-tuning. In this paper, we propose sparse gradient compression (SGC), a training regime designed to address these limitations. Our approach leverages inherent sparsity in gradients to compress optimizer states by projecting them onto a low-dimensional subspace, with dimensionality independent of the original model's parameters. By enabling optimizer state updates in an arbitrary low-dimensional subspace, SGC offers a flexible tradeoff between memory efficiency and performance. We demonstrate through experiments that SGC can decrease memory usage in optimizer states more effectively than existing PEFT methods. Furthermore, by fine-tuning LLMs on various downstream tasks, we show that SGC can deliver superior performance while substantially lowering optimizer state memory requirements, particularly in both data-limited and memory-limited settings.
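
The abstract describes keeping optimizer states in a low-dimensional subspace whose size is chosen independently of the model's parameter count, exploiting sparsity in the gradients. As a rough illustration of that general idea only (not the paper's actual SGC algorithm, whose projection and subspace-selection details are given in the full text), the minimal sketch below maintains Adam-style moment estimates for just the k largest-magnitude gradient coordinates at each step; the class and function names are hypothetical.

```python
import numpy as np

def topk_indices(grad, k):
    """Return the indices of the k largest-magnitude gradient entries."""
    return np.argpartition(np.abs(grad), -k)[-k:]

class SubspaceAdam:
    """Adam-style optimizer whose moment estimates live in a k-dimensional
    subspace, so optimizer-state memory scales with k rather than with the
    number of model parameters. Illustrative sketch only: SGC's actual
    projection and subspace choice are defined in the paper."""

    def __init__(self, k, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.k, self.lr = k, lr
        self.beta1, self.beta2, self.eps = beta1, beta2, eps
        self.m = np.zeros(k)   # first moment: k entries instead of n
        self.v = np.zeros(k)   # second moment: k entries instead of n
        self.t = 0

    def step(self, params, grad):
        self.t += 1
        idx = topk_indices(grad, self.k)          # "compress": keep k coordinates
        g = grad[idx]
        self.m = self.beta1 * self.m + (1 - self.beta1) * g
        self.v = self.beta2 * self.v + (1 - self.beta2) * g * g
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        # apply the update only on the selected subspace of parameters
        params[idx] -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
        return params
```

Note that this toy version lets the selected coordinates change from step to step, so the moment estimates can mix coordinates; handling the subspace consistently is part of what a full method such as SGC addresses and is not captured by this sketch.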
Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2502.00311 [cs.LG]
  (or arXiv:2502.00311v1 [cs.LG] for this version)
  https://doi.org/10.48550/arXiv.2502.00311
arXiv-issued DOI via DataCite

Submission history

From: David Hong Yang
[v1] Sat, 1 Feb 2025 04:18:28 UTC (1,532 KB)