Computer Science > Computer Vision and Pattern Recognition

arXiv:2507.17088 (cs)
[Submitted on 23 Jul 2025]

Title: FedVLM: Scalable Personalized Vision-Language Models through Federated Learning


Authors:Arkajyoti Mitra (1), Afia Anjum (1), Paul Agbaje (1), Mert Pesé (2), Habeeb Olufowobi (1) ((1) University of Texas at Arlington, (2) Clemson University)
Abstract: Vision-language models (VLMs) demonstrate impressive zero-shot and few-shot learning capabilities, making them essential for several downstream tasks. However, fine-tuning these models at scale remains challenging, particularly in federated environments where data is decentralized and non-iid across clients. Existing parameter-efficient tuning methods like LoRA (Low-Rank Adaptation) reduce computational overhead but struggle with heterogeneous client data, leading to suboptimal generalization. To address these challenges, we propose FedVLM, a federated LoRA fine-tuning framework that enables decentralized adaptation of VLMs while preserving model privacy and reducing reliance on centralized training. To further tackle data heterogeneity, we introduce personalized LoRA (pLoRA), which dynamically adapts LoRA parameters to each client's unique data distribution, significantly improving local adaptation while maintaining global model aggregation. Experiments on the RLAIF-V dataset show that pLoRA improves client-specific performance by 24.5% over standard LoRA, demonstrating superior adaptation in non-iid settings. FedVLM provides a scalable and efficient solution for fine-tuning VLMs in federated settings, advancing personalized adaptation in distributed learning scenarios.
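To make the idea of combining federated aggregation with client-specific LoRA parameters concrete, the following is a minimal sketch in Python/NumPy. It assumes a common pattern in which each client's low-rank update is factored as B·A, only one factor is averaged across clients (FedAvg-style), and the other is kept local as the personalized component. The factor split, the aggregation rule, and all names (ClientAdapter, aggregate_shared, RANK, etc.) are illustrative assumptions, not the paper's actual pLoRA algorithm or implementation.

    # Sketch: federated LoRA with a personalized factor (illustrative, not the paper's code).
    import numpy as np

    RANK, D_IN, D_OUT, NUM_CLIENTS = 4, 16, 16, 3
    rng = np.random.default_rng(0)

    class ClientAdapter:
        """Per-client LoRA adapter: delta_W = B @ A, a rank-RANK update to a frozen weight."""
        def __init__(self):
            self.A = rng.normal(scale=0.01, size=(RANK, D_IN))  # shared factor, aggregated globally
            self.B = np.zeros((D_OUT, RANK))                     # personalized factor, kept local

        def local_update(self):
            # Placeholder for local fine-tuning on the client's non-iid data.
            self.A += rng.normal(scale=0.001, size=self.A.shape)
            self.B += rng.normal(scale=0.001, size=self.B.shape)

    def aggregate_shared(clients):
        """FedAvg-style mean over the shared LoRA factor only; B never leaves the client."""
        mean_A = np.mean([c.A for c in clients], axis=0)
        for c in clients:
            c.A = mean_A.copy()

    clients = [ClientAdapter() for _ in range(NUM_CLIENTS)]
    for round_idx in range(5):          # communication rounds
        for c in clients:
            c.local_update()
        aggregate_shared(clients)

    # Each client's effective update combines the global A with its personal B.
    delta_W = clients[0].B @ clients[0].A
    print(delta_W.shape)  # (16, 16)

Under these assumptions, only the low-rank factors are ever communicated, which is what keeps the per-round cost small relative to exchanging full VLM weights; how pLoRA actually partitions or adapts the factors is described in the paper itself.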
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2507.17088 [cs.CV]
  (or arXiv:2507.17088v1 [cs.CV] for this version)
  https://doi.org/10.48550/arXiv.2507.17088
arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Arkajyoti Mitra
[v1] Wed, 23 Jul 2025 00:05:02 UTC (2,660 KB)