Multi-Agent Conversational Online Learning for Adaptive LLM Response Identification

Dai, Xiangxiang; Xie, Yuejin; Liu, Maoli; Wang, Xuchuang; Li, Zhuohua; Wang, Huanyu; Lui, John C. S.

arXiv:2501.01849 (cs)

[Submitted on 3 Jan 2025]

Abstract: The remarkable generative capability of large language models (LLMs) has sparked growing interest in automatically generating responses for different applications. Given the dynamic nature of user preferences and the uncertainty of LLM response performance, it is crucial to design efficient online learning algorithms that identify optimal LLM responses (i.e., high-quality responses that also meet user preferences). Most existing online algorithms adopt a centralized approach and fail to leverage explicit user preferences for more efficient and personalized LLM response identification. In contrast, this paper introduces MACO (Multi-Agent Conversational Online Learning for Adaptive LLM Response Identification): 1) the online LLM response identification process is accelerated by multiple local agents (such as smartphones), while enhancing data privacy; 2) a novel conversational mechanism adaptively conducts conversations to solicit user preferences (e.g., a preference for a humorous tone over a serious one in generated responses), so as to minimize uncertainty in preference estimation. Our theoretical analysis demonstrates that MACO is near-optimal with respect to cumulative regret. Additionally, MACO reduces communication costs and computational complexity by eliminating the traditional, computation-intensive "G-optimal design" found in previous works. Extensive experiments with the open LLM Llama, coupled with two different embedding models from Google and OpenAI for text vector representation, demonstrate that MACO significantly outperforms the current state of the art in online LLM response identification.
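The abstract describes the setting only at a high level. As a rough, hypothetical illustration of that general setting (not MACO's actual algorithm), the sketch below assumes a linear reward model r = theta^T x over response embeddings x, lets several agents each pick from their own candidate responses every round, pools all observations into shared ridge-regression statistics, and periodically issues a "conversation" on the key-term direction the estimate is least certain about. The names SharedLinearEstimator and run, the unit-axis key terms, and the LinUCB-style selection rule are all assumptions for illustration; MACO is described as avoiding G-optimal design, and this sketch makes no attempt to reproduce its real selection or conversation rules.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_vec(m, v):
    return [dot(row, v) for row in m]


class SharedLinearEstimator:
    """Hypothetical ridge-regression statistics pooled across all agents."""

    def __init__(self, dim, reg=1.0):
        # A^{-1} starts as (reg * I)^{-1}; b accumulates reward-weighted features.
        self.a_inv = [[(1.0 / reg) if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def update(self, x, y):
        # Sherman-Morrison rank-one update of the symmetric matrix A^{-1}.
        ax = mat_vec(self.a_inv, x)
        denom = 1.0 + dot(x, ax)
        n = len(x)
        self.a_inv = [[self.a_inv[i][j] - ax[i] * ax[j] / denom for j in range(n)]
                      for i in range(n)]
        self.b = [bi + y * xi for bi, xi in zip(self.b, x)]

    def theta_hat(self):
        return mat_vec(self.a_inv, self.b)

    def width(self, x):
        # sqrt(x^T A^{-1} x): uncertainty of the estimate along direction x.
        return math.sqrt(max(dot(x, mat_vec(self.a_inv, x)), 0.0))


def run(num_agents=3, rounds=200, dim=4, arms_per_agent=10, alpha=1.0,
        conv_every=10, noise=0.1, seed=0):
    """Return the cumulative regret after each round (simulated data only)."""
    rng = random.Random(seed)
    theta = [rng.gauss(0, 1) for _ in range(dim)]  # hidden user preference (assumed linear)
    # Each agent holds its own candidate responses, represented by embedding vectors.
    arm_sets = [[[rng.gauss(0, 1) for _ in range(dim)] for _ in range(arms_per_agent)]
                for _ in range(num_agents)]
    best = [max(dot(theta, x) for x in arms) for arms in arm_sets]
    # Hypothetical key terms (e.g. "humorous vs. serious tone"), here just unit axes.
    keys = [[1.0 if i == j else 0.0 for i in range(dim)] for j in range(dim)]
    est = SharedLinearEstimator(dim)
    regret, history = 0.0, []
    for t in range(1, rounds + 1):
        th = est.theta_hat()  # snapshot shared with every agent at the start of the round
        picks = []
        for m, arms in enumerate(arm_sets):
            # Optimistic (UCB-style) choice among the agent's local candidates.
            x = max(arms, key=lambda a: dot(th, a) + alpha * est.width(a))
            picks.append(x)
            regret += best[m] - dot(theta, x)
        # Synchronisation: pool every agent's noisy feedback into the shared statistics.
        for x in picks:
            est.update(x, dot(theta, x) + rng.gauss(0, noise))
        if t % conv_every == 0:
            # Conversation: query the key term the estimator is least certain about.
            k = max(keys, key=est.width)
            est.update(k, dot(theta, k) + rng.gauss(0, noise))
        history.append(regret)
    return history
```

For example, `print(run()[-1])` prints the total simulated regret after 200 rounds. Pooling the statistics lets every agent benefit from the others' feedback, and each conversation shrinks uncertainty along one preference direction without spending a response round on it.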

Subjects:	Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2501.01849 [cs.HC]
	(or arXiv:2501.01849v1 [cs.HC] for this version)
	https://doi.org/10.48550/arXiv.2501.01849

Submission history

From: Xiangxiang Dai
[v1] Fri, 3 Jan 2025 14:59:38 UTC (302 KB)