Response Quality Assessment for Retrieval-Augmented Generation via Conditional Conformal Factuality

Feng, Naihe; Sui, Yi; Hou, Shiyi; Cresswell, Jesse C.; Wu, Ga

doi:10.1145/3726302.3730244

Computer Science > Information Retrieval

arXiv:2506.20978 (cs)

[Submitted on 26 Jun 2025]


Authors: Naihe Feng, Yi Sui, Shiyi Hou, Jesse C. Cresswell, Ga Wu

Abstract: Existing research on Retrieval-Augmented Generation (RAG) primarily focuses on improving overall question-answering accuracy, often overlooking the quality of sub-claims within generated responses. Recent methods that attempt to improve RAG trustworthiness, such as auto-evaluation metrics, either lack probabilistic guarantees or require ground truth answers. To address these limitations, we propose Conformal-RAG, a novel framework inspired by recent applications of conformal prediction (CP) to large language models (LLMs). Conformal-RAG leverages CP and internal information from the RAG mechanism to offer statistical guarantees on response quality. It ensures group-conditional coverage spanning multiple sub-domains without requiring manual labelling of conformal sets, making it suitable for complex RAG applications. Compared to existing RAG auto-evaluation methods, Conformal-RAG offers statistical guarantees on the quality of refined sub-claims, ensuring response reliability without the need for ground truth answers. Additionally, our experiments demonstrate that by leveraging information from the RAG system, Conformal-RAG retains up to 60% more high-quality sub-claims from the response compared to direct applications of CP to LLMs, while maintaining the same reliability guarantee.
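The idea of filtering sub-claims under a group-conditional conformal guarantee can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that); it is a generic split-conformal filter calibrated separately per group, and it assumes things the abstract does not state: every sub-claim carries a hypothetical relevance score where higher means more trustworthy, calibration sub-claims come with correctness labels, and each response belongs to exactly one known group. All function names are made up for illustration.

```python
import math


def response_score(claims):
    """Nonconformity score of one calibration response.

    claims: list of (score, is_correct) pairs (hypothetical format).
    Returns the highest score among incorrect claims, or -inf if every
    claim is correct. Keeping only claims scoring strictly above this
    value leaves no incorrect claim in the response.
    """
    wrong = [score for score, is_correct in claims if not is_correct]
    return max(wrong) if wrong else float("-inf")


def conformal_threshold(cal_scores, alpha):
    """Split-conformal quantile: the ceil((n + 1) * (1 - alpha))-th smallest score.

    Returns +inf when the calibration set is too small for the requested
    level, which means no claim can be retained with the guarantee.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")
    return sorted(cal_scores)[k - 1]


def calibrate_by_group(calibration, alpha):
    """Compute one threshold per group (assumed group-wise calibration).

    calibration: list of (group, claims) pairs, with claims as in
    response_score.
    """
    scores_by_group = {}
    for group, claims in calibration:
        scores_by_group.setdefault(group, []).append(response_score(claims))
    return {
        group: conformal_threshold(scores, alpha)
        for group, scores in scores_by_group.items()
    }


def filter_claims(claims, threshold):
    """Keep the text of sub-claims scoring strictly above the threshold.

    claims: list of (text, score) pairs for a new response.
    """
    return [text for text, score in claims if score > threshold]
```

Under these assumptions, and if calibration and test responses within a group are exchangeable, a response filtered with its group's threshold contains only correct sub-claims with probability at least 1 - alpha. A tighter threshold retains more sub-claims at the same alpha, which is the kind of gain the abstract attributes to using information from the RAG system.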

Comments:	Accepted to SIGIR 2025 as a short paper, 5 pages. Code is available at https://github.com/n4feng/ResponseQualityAssessment
Subjects:	Information Retrieval (cs.IR)
ACM classes:	H.3.3
Cite as:	arXiv:2506.20978 [cs.IR]
	(or arXiv:2506.20978v1 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2506.20978
Related DOI:	https://doi.org/10.1145/3726302.3730244

Submission history

From: Naihe Feng
[v1] Thu, 26 Jun 2025 03:52:56 UTC (249 KB)
