RAGRank: Using PageRank to Counter Poisoning in CTI LLM Pipelines

Jia, Austin; Ramesh, Avaneesh; Shamsi, Zain; Zhang, Daniel; Liu, Alex

arXiv:2510.20768 (cs)

[Submitted on 23 Oct 2025]

Abstract: Retrieval-Augmented Generation (RAG) has emerged as the dominant architectural pattern for operationalizing Large Language Model (LLM) usage in Cyber Threat Intelligence (CTI) systems. However, this design is susceptible to poisoning attacks, and previously proposed defenses can fail in CTI contexts, because information about emerging attacks is often completely new and sophisticated threat actors can mimic legitimate formats, terminology, and stylistic conventions. To address this issue, we propose that the robustness of modern RAG defenses can be accelerated by applying source credibility algorithms to corpora, using PageRank as an example. In our experiments on the standardized MS MARCO dataset, we demonstrate quantitatively that our algorithm assigns lower authority scores to malicious documents while promoting trusted content. We also demonstrate proof-of-concept performance of our algorithm on CTI documents and feeds.
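The abstract does not specify how the document graph is built or how authority scores enter retrieval. As a rough illustration only, the following sketch assumes that each document links to the documents it cites or corroborates, computes standard power-iteration PageRank over that graph, and blends the resulting authority with a retriever's similarity score. The graph, the document names, the `rerank` helper, and its `alpha` weight are all hypothetical and are not taken from the paper; this is a generic PageRank reranking sketch, not the RAGRank implementation.

```python
from typing import Dict, List, Tuple


def pagerank(edges: Dict[str, List[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> Dict[str, float]:
    """Power-iteration PageRank over a directed graph of adjacency lists.

    edges[u] lists the documents u links to (assumption: citation or
    corroboration links). Dangling documents spread their rank uniformly.
    """
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by documents with no outgoing links is shared by all.
        dangling = sum(rank[v] for v in nodes if not edges.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for u, targets in edges.items():
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def rerank(candidates: List[Tuple[str, float]], authority: Dict[str, float],
           alpha: float = 0.5) -> List[Tuple[str, float]]:
    """Hypothetical blend: (1 - alpha) * similarity + alpha * normalized authority."""
    top = max(authority.values())
    scored = [(doc, (1 - alpha) * sim + alpha * authority.get(doc, 0.0) / top)
              for doc, sim in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical corpus: trusted sources reference each other, while the
# poisoned document cites a trusted source but nothing links back to it.
edges = {
    "vendor_report": ["cisa_advisory"],
    "cisa_advisory": ["vendor_report", "mitre_entry"],
    "mitre_entry": ["cisa_advisory"],
    "blog_post": ["vendor_report", "mitre_entry"],
    "poison": ["cisa_advisory"],
}
authority = pagerank(edges)

# Hypothetical retriever output: the poisoned text is the closest match.
candidates = [("poison", 0.95), ("cisa_advisory", 0.80)]
ranked = rerank(candidates, authority)
print(ranked)
```

Because `poison` receives no inbound links, its authority in this graph stays at the minimum value (1 - d)/N, so its higher raw similarity is outweighed after blending. The trade-off is a known PageRank weakness: an attacker who can inject many mutually linking documents (a link farm) can inflate their scores, so how the graph is constructed matters as much as the ranking step itself.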

Subjects:	Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Cite as:	arXiv:2510.20768 [cs.CR]
	(or arXiv:2510.20768v1 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2510.20768

Submission history

From: Zain Shamsi
[v1] Thu, 23 Oct 2025 17:43:00 UTC (519 KB)