PhishIntentionLLM: Uncovering Phishing Website Intentions through Multi-Agent Retrieval-Augmented Generation

Li, Wenhao; Manickam, Selvakumar; Chong, Yung-wey; Karuppayah, Shankar

arXiv:2507.15419 (cs)

[Submitted on 21 Jul 2025]

Abstract: Phishing websites remain a major cybersecurity threat, yet existing methods primarily focus on detection, while the recognition of underlying malicious intentions remains largely unexplored. To address this gap, we propose PhishIntentionLLM, a multi-agent retrieval-augmented generation (RAG) framework that uncovers phishing intentions from website screenshots. Leveraging the visual-language capabilities of large language models (LLMs), our framework identifies four key phishing objectives: Credential Theft, Financial Fraud, Malware Distribution, and Personal Information Harvesting. We construct and release the first phishing intention ground truth dataset (~2K samples) and evaluate the framework using four commercial LLMs. Experimental results show that PhishIntentionLLM achieves a micro-precision of 0.7895 with GPT-4o and significantly outperforms the single-agent baseline, with a ~95% improvement in micro-precision. Compared with prior work, it achieves 0.8545 precision for credential theft, a ~4% improvement. Additionally, we generate a larger dataset of ~9K samples for large-scale phishing intention profiling across sectors. This work provides a scalable and interpretable solution for intention-aware phishing analysis.
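The abstract reports micro-precision and a per-intention precision (credential theft) without defining them. The sketch below illustrates the standard definitions, assuming that each screenshot may carry one or more of the four intention labels. This multi-label setup is our assumption and is not confirmed by the abstract. Micro-precision pools true positives (TP) and false positives (FP) across all labels before dividing, TP / (TP + FP). Per-label precision applies the same ratio to a single label. The function names and toy data are hypothetical and do not come from the paper's code or dataset.

```python
from typing import List, Set

# The four intention labels named in the abstract.
LABELS = (
    "Credential Theft",
    "Financial Fraud",
    "Malware Distribution",
    "Personal Information Harvesting",
)


def micro_precision(y_true: List[Set[str]], y_pred: List[Set[str]]) -> float:
    """Pool TP and FP over every label of every sample, then divide once."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = 0
    for truth, pred in zip(y_true, y_pred):
        tp += len(pred & truth)  # predicted labels that are in the ground truth
        fp += len(pred - truth)  # predicted labels that are not in the ground truth
    return tp / (tp + fp) if tp + fp else 0.0


def per_label_precision(y_true: List[Set[str]], y_pred: List[Set[str]], label: str) -> float:
    """Precision for one label: correct predictions of it / all predictions of it."""
    predicted = sum(1 for p in y_pred if label in p)
    correct = sum(1 for t, p in zip(y_true, y_pred) if label in p and label in t)
    return correct / predicted if predicted else 0.0


# Hypothetical toy data (not from the paper): three screenshots.
y_true = [
    {"Credential Theft"},
    {"Financial Fraud", "Personal Information Harvesting"},
    {"Malware Distribution"},
]
y_pred = [{"Credential Theft"}, {"Financial Fraud"}, {"Credential Theft"}]

print(micro_precision(y_true, y_pred))
print(per_label_precision(y_true, y_pred, "Credential Theft"))
```

On this toy data there are 2 correct predicted labels and 1 wrong one, so micro-precision is 2/3. "Credential Theft" is predicted twice and is correct once, so its precision is 0.5. Precision ignores missed labels such as the unpredicted "Personal Information Harvesting", which would count against recall instead.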

Comments:	Accepted by EAI ICDF2C 2025
Subjects:	Cryptography and Security (cs.CR)
Cite as:	arXiv:2507.15419 [cs.CR]
	(or arXiv:2507.15419v1 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2507.15419

Submission history

From: Wenhao Li
[v1] Mon, 21 Jul 2025 09:20:43 UTC (3,172 KB)