Revisiting Pre-trained Language Models for Vulnerability Detection

Li, Youpeng; Qi, Weiliang; Wang, Xuyu; Yu, Fuxun; Wang, Xinda


arXiv:2507.16887 (cs)

[Submitted on 22 Jul 2025]

Abstract: The rapid advancement of pre-trained language models (PLMs) has demonstrated promising results for various code-related tasks. However, their effectiveness in detecting real-world vulnerabilities remains a critical challenge. While existing empirical studies evaluate PLMs for vulnerability detection (VD), their inadequate consideration of data preparation, evaluation setups, and experimental settings undermines the accuracy and comprehensiveness of evaluations. This paper introduces RevisitVD, an extensive evaluation of 17 PLMs spanning smaller code-specific PLMs and large-scale PLMs using newly constructed datasets. Specifically, we compare the performance of PLMs under both fine-tuning and prompt engineering, assess their effectiveness and generalizability across various training and testing settings, and analyze their robustness against code normalization, abstraction, and semantic-preserving transformations. Our findings reveal that, for VD tasks, PLMs incorporating pre-training tasks designed to capture the syntactic and semantic patterns of code outperform both general-purpose PLMs and those solely pre-trained or fine-tuned on large code corpora. However, these models face notable challenges in real-world scenarios, such as difficulties in detecting vulnerabilities with complex dependencies, handling perturbations introduced by code normalization and abstraction, and identifying semantic-preserving vulnerable code transformations. Also, the truncation caused by the limited context windows of PLMs can lead to a non-negligible amount of labeling errors. This study underscores the importance of thorough evaluations of model performance in practical scenarios and outlines future directions to help enhance the effectiveness of PLMs for realistic VD applications.
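The point about truncation can be made concrete with a small sketch: when a function is cut to fit a PLM's context window, the vulnerable statement may fall outside the kept tokens while the sample is still labeled vulnerable. The code below is illustrative only and is not RevisitVD's procedure. It assumes a crude regex tokenizer in place of a model's subword tokenizer, head truncation to a fixed max_tokens budget, and a hypothetical list of vulnerable line indices (for example, lines changed by the fixing commit). All function names are hypothetical.

```python
import re
from typing import Iterable, List, Set

# Crude stand-in for a real subword tokenizer (assumption): identifiers,
# integer literals, and single non-space characters each count as one token.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(line: str) -> List[str]:
    """Split one line of source code into illustrative tokens."""
    return TOKEN_RE.findall(line)


def visible_lines(source: str, max_tokens: int) -> Set[int]:
    """Return indices of lines that fit entirely within the token budget.

    Lines are consumed in order; the first line that does not fit ends the
    window, mimicking head truncation to a fixed context length.
    """
    budget = max_tokens
    kept: Set[int] = set()
    for i, line in enumerate(source.splitlines()):
        n = len(tokenize(line))
        if n > budget:
            break
        budget -= n
        kept.add(i)
    return kept


def truncation_hides_vulnerability(
    source: str, vulnerable_lines: Iterable[int], max_tokens: int
) -> bool:
    """True if no vulnerable line survives truncation intact, i.e. a
    'vulnerable' label would be attached to input that lacks the flaw."""
    kept = visible_lines(source, max_tokens)
    return not any(i in kept for i in vulnerable_lines)


if __name__ == "__main__":
    # Hypothetical C function; line 3 (memcpy without a bounds check)
    # plays the role of the vulnerable statement.
    sample = "\n".join([
        "void copy(char *dst, const char *src) {",
        "    int n = strlen(src);",
        "    log_debug(n);",
        "    memcpy(dst, src, n);",
        "}",
    ])
    for limit in (30, 40):
        # prints "30 True" then "40 False"
        print(limit, truncation_hides_vulnerability(sample, [3], limit))
```

Head truncation is only one choice; pipelines that keep the tail of a function or use a sliding window would keep different lines, so the same labeled sample can be correct under one setting and mislabeled under another.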

Subjects:	Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Software Engineering (cs.SE)
Cite as:	arXiv:2507.16887 [cs.CR]
	(or arXiv:2507.16887v1 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2507.16887

Submission history

From: Youpeng Li
[v1] Tue, 22 Jul 2025 17:58:49 UTC (159 KB)
