HoneypotNet: Backdoor Attacks Against Model Extraction

Wang, Yixu; Gu, Tianle; Teng, Yan; Wang, Yingchun; Ma, Xingjun

arXiv:2501.01090 (cs)

[Submitted on 2 Jan 2025]

Title: HoneypotNet: Backdoor Attacks Against Model Extraction

Authors: Yixu Wang, Tianle Gu, Yan Teng, Yingchun Wang, Xingjun Ma

Abstract: Model extraction attacks are a type of inference-time attack that approximates the functionality and performance of a black-box victim model by launching a certain number of queries to the model and then leveraging the model's predictions to train a substitute model. These attacks pose severe security threats to production models and MLaaS platforms and could cause significant monetary losses to the model owners. A body of work has proposed defenses for machine learning models against model extraction attacks, including active defense methods that modify the model's outputs or increase the query overhead to prevent extraction, and passive defense methods that detect malicious queries or leverage watermarks for post-hoc verification. In this work, we introduce a new defense paradigm called "attack as defense", which modifies the model's output to be poisonous so that any malicious user who attempts to use the output to train a substitute model will be poisoned. To this end, we propose a novel lightweight backdoor attack method dubbed HoneypotNet that replaces the classification layer of the victim model with a honeypot layer and then fine-tunes the honeypot layer with a shadow model (to simulate model extraction) via bi-level optimization, making its output poisonous while retaining the original performance. We empirically demonstrate on four commonly used benchmark datasets that HoneypotNet can inject backdoors into substitute models with a high success rate. The injected backdoor not only facilitates ownership verification but also disrupts the functionality of substitute models, serving as a significant deterrent to model extraction attacks.
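The extraction loop described in the abstract (query a black box, then fit a substitute on its predictions) can be illustrated with a toy sketch. This is not the paper's code or method: `victim_predict`, the 2-D input space, and the 1-nearest-neighbour substitute are all assumptions chosen so the example runs with the standard library alone, standing in for a real neural victim and a trained substitute network.

```python
import random


def victim_predict(x):
    """Hypothetical black-box victim: a hidden linear rule over 2-D inputs."""
    return 1 if 2.0 * x[0] - x[1] + 0.5 > 0 else 0


def extract_substitute(query_fn, num_queries, rng):
    """Query the black box and build a substitute from its predictions.

    The substitute here is a 1-nearest-neighbour lookup over the collected
    (input, predicted label) pairs, an assumed stand-in for training a
    neural substitute model on the same transfer set.
    """
    transfer_set = []
    for _ in range(num_queries):
        x = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        transfer_set.append((x, query_fn(x)))  # only the output label is observed

    def substitute_predict(x):
        nearest = min(
            transfer_set,
            key=lambda pair: (pair[0][0] - x[0]) ** 2 + (pair[0][1] - x[1]) ** 2,
        )
        return nearest[1]

    return substitute_predict


def agreement(model_a, model_b, points):
    """Fraction of points on which the two models return the same label."""
    return sum(model_a(p) == model_b(p) for p in points) / len(points)


rng = random.Random(0)
substitute = extract_substitute(victim_predict, num_queries=500, rng=rng)
held_out = [(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)) for _ in range(1000)]
fidelity = agreement(victim_predict, substitute, held_out)
print(f"substitute agrees with victim on {fidelity:.1%} of held-out points")
```

Because the substitute learns only from what `query_fn` returns, a defense that alters those returned predictions, as the abstract describes for the honeypot layer, directly shapes what the substitute ends up learning.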

Comments:	Accepted by AAAI 2025
Subjects:	Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2501.01090 [cs.CR]
	(or arXiv:2501.01090v1 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2501.01090

Submission history

From: Yixu Wang
[v1] Thu, 2 Jan 2025 06:23:51 UTC (4,571 KB)
