Exposing the Illusion of Fairness: Auditing Vulnerabilities to Distributional Manipulation Attacks

Lafargue, Valentin; Monteiro, Adriana Laurindo; Claeys, Emmanuelle; Risser, Laurent; Loubes, Jean-Michel


arXiv:2507.20708 (cs)

[Submitted on 28 Jul 2025]

Abstract: Proving the compliance of AI algorithms has become an important challenge as such algorithms are increasingly deployed in real-life applications. Inspecting possible biased behaviors is mandatory to satisfy the requirements of the EU Artificial Intelligence Act. Regulation-driven audits increasingly rely on global fairness metrics, with Disparate Impact being the most widely used. Yet such global measures depend heavily on the distribution of the sample on which they are computed. We first investigate how data samples can be manipulated to artificially satisfy fairness criteria, creating minimally perturbed datasets that remain statistically indistinguishable from the original distribution while satisfying prescribed fairness constraints. We then study how to detect such manipulation. Our analysis (i) introduces mathematically sound methods for modifying empirical distributions under fairness constraints using entropic or optimal transport projections, (ii) examines how an auditee could potentially circumvent fairness inspections, and (iii) offers recommendations to help auditors detect such data manipulations. These results are validated through bias-detection experiments on classical tabular datasets.
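
The abstract states that an audit sample can be modified through an entropic projection so that Disparate Impact passes a threshold while staying close to the original distribution. The Python sketch below illustrates one such projection under our own simplifying assumptions; it is not the authors' implementation. Predictions Ŷ and the sensitive attribute S are assumed binary, Disparate Impact is taken as P(Ŷ=1 | S=0) / P(Ŷ=1 | S=1) with S=0 the protected group, the group proportions are held fixed, and the target 0.8 is the commonly cited four-fifths threshold. The names `disparate_impact`, `_tilted_weights` and `entropic_projection` are hypothetical. With the group masses fixed, DI(Q) = t becomes a linear constraint on the weights, so the KL projection of the uniform empirical weights is an exponential tilting whose single parameter can be found by bisection.

```python
import numpy as np


def disparate_impact(y_hat, s, w=None):
    """Weighted DI = P(Y_hat=1 | S=0) / P(Y_hat=1 | S=1).

    Assumed convention: S=0 is the protected group, S=1 the reference group.
    """
    y_hat = np.asarray(y_hat, dtype=float)
    s = np.asarray(s)
    w = np.ones_like(y_hat) if w is None else np.asarray(w, dtype=float)
    rate0 = np.sum(w * y_hat * (s == 0)) / np.sum(w * (s == 0))
    rate1 = np.sum(w * y_hat * (s == 1)) / np.sum(w * (s == 1))
    return rate0 / rate1


def _tilted_weights(y_hat, s, lam, target):
    """Weights proportional to exp(alpha * 1[S=0] + lam * g) (hypothetical helper).

    g_i = y_hat_i * (1[S=0] / p0 - target * 1[S=1] / p1), so E_Q[g] = 0 is
    equivalent to DI(Q) = target once the group masses p0, p1 are held fixed.
    Normalising inside each group plays the role of alpha.
    """
    p0 = np.mean(s == 0)
    p1 = 1.0 - p0
    log_w = lam * y_hat * np.where(s == 0, 1.0 / p0, -target / p1)
    w = np.empty(y_hat.size)
    for group, mass in ((0, p0), (1, p1)):
        mask = s == group
        z = np.exp(log_w[mask] - log_w[mask].max())  # shift for numerical stability
        w[mask] = mass * z / z.sum()
    return w


def entropic_projection(y_hat, s, target=0.8, tol=1e-12, max_iter=200):
    """KL projection of the uniform empirical weights onto {DI >= target},
    with group proportions held fixed (hypothetical helper, not the paper's code)."""
    y_hat = np.asarray(y_hat, dtype=float)
    s = np.asarray(s)
    for group in (0, 1):
        preds = y_hat[s == group]
        if preds.size == 0 or preds.min() == preds.max():
            raise ValueError("each group needs both positive and negative predictions")
    uniform = np.full(y_hat.size, 1.0 / y_hat.size)
    if disparate_impact(y_hat, s, uniform) >= target:
        return uniform  # constraint already met: the projection is P itself
    # DI increases monotonically with lam, so bracket the root, then bisect.
    lo, hi = 0.0, 1.0
    while disparate_impact(y_hat, s, _tilted_weights(y_hat, s, hi, target)) < target:
        lo, hi = hi, 2.0 * hi
    for _ in range(max_iter):
        if hi - lo < tol:
            break
        mid = 0.5 * (lo + hi)
        if disparate_impact(y_hat, s, _tilted_weights(y_hat, s, mid, target)) < target:
            lo = mid
        else:
            hi = mid
    return _tilted_weights(y_hat, s, hi, target)


# Toy audit sample: 3/10 positives in group 0, 6/10 in group 1, so DI = 0.5.
y_hat = np.array([1] * 3 + [0] * 7 + [1] * 6 + [0] * 4)
s = np.array([0] * 10 + [1] * 10)
w = entropic_projection(y_hat, s, target=0.8)
print(disparate_impact(y_hat, s))      # 0.5 (original sample)
print(disparate_impact(y_hat, s, w))   # ~0.8 after reweighting
print(np.sum(w * np.log(w * w.size)))  # KL(Q || uniform empirical weights)
```

Resampling the original rows with weights `w` would yield a sample whose Disparate Impact meets the threshold; the printed KL value is one measure of how far that reweighted sample lies from the original empirical distribution.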

Subjects:	Machine Learning (cs.LG); Optimization and Control (math.OC); Applications (stat.AP)
Cite as:	arXiv:2507.20708 [cs.LG]
	(or arXiv:2507.20708v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2507.20708

Submission history

From: Valentin Lafargue
[v1] Mon, 28 Jul 2025 11:01:48 UTC (4,312 KB)