Blind Targeting: Personalization under Third-Party Privacy Constraints

Shchetkina, Anya

arXiv:2507.05175 (stat)

[Submitted on 7 Jul 2025]

Abstract: Major advertising platforms recently increased privacy protections by limiting advertisers' access to individual-level data. Instead of providing access to granular raw data, the platforms allow only a limited number of aggregate queries to a dataset, which is further protected by adding differentially private noise. This paper studies whether and how advertisers can design effective targeting policies within these restrictive privacy-preserving data environments. To achieve this, I develop a probabilistic machine learning method based on Bayesian optimization, which facilitates dynamic data exploration. Since Bayesian optimization was designed to sample points from a function to find its maximum, it is not applicable to aggregate queries and to targeting. Therefore, I introduce two innovations: (i) integral updating of posteriors, which makes it possible to select the best regions of the data to query rather than individual points, and (ii) a targeting-aware acquisition function that dynamically selects the most informative regions for the targeting task. I identify the conditions of the dataset and privacy environment that necessitate the use of such a "smart" querying strategy. I apply the strategic querying method to the Criteo AI Labs dataset for uplift modeling (Diemert et al., 2018), which contains visit and conversion data from 14M users. I show that an intuitive benchmark strategy achieves only 33% of the non-privacy-preserving targeting potential in some cases, while my strategic querying method achieves 97-101% of that potential and is statistically indistinguishable from Causal Forest (Athey et al., 2019), a state-of-the-art non-privacy-preserving machine learning targeting method.
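To make the setting in the abstract more concrete, here is a minimal sketch (not the author's implementation) of two ingredients it names: a differentially private aggregate query and an "integral" posterior update that conditions on the average of the latent function over a region instead of its value at a single point. It assumes a Gaussian-process prior over a one-dimensional feature grid, outcomes bounded in [0, 1], the Laplace mechanism for privacy, and a Gaussian approximation to the Laplace noise. The function names (`rbf_kernel`, `region_weights`, `dp_mean_query`, `integral_update`) and the synthetic data are hypothetical. The paper's targeting-aware acquisition function, which would choose the next region to query, is not reproduced here.

```python
import math
import random


def rbf_kernel(xs, lengthscale=0.3, variance=1.0):
    """Squared-exponential prior covariance over a 1-D grid of user features."""
    return [[variance * math.exp(-0.5 * ((a - b) / lengthscale) ** 2) for b in xs]
            for a in xs]


def region_weights(xs, lo, hi):
    """Averaging vector a: a_i = 1/|R| for grid points in [lo, hi], else 0."""
    inside = [lo <= x <= hi for x in xs]
    count = sum(inside)
    if count == 0:
        raise ValueError("region contains no grid points")
    return [1.0 / count if flag else 0.0 for flag in inside]


def dp_mean_query(outcomes, epsilon, rng):
    """Laplace-mechanism release of the mean of outcomes bounded in [0, 1].

    Changing one user's outcome moves the mean by at most 1/n, so the noise
    scale is 1 / (n * epsilon). Returns (noisy mean, noise variance).
    """
    n = len(outcomes)
    scale = 1.0 / (n * epsilon)
    # A Laplace(0, scale) draw is the difference of two Exp(rate=1/scale) draws.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return sum(outcomes) / n + noise, 2.0 * scale ** 2


def integral_update(mean, cov, weights, y, noise_var):
    """Condition a Gaussian prior N(mean, cov) on y = a^T f + e.

    The noise e is treated as N(0, noise_var), a Gaussian with the same
    variance as the Laplace noise (an approximation). Because the region
    average a^T f is linear in f, the posterior stays Gaussian and the
    update is rank one.
    """
    n = len(mean)
    k_a = [sum(cov[i][j] * weights[j] for j in range(n)) for i in range(n)]  # K a
    s = sum(weights[i] * k_a[i] for i in range(n)) + noise_var  # a^T K a + noise
    resid = y - sum(weights[i] * mean[i] for i in range(n))     # y - a^T m
    new_mean = [mean[i] + k_a[i] * resid / s for i in range(n)]
    new_cov = [[cov[i][j] - k_a[i] * k_a[j] / s for j in range(n)]
               for i in range(n)]
    return new_mean, new_cov


# Usage on synthetic, hypothetical data: one private query on the region [0, 0.25].
rng = random.Random(0)
xs = [i / 19 for i in range(20)]
prior_mean, prior_cov = [0.0] * len(xs), rbf_kernel(xs)
a = region_weights(xs, 0.0, 0.25)
outcomes = [1.0 if rng.random() < 0.3 else 0.0 for _ in range(5000)]
y, noise_var = dp_mean_query(outcomes, epsilon=1.0, rng=rng)
post_mean, post_cov = integral_update(prior_mean, prior_cov, a, y, noise_var)
```

After one query, uncertainty shrinks most inside the queried region and only slightly at distant feature values, which is the kind of information a region-level acquisition rule could exploit when deciding where to spend the limited query budget.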

Subjects:	Methodology (stat.ME); Machine Learning (cs.LG); Econometrics (econ.EM); Applications (stat.AP)
Cite as:	arXiv:2507.05175 [stat.ME]
	(or arXiv:2507.05175v1 [stat.ME] for this version)
	https://doi.org/10.48550/arXiv.2507.05175

Submission history

From: Anya Shchetkina
[v1] Mon, 7 Jul 2025 16:30:40 UTC (1,804 KB)