Chiseling: Powerful and Valid Subgroup Selection via Interactive Machine Learning

Cheng, Nathan; Spector, Asher; Janson, Lucas

Statistics > Methodology

arXiv:2509.19490v3 (stat)

[Submitted on 23 Sep 2025 (v1), last revised 29 Oct 2025 (this version, v3)]

Abstract: In regression and causal inference, controlled subgroup selection aims to identify, with inferential guarantees, a subgroup (defined as a subset of the covariate space) on which the average response or treatment effect is above a given threshold. E.g., in a clinical trial, it may be of interest to find a subgroup with a positive average treatment effect. However, existing methods either lack inferential guarantees, heavily restrict the search for the subgroup, or sacrifice efficiency by naive data splitting. We propose a novel framework called chiseling that allows the analyst to interactively refine and test a candidate subgroup by iteratively shrinking it. The sole restriction is that the shrinkage direction only depends on the points outside the current subgroup, but otherwise the analyst may leverage any prior information or machine learning algorithm. Despite this flexibility, chiseling controls the probability that the discovered subgroup is null (e.g., has a non-positive average treatment effect) under minimal assumptions: for example, in randomized experiments, this inferential validity guarantee holds under only bounded moment conditions. When applied to a variety of simulated datasets and a real survey experiment, chiseling identifies substantially better subgroups than existing methods with inferential guarantees.
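The one restriction the abstract places on the analyst can be made concrete with a toy sketch. This is a hypothetical illustration, not the authors' chiseling procedure: it assumes a single one-dimensional covariate, represents the subgroup as an interval `[lo, hi]`, and picks which edge to trim with a made-up rule (trim the side whose already-excluded points have the lower mean response). What it does show is the information constraint: each shrinkage decision reads responses only from points outside the current interval. It runs no inferential test, so it carries none of chiseling's validity guarantee; the function name `toy_chisel_interval`, the step size, and the fallback value are all assumptions of this example.

```python
import random
from statistics import mean


def toy_chisel_interval(x, y, lo, hi, step, n_steps):
    """Shrink the interval [lo, hi] of a 1-D covariate for up to n_steps rounds.

    Hypothetical illustration only: each round decides which edge to trim
    using the responses of points already OUTSIDE the current interval.
    No inferential test is run, so no validity guarantee is implied.
    """
    for _ in range(n_steps):
        if hi - lo <= 2 * step:
            break  # stop before the interval collapses
        # Responses of points previously chiseled off on each side.
        left_y = [yi for xi, yi in zip(x, y) if xi < lo]
        right_y = [yi for xi, yi in zip(x, y) if xi > hi]
        # Fallback of 0.0 when nothing has been excluded yet (arbitrary choice).
        left_mean = mean(left_y) if left_y else 0.0
        right_mean = mean(right_y) if right_y else 0.0
        # Made-up rule: trim the edge whose excluded neighbours look worse,
        # guessing the effect is also low just inside that edge.
        if left_mean <= right_mean:
            lo += step
        else:
            hi -= step
    return lo, hi


if __name__ == "__main__":
    random.seed(0)
    x = [random.random() for _ in range(500)]
    # Simulated data: mean response is +1 for x > 0.5 and -1 otherwise.
    y = [(1.0 if xi > 0.5 else -1.0) + random.gauss(0, 1) for xi in x]
    print(toy_chisel_interval(x, y, lo=0.0, hi=1.0, step=0.05, n_steps=8))
```

Because the intervals are nested, points inside the final interval are never consulted: replacing their responses with arbitrary values leaves the output unchanged. That is a quick way to check that a shrinking rule respects the constraint.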

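Comments:	26+7+97 pages (main text, references, appendix), 6+15 figures (main text, appendix); fixed some references; added a link to the reproducibility code repository
Subjects:	Methodology (stat.ME); Machine Learning (stat.ML)
Cite as:	arXiv:2509.19490 [stat.ME]
	(or arXiv:2509.19490v3 [stat.ME] for this version)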
	https://doi.org/10.48550/arXiv.2509.19490

Submission history

From: Nathan Cheng
[v1] Tue, 23 Sep 2025 18:52:05 UTC (325 KB)
[v2] Thu, 25 Sep 2025 21:19:31 UTC (325 KB)
[v3] Wed, 29 Oct 2025 21:17:22 UTC (325 KB)