Statistical Inference for Conditional Group Distributionally Robust Optimization with Cross-Entropy Loss

Guo, Zijian; Wang, Zhenyu; Hu, Yifan; Bach, Francis

arXiv:2507.09905 (stat)

[Submitted on 14 Jul 2025]

Authors: Zijian Guo, Zhenyu Wang, Yifan Hu, Francis Bach

Abstract: In multi-source learning with discrete labels, distributional heterogeneity across domains poses a central challenge to developing predictive models that transfer reliably to unseen domains. We study multi-source unsupervised domain adaptation, where labeled data are drawn from multiple source domains and only unlabeled data are available from the target domain. To address potential distribution shifts, we propose a novel Conditional Group Distributionally Robust Optimization (CG-DRO) framework that learns a classifier by minimizing the worst-case cross-entropy loss over the convex combinations of the conditional outcome distributions from the sources. To solve the resulting minimax problem, we develop an efficient Mirror Prox algorithm, where we employ a double machine learning procedure to estimate the risk function. This ensures that the errors of the machine learning estimators for the nuisance models enter only at higher-order rates, thereby preserving statistical efficiency under covariate shift. We establish fast statistical convergence rates for the estimator by constructing two surrogate minimax optimization problems that serve as theoretical bridges. A distinguishing challenge for CG-DRO is the emergence of nonstandard asymptotics: the empirical estimator may fail to converge to a standard limiting distribution due to boundary effects and system instability. To address this, we introduce a perturbation-based inference procedure that enables uniformly valid inference, including confidence interval construction and hypothesis testing.

Subjects:	Methodology (stat.ME); Statistics Theory (math.ST); Machine Learning (stat.ML)
Cite as:	arXiv:2507.09905 [stat.ME]
	(or arXiv:2507.09905v1 [stat.ME] for this version)
	https://doi.org/10.48550/arXiv.2507.09905

Submission history

From: Zhenyu Wang
[v1] Mon, 14 Jul 2025 04:21:23 UTC (917 KB)
