Model Specification Test with Unlabeled Data: Approach from Covariate Shift

Kato, Masahiro; Kawarazaki, Hikaru

Statistics > Methodology

arXiv:1911.00688 (stat)

[Submitted on 2 Nov 2019 (v1), last revised 23 Feb 2020 (this version, v2)]

Authors: Masahiro Kato, Hikaru Kawarazaki

Abstract: We propose a novel framework for model specification testing in regression that uses unlabeled test data. In many cases, statistical inference is conducted under the assumption that the model is correctly specified. However, it is difficult to confirm whether a model is in fact correctly specified. To address this problem, existing work has devised statistical tests for model specification. These works define a correctly specified regression model as one whose error term has zero conditional mean over the training data only. Extending the definition used in conventional statistical tests, we define a correctly specified model as one whose error term has zero conditional mean under any distribution of the explanatory variable. This definition is a natural consequence of the orthogonality of the explanatory variable and the error term. If a model does not satisfy this condition, it may lack robustness with regard to distribution shift. The proposed method would enable us to reject a misspecified model under our definition. By applying the proposed method, we can obtain a model that predicts the labels of the unlabeled test data well without losing its interpretability. In experiments, we show how the proposed method works on synthetic and real-world datasets.
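The abstract's central point, that a model can have zero mean error on the training covariates yet a nonzero conditional mean of the error under a different covariate distribution, can be seen in a small simulation. This is a hypothetical illustration of the definition only, not the authors' test: the quadratic data-generating process, the uniform covariate ranges, and the helper names `fit_line` and `mean_residual` are assumptions made for this sketch.

```python
import random
import statistics

def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a + b*x (one regressor plus intercept)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return a, b

def mean_residual(a, b, xs, ys):
    """Average of y - (a + b*x); estimates the mean error under the distribution of xs."""
    return statistics.fmean(y - (a + b * x) for x, y in zip(xs, ys))

rng = random.Random(0)

def true_y(x):
    # Hypothetical data-generating process: y = x^2 + Gaussian noise.
    return x ** 2 + rng.gauss(0.0, 0.1)

# Training covariates on [-1, 1]; test covariates shifted to [1, 2] (covariate shift).
train_x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
train_y = [true_y(x) for x in train_x]
test_x = [rng.uniform(1.0, 2.0) for _ in range(2000)]
test_y = [true_y(x) for x in test_x]

# Misspecified model: y regressed linearly on x.
a, b = fit_line(train_x, train_y)
train_bias = mean_residual(a, b, train_x, train_y)
test_bias = mean_residual(a, b, test_x, test_y)

# Correctly specified model: y regressed on z = x^2.
train_z = [x ** 2 for x in train_x]
test_z = [x ** 2 for x in test_x]
a2, b2 = fit_line(train_z, train_y)
train_bias_correct = mean_residual(a2, b2, train_z, train_y)
test_bias_correct = mean_residual(a2, b2, test_z, test_y)

print(f"misspecified: train {train_bias:.3f}, test {test_bias:.3f}")
print(f"correct:      train {train_bias_correct:.3f}, test {test_bias_correct:.3f}")
```

The misspecified line has an average residual of essentially zero on the training covariates (an algebraic property of least squares with an intercept), but a large positive average residual on the shifted covariates. The model that uses the correct regressor stays close to zero on both, which is the behavior the extended definition asks for.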

Comments:	The proof was wrong
Subjects:	Methodology (stat.ME); Econometrics (econ.EM); Machine Learning (stat.ML)
Cite as:	arXiv:1911.00688 [stat.ME]
	(or arXiv:1911.00688v2 [stat.ME] for this version)
	https://doi.org/10.48550/arXiv.1911.00688

Submission history

From: Masahiro Kato
[v1] Sat, 2 Nov 2019 10:06:17 UTC (18 KB)
[v2] Sun, 23 Feb 2020 07:42:42 UTC (1 KB)
