Debunking Generalization Error or: How I Learned to Stop Worrying and Love My Training Set

Acquaviva, Viviana; Lovell, Chistopher; Ishida, Emille


arXiv:2012.00066 (astro-ph)

[Submitted on 30 Nov 2020]




Abstract: We aim to determine some physical properties of distant galaxies (for example, stellar mass, star formation history, or chemical enrichment history) from their observed spectra, using supervised machine learning methods. We know that different astrophysical processes leave their imprint in various regions of the spectra with characteristic signatures. Unfortunately, identifying a training set for this problem is very hard, because labels are not readily available: we have no way of knowing the true history of how galaxies have formed. One possible approach to this problem is to train machine learning models on state-of-the-art cosmological simulations. However, when algorithms are trained on the simulations, it is unclear how well they will perform once applied to real data. In this paper, we attempt to model the generalization error as a function of an appropriate measure of distance between the source domain and the application domain. Our goal is to obtain a reliable estimate of how a model trained on simulations might behave on real data.
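
The idea of modeling generalization error as a function of the distance between source and application domains can be illustrated with a toy sketch. This is not the authors' method or data. It assumes a hypothetical one-dimensional feature, a made-up label relation (`true_label`), Gaussian domains that differ only in their mean, and a deliberately crude distance (the absolute difference of feature means). A simple model is trained on the source domain, evaluated on progressively shifted target domains, and a straight line is then fitted to error versus distance.

```python
import math
import random
import statistics


def true_label(x):
    # Hypothetical ground-truth relation between a 1-D feature and its label.
    return x + 0.3 * x * x


def sample_domain(rng, mean, n=2000):
    # Draw features from a unit-variance Gaussian centred on `mean`
    # and attach noise-free toy labels.
    xs = [rng.gauss(mean, 1.0) for _ in range(n)]
    ys = [true_label(x) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    # Ordinary least squares for y = slope * x + intercept.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def rmse(model, xs, ys):
    # Root-mean-square error of a fitted line on a labelled sample.
    slope, intercept = model
    sq = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(statistics.fmean(sq))


def domain_distance(xs_a, xs_b):
    # Toy distance (an assumption): absolute difference of feature means.
    return abs(statistics.fmean(xs_a) - statistics.fmean(xs_b))


rng = random.Random(0)

# "Simulation" (source) domain: train a deliberately misspecified linear model.
src_x, src_y = sample_domain(rng, mean=0.0)
model = fit_line(src_x, src_y)

# "Application" (target) domains with increasing shift from the source.
distances, errors = [], []
for shift in [0.0, 0.5, 1.0, 1.5, 2.0]:
    tgt_x, tgt_y = sample_domain(rng, mean=shift)
    distances.append(domain_distance(src_x, tgt_x))
    errors.append(rmse(model, tgt_x, tgt_y))

# Model the error as a function of distance (here simply a straight line).
error_model = fit_line(distances, errors)

for d, e in zip(distances, errors):
    print(f"distance={d:.2f}  rmse={e:.3f}")
```

In this toy setting the error grows with the domain shift because the linear model is only locally adequate. With real spectra, both the distance measure and the functional form of the error model would need to be chosen and validated far more carefully.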

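Comments:	Accepted for the 2020 NeurIPS workshop "Machine Learning and the Physical Sciences"; comments welcome!
Subjects:	Instrumentation and Methods for Astrophysics (astro-ph.IM); Astrophysics of Galaxies (astro-ph.GA)
Cite as:	arXiv:2012.00066 [astro-ph.IM]
	(or arXiv:2012.00066v1 [astro-ph.IM] for this version)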
	https://doi.org/10.48550/arXiv.2012.00066

Submission history

From: Viviana Acquaviva
[v1] Mon, 30 Nov 2020 19:35:49 UTC (781 KB)
