Deep Learning and LLM-based Methods Applied to Stellar Lightcurve Classification

Li, Yu-Yang; Bai, Yu; Wang, Cunshi; Qu, Mengwei; Lu, Ziteng; Soria, Roberto; Liu, Jifeng

doi:10.34133/icomputing.0110

arXiv:2404.10757 (astro-ph)

[Submitted on 16 Apr 2024 (v1), last revised 24 Feb 2025 (this version, v2)]

Abstract: Light curves serve as a valuable source of information on stellar formation and evolution. With the rapid advancement of machine learning techniques, they can be effectively processed to extract astronomical patterns and information. In this study, we present a comprehensive evaluation of deep-learning and large language model (LLM)-based models for the automatic classification of variable star light curves, based on large datasets from the Kepler and K2 missions. Special emphasis is placed on Cepheids, RR Lyrae, and eclipsing binaries, examining the influence of observational cadence and phase distribution on classification precision. Employing AutoDL optimization, we achieve striking performance with the 1D-Convolution+BiLSTM architecture and the Swin Transformer, reaching accuracies of 94% and 99%, respectively; the latter achieves a notable 83% accuracy in discerning the elusive Type II Cepheids, which comprise merely 0.02% of the total dataset. We unveil StarWhisper LightCurve (LC), an innovative series comprising three LLM-based models: an LLM, a multimodal large language model (MLLM), and a Large Audio Language Model (LALM). Each model is fine-tuned with strategic prompt engineering and customized training methods to explore the emergent abilities of these models for astronomical data. Remarkably, the StarWhisper LC series exhibits high accuracies of around 90%, significantly reducing the need for explicit feature engineering, thereby paving the way for streamlined parallel data processing and the progression of multifaceted multimodal models in astronomical applications. The study furnishes two detailed catalogs illustrating the impacts of phase and sampling intervals on deep learning classification accuracy, showing that observation duration can be reduced by up to 14% and the number of sampling points by up to 21% without compromising accuracy by more than 10%.
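The catalogs described above relate classification accuracy to observation duration and to the number of sampling points. As a hedged illustration only, and not the authors' pipeline, the Python sketch below shows one simple way to produce such degraded inputs: it truncates a light curve's time baseline, thins its cadence, and phase-folds the result at a known period. The helper names (`phase_fold`, `truncate_baseline`, `thin_samples`), the even index-based thinning, and the synthetic 0.5-day sinusoid are all hypothetical choices made for illustration; the 14% and 21% fractions are borrowed from the abstract purely as example values.

```python
import math


def phase_fold(times, period, t0=0.0):
    """Return the phase in [0, 1) of each timestamp for a known period."""
    return [((t - t0) / period) % 1.0 for t in times]


def truncate_baseline(times, fluxes, keep_fraction):
    """Hypothetical helper: keep only samples within the first `keep_fraction`
    of the observing baseline. Assumes `times` is sorted ascending."""
    cutoff = times[0] + keep_fraction * (times[-1] - times[0])
    kept = [(t, f) for t, f in zip(times, fluxes) if t <= cutoff]
    return [t for t, _ in kept], [f for _, f in kept]


def thin_samples(times, fluxes, keep_fraction):
    """Hypothetical helper: keep roughly `keep_fraction` of the samples,
    evenly spaced by index. Assumes at least two samples; the first and
    last samples are always kept."""
    n = len(times)
    n_keep = min(n, max(2, round(n * keep_fraction)))
    # Integer arithmetic keeps the indices exact and strictly increasing.
    idx = [(i * (n - 1)) // (n_keep - 1) for i in range(n_keep)]
    return [times[i] for i in idx], [fluxes[i] for i in idx]


# Synthetic example (assumption): a sinusoidal variable with a 0.5-day
# period, sampled every 0.01 days for 1000 samples (about 10 days).
period = 0.5
times = [i * 0.01 for i in range(1000)]
fluxes = [1.0 + 0.1 * math.sin(2 * math.pi * t / period) for t in times]

# Shorten the baseline by 14%, then drop about 21% of the remaining samples.
t_cut, f_cut = truncate_baseline(times, fluxes, keep_fraction=0.86)
t_thin, f_thin = thin_samples(t_cut, f_cut, keep_fraction=0.79)
phases = phase_fold(t_thin, period)

print(len(times), len(t_cut), len(t_thin))  # prints: 1000 860 679
```

Thinning by index rather than by random draws keeps the example deterministic and preserves the endpoints of the truncated baseline; a real study might instead resample at a coarser cadence or remove points at random, and a classifier would then be evaluated on the folded, degraded curves.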

Comments:	35 pages, 20 figures
Subjects:	Instrumentation and Methods for Astrophysics (astro-ph.IM); Solar and Stellar Astrophysics (astro-ph.SR); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2404.10757 [astro-ph.IM]
	(or arXiv:2404.10757v2 [astro-ph.IM] for this version)
	https://doi.org/10.48550/arXiv.2404.10757
Journal reference:	Intell Comput. 2025;4:0110
Related DOI:	https://doi.org/10.34133/icomputing.0110

Submission history

From: Yuyang Li
[v1] Tue, 16 Apr 2024 17:35:25 UTC (14,336 KB)
[v2] Mon, 24 Feb 2025 00:25:01 UTC (10,283 KB)