LLM4Mat-Bench: Benchmarking Large Language Models for Materials Property Prediction

Rubungo, Andre Niyongabo; Li, Kangming; Hattrick-Simpers, Jason; Dieng, Adji Bousso

arXiv:2411.00177v3 (cond-mat)

[Submitted on 31 Oct 2024 (v1), last revised 30 Nov 2024 (this version, v3)]

Authors: Andre Niyongabo Rubungo, Kangming Li, Jason Hattrick-Simpers, Adji Bousso Dieng

Abstract: Large language models (LLMs) are increasingly being used in materials science. However, little attention has been given to benchmarking and standardized evaluation for LLM-based materials property prediction, which hinders progress. We present LLM4Mat-Bench, the largest benchmark to date for evaluating the performance of LLMs in predicting the properties of crystalline materials. LLM4Mat-Bench contains about 1.9M crystal structures in total, collected from 10 publicly available materials data sources, and 45 distinct properties. LLM4Mat-Bench features different input modalities: crystal composition, CIF, and crystal text description, with 4.7M, 615.5M, and 3.1B tokens in total for each modality, respectively. We use LLM4Mat-Bench to fine-tune models with different sizes, including LLM-Prop and MatBERT, and provide zero-shot and few-shot prompts to evaluate the property prediction capabilities of LLM-chat-like models, including Llama, Gemma, and Mistral. The results highlight the challenges of general-purpose LLMs in materials science and the need for task-specific predictive models and task-specific instruction-tuned LLMs in materials property prediction.
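
The abstract mentions zero-shot and few-shot prompts built from inputs such as crystal composition. As a rough illustration of the difference between the two, the sketch below assembles both kinds of prompt from a composition string. The function name `build_prompt`, the prompt wording, and the example values are all assumptions made for illustration; they are not the templates or data used in LLM4Mat-Bench.

```python
from typing import Sequence, Tuple


def build_prompt(
    composition: str,
    property_name: str,
    examples: Sequence[Tuple[str, float]] = (),
) -> str:
    """Build a property-prediction prompt (hypothetical template).

    With no examples this is a zero-shot prompt; with one or more
    (composition, value) pairs it becomes a few-shot prompt.
    """
    lines = [f"Predict the {property_name} of the material from its composition."]
    # Few-shot demonstrations, if any, precede the query.
    for example_composition, value in examples:
        lines.append(f"Composition: {example_composition}")
        lines.append(f"{property_name}: {value}")
    # The query itself, with the answer left blank for the model to complete.
    lines.append(f"Composition: {composition}")
    lines.append(f"{property_name}:")
    return "\n".join(lines)


# Zero-shot: only the instruction and the query.
zero_shot = build_prompt("NaCl", "band gap (eV)")

# Few-shot: placeholder demonstration values, not taken from the benchmark.
few_shot = build_prompt(
    "NaCl",
    "band gap (eV)",
    examples=[("Si", 1.1), ("GaAs", 1.4)],
)
print(few_shot)
```

The same structure would apply to the other two modalities by swapping the composition string for a CIF file's contents or a text description, at the cost of much longer prompts.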

Comments:	Accepted at the NeurIPS 2024-AI4Mat Workshop. The benchmark and code are available at https://github.com/vertaix/LLM4Mat-Bench.
Subjects:	Materials Science (cond-mat.mtrl-sci); Computation and Language (cs.CL)
Cite as:	arXiv:2411.00177 [cond-mat.mtrl-sci]
	(or arXiv:2411.00177v3 [cond-mat.mtrl-sci] for this version)
	https://doi.org/10.48550/arXiv.2411.00177

Submission history

From: Andre Niyongabo Rubungo
[v1] Thu, 31 Oct 2024 19:48:12 UTC (311 KB)
[v2] Fri, 8 Nov 2024 16:42:18 UTC (316 KB)
[v3] Sat, 30 Nov 2024 14:01:56 UTC (323 KB)
