Climate-Eval: A Comprehensive Benchmark for NLP Tasks Related to Climate Change

Kurfalı, Murathan; Zahra, Shorouq; Nivre, Joakim; Messori, Gabriele

Computer Science > Computation and Language

arXiv:2505.18653 (cs)

[Submitted on 24 May 2025]

Title: Climate-Eval: A Comprehensive Benchmark for NLP Tasks Related to Climate Change

Authors: Murathan Kurfalı, Shorouq Zahra, Joakim Nivre, Gabriele Messori

Abstract: Climate-Eval is a comprehensive benchmark designed to evaluate natural language processing models across a broad range of tasks related to climate change. Climate-Eval aggregates existing datasets along with a newly developed news classification dataset, created specifically for this release. This results in a benchmark of 25 tasks based on 13 datasets, covering key aspects of climate discourse, including text classification, question answering, and information extraction. Our benchmark provides a standardized evaluation suite for systematically assessing the performance of large language models (LLMs) on these tasks. Additionally, we conduct an extensive evaluation of open-source LLMs (ranging from 2B to 70B parameters) in both zero-shot and few-shot settings, analyzing their strengths and limitations in the domain of climate change.

Comments:	Accepted to ClimateNLP 2025@ACL
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2505.18653 [cs.CL]
	(or arXiv:2505.18653v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2505.18653

Submission history

From: Murathan Kurfalı
[v1] Sat, 24 May 2025 11:45:46 UTC (1,431 KB)
