Computer Science > Computation and Language

arXiv:2506.02515 (cs)
[Submitted on 3 Jun 2025]

Title: FinChain: A Symbolic Benchmark for Verifiable Chain-of-Thought Financial Reasoning

Authors:Zhuohan Xie, Dhruv Sahnan, Debopriyo Banerjee, Georgi Georgiev, Rushil Thareja, Hachem Madmoun, Jinyan Su, Aaryamonvikram Singh, Yuxia Wang, Rui Xing, Fajri Koto, Haonan Li, Ivan Koychev, Tanmoy Chakraborty, Salem Lahlou, Veselin Stoyanov, Preslav Nakov
Abstract: Multi-step symbolic reasoning is critical for advancing downstream performance on financial tasks. Yet, benchmarks for systematically evaluating this capability are lacking. Existing datasets like FinQA and ConvFinQA supervise only final numerical answers, without assessing intermediate reasoning steps. To address this, we introduce FinChain, the first symbolic benchmark designed for verifiable Chain-of-Thought (CoT) financial reasoning. Spanning 54 topics across 12 financial domains, FinChain offers five parameterized templates per topic, each varying in reasoning complexity and domain expertise required. Each dataset instance includes an executable Python trace, enabling automatic generation of extensive training data and easy adaptation to other domains. We also introduce ChainEval, a new metric for automatic evaluation of both final answers and intermediate reasoning. Benchmarking 30 LLMs on our dataset, we find that even state-of-the-art models have considerable room for improvement in multi-step financial reasoning. All templates and evaluation metrics for FinChain are available at https://github.com/mbzuai-nlp/finchain.
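
The abstract describes parameterized templates whose reasoning traces are executable Python, plus a step-level metric (ChainEval). As a rough illustration of what such a setup could look like, here is a minimal, hypothetical sketch; the function names (simple_interest_template, step_recall), the parameter ranges, and the matching rule are assumptions made for illustration and are not taken from the FinChain paper or repository.

    # Hypothetical sketch, not the actual FinChain code: one symbolic template
    # whose chain-of-thought trace is executed rather than merely written out,
    # plus a toy step-level check in the spirit of ChainEval.
    import random

    def simple_interest_template(seed):
        """Instantiate one symbolic template: sample parameters, render the
        question text, and execute the reasoning trace step by step."""
        rng = random.Random(seed)
        principal = rng.randint(1_000, 50_000)      # P, in dollars
        rate = round(rng.uniform(0.01, 0.12), 3)    # annual simple-interest rate r
        years = rng.randint(1, 10)                  # term t, in years

        question = (
            f"An investor deposits ${principal} at a simple annual interest rate "
            f"of {rate:.1%} for {years} years. How much interest is earned in total?"
        )

        # Executable trace: every intermediate value is computed, not just stated,
        # so each step (and the final answer) can be verified automatically.
        steps = []
        annual_interest = principal * rate
        steps.append(("annual_interest = principal * rate", annual_interest))
        total_interest = annual_interest * years
        steps.append(("total_interest = annual_interest * years", total_interest))

        return {"question": question, "steps": steps, "answer": total_interest}

    def step_recall(predicted_numbers, gold_steps, rel_tol=1e-2):
        """Toy step-level score: the fraction of gold intermediate values that
        appear (within a relative tolerance) among numbers in a model's CoT."""
        hits = 0
        for _, gold in gold_steps:
            if any(abs(p - gold) <= rel_tol * max(1.0, abs(gold))
                   for p in predicted_numbers):
                hits += 1
        return hits / len(gold_steps)

    if __name__ == "__main__":
        instance = simple_interest_template(seed=7)
        print(instance["question"])
        for expr, value in instance["steps"]:
            print(f"  {expr}  ->  {value:,.2f}")
        # Numbers extracted from a (pretend) model response:
        model_numbers = [instance["steps"][0][1], instance["answer"]]
        print("step recall:", step_recall(model_numbers, instance["steps"]))

Because each instance is generated from a symbolic template rather than a fixed annotation, arbitrarily many instances can be sampled and every intermediate value has a ground truth to compare against, which is what makes automatic step-level evaluation of the kind the abstract attributes to ChainEval possible.
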
Comments: 15 pages, 8 figures, 2 tables
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: arXiv:2506.02515 [cs.CL]
  (or arXiv:2506.02515v1 [cs.CL] for this version)
  https://doi.org/10.48550/arXiv.2506.02515
arXiv-issued DOI via DataCite

Submission history

From: Zhuohan Xie
[v1] Tue, 3 Jun 2025 06:44:42 UTC (731 KB)
Full-text links:

Access Paper:

  • View Chinese PDF
  • View PDF
  • HTML (experimental)
  • TeX Source
  • Other Formats