Computer Science > Computation and Language

arXiv:2506.02515 (cs)
[Submitted on 3 Jun 2025]

Title: FinChain: A Symbolic Benchmark for Verifiable Chain-of-Thought Financial Reasoning

Authors:Zhuohan Xie, Dhruv Sahnan, Debopriyo Banerjee, Georgi Georgiev, Rushil Thareja, Hachem Madmoun, Jinyan Su, Aaryamonvikram Singh, Yuxia Wang, Rui Xing, Fajri Koto, Haonan Li, Ivan Koychev, Tanmoy Chakraborty, Salem Lahlou, Veselin Stoyanov, Preslav Nakov
Abstract: Multi-step symbolic reasoning is critical for advancing downstream performance on financial tasks. Yet, benchmarks for systematically evaluating this capability are lacking. Existing datasets like FinQA and ConvFinQA supervise only final numerical answers, without assessing intermediate reasoning steps. To address this, we introduce FinChain, the first symbolic benchmark designed for verifiable Chain-of-Thought (CoT) financial reasoning. Spanning 54 topics across 12 financial domains, FinChain offers five parameterized templates per topic, each varying in reasoning complexity and domain expertise required. Each dataset instance includes an executable Python trace, enabling automatic generation of extensive training data and easy adaptation to other domains. We also introduce ChainEval, a new metric for automatic evaluation of both final answers and intermediate reasoning. Benchmarking 30 LLMs on our dataset, we find that even state-of-the-art models have considerable room for improvement in multi-step financial reasoning. All templates and evaluation metrics for FinChain are available at https://github.com/mbzuai-nlp/finchain.
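
The abstract describes parameterized templates whose reasoning traces are executable Python, plus a step-level metric (ChainEval). As a rough illustration of what such a setup could look like, here is a minimal, hypothetical sketch; the function names (simple_interest_template, step_recall), the parameter ranges, and the matching rule are assumptions made for illustration and are not taken from the FinChain paper or repository.

    # Hypothetical sketch, not the actual FinChain code: one symbolic template
    # whose chain-of-thought trace is executed rather than merely written out,
    # plus a toy step-level check in the spirit of ChainEval.
    import random

    def simple_interest_template(seed):
        """Instantiate one symbolic template: sample parameters, render the
        question text, and execute the reasoning trace step by step."""
        rng = random.Random(seed)
        principal = rng.randint(1_000, 50_000)      # P, in dollars
        rate = round(rng.uniform(0.01, 0.12), 3)    # annual simple-interest rate r
        years = rng.randint(1, 10)                  # term t, in years

        question = (
            f"An investor deposits ${principal} at a simple annual interest rate "
            f"of {rate:.1%} for {years} years. How much interest is earned in total?"
        )

        # Executable trace: every intermediate value is computed, not just stated,
        # so each step (and the final answer) can be verified automatically.
        steps = []
        annual_interest = principal * rate
        steps.append(("annual_interest = principal * rate", annual_interest))
        total_interest = annual_interest * years
        steps.append(("total_interest = annual_interest * years", total_interest))

        return {"question": question, "steps": steps, "answer": total_interest}

    def step_recall(predicted_numbers, gold_steps, rel_tol=1e-2):
        """Toy step-level score: the fraction of gold intermediate values that
        appear (within a relative tolerance) among numbers in a model's CoT."""
        hits = 0
        for _, gold in gold_steps:
            if any(abs(p - gold) <= rel_tol * max(1.0, abs(gold))
                   for p in predicted_numbers):
                hits += 1
        return hits / len(gold_steps)

    if __name__ == "__main__":
        instance = simple_interest_template(seed=7)
        print(instance["question"])
        for expr, value in instance["steps"]:
            print(f"  {expr}  ->  {value:,.2f}")
        # Numbers extracted from a (pretend) model response:
        model_numbers = [instance["steps"][0][1], instance["answer"]]
        print("step recall:", step_recall(model_numbers, instance["steps"]))

Because each instance is generated from a symbolic template rather than a fixed annotation, arbitrarily many instances can be sampled and every intermediate value has a ground truth to compare against, which is what makes automatic step-level evaluation of the kind the abstract attributes to ChainEval possible.
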
Comments: 15 pages, 8 figures, 2 tables
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: arXiv:2506.02515 [cs.CL]
  (or arXiv:2506.02515v1 [cs.CL] for this version)
  https://doi.org/10.48550/arXiv.2506.02515
arXiv-issued DOI via DataCite

Submission history

From: Zhuohan Xie
[v1] Tue, 3 Jun 2025 06:44:42 UTC (731 KB)
Full-text links:

Access Paper:

  • View Chinese PDF
  • View PDF
  • HTML (experimental)
  • TeX Source
  • Other Formats