
Computer Science > Computation and Language

arXiv:2504.03234 (cs)
[Submitted on 4 Apr 2025 (v1), last revised 21 May 2025 (this version, v2)]

Title: Think When You Need: Self-Adaptive Chain-of-Thought Learning

Authors: Junjie Yang, Ke Lin, Xing Yu
Abstract: Chain of Thought (CoT) reasoning enhances language models' performance but often leads to inefficient "overthinking" on simple problems. We show that existing approaches that directly penalize reasoning length fail to account for varying problem complexity. Our approach instead constructs rewards through length and quality comparisons, guided by theoretical assumptions that jointly enhance solution correctness and conciseness. We further extend our method to fuzzy tasks where ground truth is unavailable. Experiments across multiple reasoning benchmarks demonstrate that our method maintains accuracy while generating significantly more concise explanations, effectively teaching models to "think when needed."
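The reward construction the abstract describes can be sketched in a few lines. This is a hypothetical illustration, not the authors' implementation: the function name `comparison_rewards` and the exact scoring scheme are assumptions. The idea it captures is that correctness dominates, and among correct responses a pairwise length comparison awards a concision bonus, so shorter correct reasoning earns a higher reward.

```python
def comparison_rewards(responses):
    """Assign a reward to each sampled response in a group.

    responses: list of (is_correct: bool, length: int) tuples.
    Incorrect responses get 0. Correct responses get a base reward of 1
    plus a concision bonus: the fraction of other correct responses
    that are at least as long. Shorter correct answers thus rank higher.
    """
    correct_lens = [length for ok, length in responses if ok]
    rewards = []
    for ok, length in responses:
        if not ok:
            rewards.append(0.0)
            continue
        if len(correct_lens) <= 1:
            # Only one correct response: no peers to compare against.
            rewards.append(1.0)
            continue
        # Count correct peers that are at least as long (excluding self).
        longer_or_equal = sum(l >= length for l in correct_lens) - 1
        bonus = longer_or_equal / (len(correct_lens) - 1)
        rewards.append(1.0 + bonus)
    return rewards
```

Under this toy scheme, a short correct answer outranks a long correct one, and both outrank any incorrect answer, which matches the stated goal of penalizing overthinking only when a concise correct alternative exists.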
Comments: Under review
Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: arXiv:2504.03234 [cs.CL]
  (or arXiv:2504.03234v2 [cs.CL] for this version)
  https://doi.org/10.48550/arXiv.2504.03234
arXiv-issued DOI via DataCite

Submission history

From: Junjie Yang [view email]
[v1] Fri, 4 Apr 2025 07:34:01 UTC (2,726 KB)
[v2] Wed, 21 May 2025 15:26:54 UTC (11,895 KB)
