CReFT-CAD: Boosting Orthographic Projection Reasoning for CAD via Reinforcement Fine-Tuning

Niu, Ke; Chen, Zhuofan; Yu, Haiyang; Chen, Yuwen; Fu, Teng; Zhao, Mengyang; Li, Bin; Xue, Xiangyang

Computer Science > Computer Vision and Pattern Recognition

arXiv:2506.00568v1 (cs)

[Submitted on 31 May 2025]


Abstract: Computer-Aided Design (CAD) plays a pivotal role in industrial manufacturing. Orthographic projection reasoning underpins the entire CAD workflow, encompassing design, manufacturing, and simulation. However, prevailing deep-learning approaches employ standard 3D reconstruction pipelines as an alternative, which often introduce imprecise dimensions and limit the parametric editability required for CAD workflows. Recently, some researchers have adopted vision-language models (VLMs), particularly with supervised fine-tuning (SFT), to tackle CAD-related challenges. SFT shows promise but often devolves into pattern memorization, yielding poor out-of-distribution performance on complex reasoning tasks. To address these gaps, we introduce CReFT-CAD, a two-stage fine-tuning paradigm that first employs a curriculum-driven reinforcement learning stage with difficulty-aware rewards to build reasoning ability steadily, and then applies supervised post-tuning to hone instruction following and semantic extraction. Complementing this, we release TriView2CAD, the first large-scale, open-source benchmark for orthographic projection reasoning, comprising 200,000 synthetic and 3,000 real-world orthographic projections with precise dimension annotations and six interoperable data modalities. We benchmark leading VLMs on orthographic projection reasoning and demonstrate that CReFT-CAD substantially improves reasoning accuracy and out-of-distribution generalizability in real-world scenarios, offering valuable insights for advancing CAD reasoning research.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2506.00568 [cs.CV]
	(or arXiv:2506.00568v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2506.00568

Submission history

From: Ke Niu
[v1] Sat, 31 May 2025 13:52:56 UTC (5,104 KB)
