Breaking the Myth: Can Small Models Infer Postconditions Too?

Zhang, Gehao; Wang, Zhenting; Zhai, Juan

计算机科学 > 软件工程

arXiv:2507.10182 (cs)

[提交于 2025年7月14日 ]

标题：打破神话：小型模型也能推断后置条件吗？

标题： Breaking the Myth: Can Small Models Infer Postconditions Too?

Authors:Gehao Zhang, Zhenting Wang, Juan Zhai

摘要：形式化规范对于确保软件正确性至关重要，但手动编写它们既繁琐又容易出错。大型语言模型（LLMs）在从自然语言意图生成此类规范方面显示出潜力，但巨大的模型规模和高计算需求引发了一个根本性问题：我们真的需要大模型来完成这项任务吗？在本文中，我们表明，一个小型微调的语言模型可以在计算成本大大降低的情况下实现高质量的后置条件生成。我们构建了一个包含提示、推理日志和后置条件的专业数据集，然后对一个$7$B参数的代码模型进行监督微调。我们的方法解决了现实世界仓库依赖关系并保留了预状态信息，从而实现了表达性强且准确的规范。我们在一个现实世界的Java错误基准（Defects4J）上评估了该模型，并与专有大模型（例如GPT-4o）和开源大模型进行了比较。实证结果表明，我们的紧凑模型在语法正确性、语义正确性和错误区分能力方面可以与更大模型相媲美或显著优于它们。这些发现表明，在适度数据集上的针对性微调可以使小型模型实现以前仅在大规模、资源密集型LLM中才能看到的结果，为自动化规范生成在现实世界中的采用提供了一种实用且高效的路径。

摘要： Formal specifications are essential for ensuring software correctness, yet manually writing them is tedious and error-prone. Large Language Models (LLMs) have shown promise in generating such specifications from natural language intents, but the giant model size and high computational demands raise a fundamental question: Do we really need large models for this task? In this paper, we show that a small, fine-tuned language model can achieve high-quality postcondition generation with much lower computational costs. We construct a specialized dataset of prompts, reasoning logs, and postconditions, then supervise the fine-tuning of a $7$B-parameter code model. Our approach tackles real-world repository dependencies and preserves pre-state information, allowing for expressive and accurate specifications. We evaluate the model on a benchmark of real-world Java bugs (Defects4J) and compare against both proprietary giants (e.g., GPT-4o) and open-source large models. Empirical results demonstrate that our compact model matches or outperforms significantly larger counterparts in syntax correctness, semantic correctness, and bug-distinguishing capability. These findings highlight that targeted fine-tuning on a modest dataset can enable small models to achieve results formerly seen only in massive, resource-heavy LLMs, offering a practical and efficient path for the real-world adoption of automated specification generation.

主题：	软件工程 (cs.SE) ; 人工智能 (cs.AI)
引用方式：	arXiv:2507.10182 [cs.SE]
	(或者 arXiv:2507.10182v1 [cs.SE] 对于此版本)
	https://doi.org/10.48550/arXiv.2507.10182

提交历史

来自： Gehao Zhang [查看电子邮件]
[v1] 星期一， 2025 年 7 月 14 日 11:44:04 UTC (1,074 KB)

计算机科学 > 软件工程

标题：打破神话：小型模型也能推断后置条件吗？

标题： Breaking the Myth: Can Small Models Infer Postconditions Too?

提交历史

获取论文：

参考文献与引用

收藏

文献和引用工具

与本文相关的代码，数据和媒体

演示

推荐器和搜索工具

arXivLabs：与社区合作伙伴的实验项目

计算机科学 > 软件工程

标题： 打破神话：小型模型也能推断后置条件吗？ 显示英文标题

标题： Breaking the Myth: Can Small Models Infer Postconditions Too?

提交历史

获取论文：

参考文献与引用

BibTeX 格式的引用

收藏

文献和引用工具

与本文相关的代码，数据和媒体

演示

推荐器和搜索工具

arXivLabs：与社区合作伙伴的实验项目

标题：打破神话：小型模型也能推断后置条件吗？