Contextual Code Retrieval for Commit Message Generation: A Preliminary Study

Xiong, Bo; Zhang, Linghao; Wang, Chong; Liang, Peng

Computer Science > Software Engineering

arXiv:2507.17690 (cs)

[Submitted on 23 Jul 2025]

Title: Contextual Code Retrieval for Commit Message Generation: A Preliminary Study

Authors: Bo Xiong, Linghao Zhang, Chong Wang, Peng Liang

Abstract: A commit message describes the main code changes in a commit and plays a crucial role in software maintenance. Existing commit message generation (CMG) approaches typically frame the task as a direct mapping that takes a code diff as input and produces a brief descriptive sentence as output. However, we argue that relying solely on the code diff is insufficient, as a raw code diff fails to capture the full context needed to generate high-quality, informative commit messages. In this paper, we propose a contextual code retrieval-based method called C3Gen, which enhances CMG by retrieving commit-relevant code snippets from the repository and incorporating them into the model input, providing richer contextual information at the repository scope. In the experiments, we evaluate the effectiveness of C3Gen across various models using four objective and three subjective metrics. We also design and conduct a human evaluation to investigate how C3Gen-generated commit messages are perceived by human developers. The results show that, by incorporating contextual code into the input, C3Gen enables models to effectively leverage additional information and generate more comprehensive and informative commit messages with greater practical value in real-world development scenarios. Further analysis underscores concerns about the reliability of similarity-based metrics and provides empirical insights for CMG.

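Comments:	The 19th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)
Subjects:	Software Engineering (cs.SE)
Cite as:	arXiv:2507.17690 [cs.SE]
	(or arXiv:2507.17690v1 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2507.17690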

Submission history

From: Peng Liang
[v1] Wed, 23 Jul 2025 16:54:57 UTC (301 KB)
