Deep Learning-based Code Completion: On the Impact on Performance of Contextual Information

Ciniselli, Matteo; Pascarella, Luca; Bavota, Gabriele

arXiv:2501.05062 (cs)

[Submitted on 9 Jan 2025]

Title: Deep Learning-based Code Completion: On the Impact on Performance of Contextual Information

Authors: Matteo Ciniselli, Luca Pascarella, Gabriele Bavota

Abstract: Code completion aims at speeding up code writing by recommending to developers the next tokens they are likely to type. Deep Learning (DL) models have pushed the boundaries of code completion by redefining what these coding assistants can do: we moved from predicting a few code tokens to automatically generating entire functions. One important factor impacting the performance of DL-based code completion techniques is the context provided as input. By "context" we refer to what the model knows about the code to complete. In a simple scenario, the DL model might be fed a partially implemented function to complete. In this case, the context is the incomplete function and, based on it, the model must generate a prediction. It is, however, possible to expand such a context to include additional information, such as the whole source code file containing the function to complete, which could help boost prediction performance. In this work, we present an empirical study investigating how the performance of a DL-based code completion technique is affected by different contexts. We experiment with 8 types of contexts and their combinations. These contexts include: (i) coding contexts, featuring information extracted from the code base in which the code completion is invoked (e.g., code components structurally related to the one to complete); (ii) process contexts, with information depicting the current status of the project in which a code completion task is triggered (e.g., a textual representation of open issues relevant to the code to complete); and (iii) developer contexts, capturing information about the developer invoking the code completion (e.g., the APIs frequently used). Our results show that additional contextual information can benefit the performance of DL-based code completion, with relative improvements of up to +22% in terms of correct predictions.
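
To make the idea of "expanding the context" concrete, the following is a minimal sketch, not the authors' pipeline. It assumes the extra information is simply prepended to the incomplete function as tagged text. The class `CompletionRequest`, the function names, the `<coding>`/`<process>`/`<developer>` tags, and the `<sep>` separator are all hypothetical. The numbers passed to `relative_improvement` are illustrative and are not results from the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CompletionRequest:
    """Hypothetical container for one code completion task."""
    incomplete_function: str  # the baseline context
    coding_context: List[str] = field(default_factory=list)     # e.g., related code components
    process_context: List[str] = field(default_factory=list)    # e.g., text of open issues
    developer_context: List[str] = field(default_factory=list)  # e.g., frequently used APIs


def build_model_input(req: CompletionRequest, sep: str = "<sep>") -> str:
    """Prepend any non-empty context to the incomplete function.

    Assumption: contexts are concatenated as plain text with a tag per
    context family; the abstract does not say how contexts are encoded.
    """
    parts = []
    for tag, items in (
        ("coding", req.coding_context),
        ("process", req.process_context),
        ("developer", req.developer_context),
    ):
        if items:
            parts.append(f"<{tag}> " + " ".join(items))
    parts.append(req.incomplete_function)
    return f" {sep} ".join(parts)


def relative_improvement(baseline_correct: float, contextual_correct: float) -> float:
    """Relative change in correct predictions with respect to the baseline."""
    if baseline_correct <= 0:
        raise ValueError("baseline_correct must be positive")
    return (contextual_correct - baseline_correct) / baseline_correct


if __name__ == "__main__":
    request = CompletionRequest(
        incomplete_function="def area(r):\n    return <MASK>",
        developer_context=["math.pi", "math.pow"],
    )
    print(build_model_input(request))
    # Illustrative counts only: 100 vs. 122 correct predictions.
    print(f"{relative_improvement(100, 122):+.0%}")
```

With empty context lists, `build_model_input` returns the incomplete function unchanged, which corresponds to the simple scenario described in the abstract.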

Subjects:	Software Engineering (cs.SE)
Cite as:	arXiv:2501.05062 [cs.SE]
	(or arXiv:2501.05062v1 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2501.05062

Submission history

From: Matteo Ciniselli
[v1] Thu, 9 Jan 2025 08:34:34 UTC (313 KB)
