MetaLint: Generalizable Idiomatic Code Quality Analysis through Instruction-Following and Easy-to-Hard Generalization

Naik, Atharva; Baghel, Lawanya; Govindarajan, Dhakshin; Agrawal, Darsh; Fried, Daniel; Rose, Carolyn


arXiv:2507.11687 (cs)

[Submitted on 15 July 2025]

Title: MetaLint: Generalizable Idiomatic Code Quality Analysis through Instruction-Following and Easy-to-Hard Generalization

Authors: Atharva Naik, Lawanya Baghel, Dhakshin Govindarajan, Darsh Agrawal, Daniel Fried, Carolyn Rose

Abstract: Large Language Models, though successful in code generation, struggle with code quality analysis because they are limited by static training data and cannot easily adapt to evolving best practices. We introduce MetaLint, a new instruction-following framework that formulates code quality analysis as the task of detecting and fixing problematic semantic code fragments or code idioms based on high-level specifications. Unlike conventional approaches that train models on static, rule-based data, MetaLint employs instruction tuning on synthetic linter-generated data to support easy-to-hard generalization, enabling models to adapt to novel or complex code patterns without retraining. To evaluate this, we construct a benchmark of challenging idioms inspired by real-world coding standards such as Python Enhancement Proposals (PEPs) and assess whether MetaLint-trained models reason adaptively or simply memorize. Our results show that MetaLint improves generalization to unseen PEP idioms, achieving a 70.37% F-score on idiom detection with the highest recall (70.43%) among all evaluated models. It also achieves 26.73% on localization, competitive for its 4B parameter size and comparable to larger state-of-the-art models like o3-mini, highlighting its potential for future-proof code quality analysis.
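
To make the reported detection numbers easier to interpret, here is a minimal sketch of how precision, recall, and F-score could be computed for idiom detection when each file is labeled with a set of violated idioms. This is an assumption for illustration only: the abstract does not specify the exact metric definition or averaging scheme, and the function name `detection_scores` and the idiom labels are hypothetical.

```python
def detection_scores(gold, predicted):
    """Micro-averaged precision, recall, and F1 over per-file idiom sets.

    gold, predicted: equal-length lists of sets of idiom labels, one set per file.
    NOTE: micro-averaging is an assumed choice, not taken from the paper.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)   # idioms flagged and truly violated
        fp += len(p - g)   # idioms flagged but not violated
        fn += len(g - p)   # violated idioms that were missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels; a real benchmark would use its own idiom identifiers.
gold = [{"IDIOM_A"}, {"IDIOM_B", "IDIOM_C"}, set()]
predicted = [{"IDIOM_A"}, {"IDIOM_B"}, {"IDIOM_D"}]

precision, recall, f1 = detection_scores(gold, predicted)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

Under this definition, a model with high recall but many false positives would still see its F-score pulled down by precision, which is why the abstract reports recall and F-score separately.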

Subjects:	Software Engineering (cs.SE); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2507.11687 [cs.SE]
	(or arXiv:2507.11687v1 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2507.11687

Submission history

From: Atharva Naik
[v1] Tue, 15 Jul 2025 19:44:20 UTC (1,610 KB)
