Assessing the Promise and Pitfalls of ChatGPT for Automated Code Generation

Khan, Muhammad Fawad Akbar; Ramsdell, Max; Falor, Erik; Karimi, Hamid

Computer Science > Software Engineering

arXiv:2311.02640 (cs)

[Submitted on 5 Nov 2023]

Authors: Muhammad Fawad Akbar Khan, Max Ramsdell, Erik Falor, Hamid Karimi

Abstract: This paper presents a comprehensive evaluation of the code generation capabilities of ChatGPT, a prominent large language model, compared to human programmers. A novel dataset of 131 code-generation prompts across 5 categories was curated to enable robust analysis. Code solutions were produced by both ChatGPT and humans for all prompts, resulting in 262 code samples. A meticulous manual assessment methodology evaluated correctness, comprehensibility, and security using 14 established code quality metrics. The key findings reveal ChatGPT's strengths in crafting concise, efficient code with advanced constructs: it performs well on data analysis tasks (93.1% accuracy) but shows limitations in visual-graphical challenges. Comparative analysis with human code highlights ChatGPT's inclination towards modular design and superior error handling. Additionally, machine learning models distinguished ChatGPT code from human code with up to 88% accuracy, suggesting detectable differences in coding style. By providing insights into ChatGPT's code generation capabilities and limitations through quantitative metrics and qualitative analysis, this study contributes toward advancing AI-based programming assistants. The curated dataset and methodology offer a foundation for future research in this nascent domain. All data and code are available at https://github.com/DSAatUSU/ChatGPT-promises-and-pitfalls.
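
Illustrative sketch (not the authors' method): the abstract reports stylistic differences between ChatGPT and human code, such as modular design and error handling, that were strong enough for machine learning models to detect. The Python snippet below shows one way such coarse style signals could be counted with the standard-library `ast` module. It assumes the code samples are Python; the function name `style_features`, the chosen features, and the `SAMPLE` string are hypothetical and are not taken from the paper or its repository, whose 14 metrics and classifier features are not listed here.

```python
import ast


def style_features(source: str) -> dict:
    """Count a few coarse style signals in a Python source string.

    Hypothetical feature set chosen for illustration only.
    """
    tree = ast.parse(source)
    nodes = list(ast.walk(tree))
    lines = [line.strip() for line in source.splitlines()]
    comprehension_types = (ast.ListComp, ast.SetComp, ast.DictComp, ast.GeneratorExp)
    return {
        # Rough proxy for modular design: number of function definitions.
        "functions": sum(isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef)) for n in nodes),
        # Rough proxy for error handling: number of try statements.
        "try_blocks": sum(isinstance(n, ast.Try) for n in nodes),
        # Rough proxy for "advanced constructs": comprehensions and generator expressions.
        "comprehensions": sum(isinstance(n, comprehension_types) for n in nodes),
        # Whole-line comments only; trailing comments are not counted.
        "comment_lines": sum(line.startswith("#") for line in lines),
        # Non-blank lines that are not whole-line comments.
        "code_lines": sum(bool(line) and not line.startswith("#") for line in lines),
    }


# Made-up sample used only to exercise the function.
SAMPLE = '''
# Read numbers and report their mean.
def mean(values):
    try:
        return sum(values) / len(values)
    except ZeroDivisionError:
        return 0.0

squares = [v * v for v in range(5)]
'''

features = style_features(SAMPLE)
print(features)
```

Feature dictionaries like this one, computed for each of the paired samples, could serve as input to any standard classifier. Which features and models the study actually used is described in the paper and repository, not here.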

Subjects:	Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2311.02640 [cs.SE]
	(or arXiv:2311.02640v1 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2311.02640

Submission history

From: Hamid Karimi
[v1] Sun, 5 Nov 2023 12:56:40 UTC (11,250 KB)
