Computer Science > Software Engineering

arXiv:2311.02640 (cs)
[Submitted on 5 Nov 2023]

Title: Assessing the Promise and Pitfalls of ChatGPT for Automated Code Generation


Authors: Muhammad Fawad Akbar Khan, Max Ramsdell, Erik Falor, Hamid Karimi
Abstract: This paper presents a comprehensive evaluation of the code generation capabilities of ChatGPT, a prominent large language model, compared to human programmers. A novel dataset of 131 code-generation prompts across 5 categories was curated to enable robust analysis. Code solutions were generated by both ChatGPT and humans for all prompts, resulting in 262 code samples. A meticulous manual assessment methodology prioritized evaluating correctness, comprehensibility, and security using 14 established code quality metrics. The key findings reveal ChatGPT's strengths in crafting concise, efficient code with advanced constructs, with strong performance on data analysis tasks (93.1% accuracy) but limitations in visual-graphical challenges. Comparative analysis with human code highlights ChatGPT's inclination towards modular design and superior error handling. Additionally, machine learning models distinguished ChatGPT code from human code with up to 88% accuracy, suggesting detectable differences in coding style. By providing insights into ChatGPT's code generation capabilities and limitations through quantitative metrics and qualitative analysis, this study contributes toward advancing AI-based programming assistants. The curated dataset and methodology offer a robust foundation for future research in this nascent domain. All data and code are available at https://github.com/DSAatUSU/ChatGPT-promises-and-pitfalls.
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Cite as: arXiv:2311.02640 [cs.SE]
  (or arXiv:2311.02640v1 [cs.SE] for this version)
  https://doi.org/10.48550/arXiv.2311.02640
arXiv-issued DOI via DataCite

Submission history

From: Hamid Karimi
[v1] Sun, 5 Nov 2023 12:56:40 UTC (11,250 KB)