Machine Learning based Enterprise Financial Audit Framework and High Risk Identification

Yuan, Tingyu; Zhang, Xi; Chen, Xuanjing

Quantitative Finance > Risk Management

arXiv:2507.06266 (q-fin)

[Submitted on 8 Jul 2025]

Title: Machine Learning based Enterprise Financial Audit Framework and High Risk Identification

Authors: Tingyu Yuan, Xi Zhang, Xuanjing Chen


Abstract: In the face of global economic uncertainty, financial auditing has become essential for regulatory compliance and risk mitigation. Traditional manual auditing methods are increasingly limited by large data volumes, complex business structures, and evolving fraud tactics. This study proposes an AI-driven framework for enterprise financial audits and high-risk identification, leveraging machine learning to improve efficiency and accuracy. Using a dataset from the Big Four accounting firms (EY, PwC, Deloitte, KPMG) from 2020 to 2025, the research examines trends in risk assessment, compliance violations, and fraud detection. The dataset includes key indicators such as audit project counts, high-risk cases, fraud instances, compliance breaches, employee workload, and client satisfaction, capturing both audit behaviors and AI's impact on operations. To build a robust risk prediction model, three algorithms - Support Vector Machine (SVM), Random Forest (RF), and K-Nearest Neighbors (KNN) - are evaluated. SVM uses hyperplane optimization for complex classification, RF combines decision trees to manage high-dimensional, nonlinear data with resistance to overfitting, and KNN applies distance-based learning for flexible performance. Through hierarchical K-fold cross-validation and evaluation using F1-score, accuracy, and recall, Random Forest achieves the best performance, with an F1-score of 0.9012, excelling in identifying fraud and compliance anomalies. Feature importance analysis reveals audit frequency, past violations, employee workload, and client ratings as key predictors. The study recommends adopting Random Forest as a core model, enhancing features via engineering, and implementing real-time risk monitoring. This research contributes valuable insights into using machine learning for intelligent auditing and risk management in modern enterprises.
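The evaluation protocol named in the abstract (K-fold cross-validation scored by F1-score, accuracy, and recall) can be illustrated with a small, dependency-free sketch. This is not the authors' code: it assumes that "hierarchical K-fold" refers to the usual stratified K-fold scheme, in which each fold preserves the class ratio, and the helper names `stratified_k_fold` and `binary_metrics` as well as the toy labels are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs whose class ratios mirror the full data."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a fair share of it.
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, recall, and F1 for the positive (high-risk) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "recall": recall, "f1": f1}


# Toy labels (made up): 1 = high-risk engagement, 0 = normal engagement.
labels = [1] * 6 + [0] * 14
for train_idx, test_idx in stratified_k_fold(labels, k=2):
    # Each test fold holds 10 samples, 3 of them high-risk.
    print(len(test_idx), sum(labels[i] for i in test_idx))
```

In a full comparison, each candidate model (SVM, RF, KNN) would be fitted on `train_idx` and scored with `binary_metrics` on `test_idx`, and the per-fold scores averaged. F1 is the harmonic mean of precision and recall, so it penalizes a model that reaches high accuracy simply by predicting the majority (normal) class.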

Subjects:	Risk Management (q-fin.RM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Applications (stat.AP)
Cite as:	arXiv:2507.06266 [q-fin.RM]
	(or arXiv:2507.06266v1 [q-fin.RM] for this version)
	https://doi.org/10.48550/arXiv.2507.06266

Submission history

From: Xuanjing Chen
[v1] Tue, 8 Jul 2025 00:22:49 UTC (619 KB)
