SentiDrop: A Multi Modal Machine Learning model for Predicting Dropout in Distance Learning

Zerkouk, Meriem; Mihoubi, Miloud; Chikhaoui, Belkacem


arXiv:2507.10421v1 (cs)

[Submitted on 14 Jul 2025]

Abstract: School dropout is a serious problem in distance learning, where early detection is crucial for effective intervention and student perseverance. Predicting student dropout using available educational data is a widely researched topic in learning analytics. Our partner's distance learning platform highlights the importance of integrating diverse data sources, including socio-demographic data, behavioral data, and sentiment analysis, to accurately predict dropout risks. In this paper, we introduce a novel model that combines sentiment analysis of student comments using the Bidirectional Encoder Representations from Transformers (BERT) model with socio-demographic and behavioral data analyzed through Extreme Gradient Boosting (XGBoost). We fine-tuned BERT on student comments to capture nuanced sentiments, which were then merged with key features selected using feature importance techniques in XGBoost. Our model was tested on unseen data from the next academic year, achieving an accuracy of 84%, compared to 82% for the baseline model. Additionally, the model demonstrated superior performance in other metrics, such as precision and F1-score. The proposed method could be a vital tool in developing personalized strategies to reduce dropout rates and encourage student perseverance.
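The pipeline summarized in the abstract (sentiment scores from a fine-tuned BERT model merged with tabular features ranked by XGBoost feature importance, then evaluated on the following academic year) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Student` record, its field names, the importance values, the top-k cutoff, and the `neg_sentiment` probability are all hypothetical. In the described method the importances would come from a trained XGBoost model and the sentiment from the fine-tuned BERT classifier; neither model is reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Student:
    """One learner record. All field names here are hypothetical."""
    year: int                    # academic year, used for the temporal split
    features: dict[str, float]   # socio-demographic and behavioral features
    neg_sentiment: float         # assumed P(negative) from a BERT classifier
    dropped_out: int             # label: 1 = dropout, 0 = persisted


def top_k_features(importance: dict[str, float], k: int) -> list[str]:
    """Return the k feature names with the highest importance scores.

    In the described method these scores would come from a trained XGBoost
    model; here they are simply passed in as a dict.
    """
    return sorted(importance, key=importance.get, reverse=True)[:k]


def fuse(student: Student, selected: list[str]) -> list[float]:
    """Concatenate the selected tabular features with the sentiment score."""
    return [student.features[name] for name in selected] + [student.neg_sentiment]


def temporal_split(students: list[Student], test_year: int):
    """Train on earlier years; hold out the given (next) academic year."""
    train = [s for s in students if s.year < test_year]
    test = [s for s in students if s.year == test_year]
    return train, test


def binary_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, precision, recall and F1, with dropout (1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For example, with hypothetical importances `{"logins": 0.31, "age": 0.05, "submissions": 0.27}` and `k=2`, `top_k_features` returns `["logins", "submissions"]`, and `fuse` appends the sentiment probability as one extra column. The fused vectors from the training years would then be passed to a classifier, which this sketch leaves out. Appending a single sentiment column is only one simple way to "merge" the modalities; the abstract does not specify the exact fusion scheme.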

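Comments:	International Conference on Education and New Technologies (2025)
Subjects:	Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Cite as:	arXiv:2507.10421 [cs.AI]
	(or arXiv:2507.10421v1 [cs.AI] for this version)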
	https://doi.org/10.48550/arXiv.2507.10421

Submission history

From: Milous Mihoubi
[v1] Mon, 14 Jul 2025 16:04:34 UTC (484 KB)
