Improving Real-Time Concept Drift Detection using a Hybrid Transformer-Autoencoder Framework

Harshit, N; Mounvik, K

arXiv:2508.07085 (cs)

[Submitted on 9 August 2025]

Title: Improving Real-Time Concept Drift Detection using a Hybrid Transformer-Autoencoder Framework

Authors: N Harshit, K Mounvik

Abstract: In applied machine learning, concept drift, meaning a gradual or abrupt change in the data distribution, can significantly reduce model performance. Typical detection methods, such as statistical tests or reconstruction-based models, are generally reactive and not very sensitive to early drift. Our study proposes a hybrid framework that combines Transformers and Autoencoders to model complex temporal dynamics and provide online drift detection. We introduce a Trust Score methodology that combines signals from (1) statistical and reconstruction-based drift metrics, specifically the Population Stability Index (PSI), Jensen-Shannon divergence (JSD), and Transformer-AE reconstruction error, (2) prediction uncertainty, (3) rule violations, and (4) the trend of classifier error aligned with the combined metrics defined by the Trust Score. On a time-sequenced airline passenger dataset with synthetic drift, the proposed model detects drift with better sensitivity and interpretability than baseline methods, both overall and across different detection thresholds, and provides a strong pipeline for real-time drift detection in applied machine learning. Drift was injected gradually into the dataset, e.g., by permuting ticket prices in later batches, and the data were split into 10 time segments [1]. On this data, our results indicate that the Transformer-Autoencoder detected drift earlier and with more sensitivity than the autoencoders commonly used in the literature, and provided improved modeling under higher error rates and logical violations. Therefore, a robust framework was developed to reliably monitor concept drift.
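
The two statistical drift metrics named in the abstract, PSI and JSD, can be illustrated with a minimal sketch. The abstract does not describe how the authors bin the data or smooth empty bins, so the function names (`binned_probs`, `psi`, `jsd`), the 10-bin histogram, the `eps` smoothing constant, and the synthetic Gaussian batches below are all assumptions made for illustration, not the paper's implementation.

```python
import numpy as np


def binned_probs(reference, current, bins=10, eps=1e-6):
    """Histogram both samples on shared bin edges derived from the reference batch."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    edges = np.histogram_bin_edges(reference, bins=bins)
    # Widen the outer edges so current values outside the reference range are still counted.
    edges[0] = min(edges[0], current.min())
    edges[-1] = max(edges[-1], current.max())
    # Add a small constant (an assumption) so empty bins do not produce log(0).
    p = np.histogram(reference, bins=edges)[0] + eps
    q = np.histogram(current, bins=edges)[0] + eps
    return p / p.sum(), q / q.sum()


def psi(reference, current, bins=10):
    """Population Stability Index: sum over bins of (q - p) * ln(q / p)."""
    p, q = binned_probs(reference, current, bins)
    return float(np.sum((q - p) * np.log(q / p)))


def jsd(reference, current, bins=10):
    """Jensen-Shannon divergence in bits (base-2 logs), bounded between 0 and 1."""
    p, q = binned_probs(reference, current, bins)
    m = 0.5 * (p + q)

    def kl(a, b):
        return np.sum(a * np.log2(a / b))

    return float(0.5 * kl(p, m) + 0.5 * kl(q, m))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.normal(0.0, 1.0, 5000)   # hypothetical early batch
    stable = rng.normal(0.0, 1.0, 5000)      # later batch, same distribution
    drifted = rng.normal(1.5, 1.0, 5000)     # later batch with a shifted mean
    print("stable : PSI", psi(reference, stable), "JSD", jsd(reference, stable))
    print("drifted: PSI", psi(reference, drifted), "JSD", jsd(reference, drifted))
```

Both metrics compare the marginal distribution of a single feature, so they respond to shifts such as the mean change simulated here. Permuting a column within a batch leaves that column's marginal distribution unchanged, which is one reason a framework might pair these metrics with reconstruction error and rule checks, as the abstract describes.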

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2508.07085 [cs.LG]
	(or arXiv:2508.07085v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2508.07085

Submission history

From: Mounvik K
[v1] Sat, 9 Aug 2025 19:39:33 UTC (175 KB)