RCC-Dual-GAN: An Efficient Approach for Outlier Detection with Few Identified Anomalies

Li, Zhe; Sun, Chunhua; Liu, Chunli; Chen, Xiayu; Wang, Meng; Liu, Yezheng


arXiv:2003.03609 (cs)

[Submitted on 7 Mar 2020]

Abstract: Outlier detection is an important task in data mining, and many techniques have been explored in various applications. However, due to the default assumption that outliers are non-concentrated, unsupervised outlier detection may fail to detect group anomalies with higher density levels. As for supervised outlier detection, although high detection rates and optimal parameters can usually be achieved, obtaining sufficient and correct labels is time-consuming. To address these issues, we focus on semi-supervised outlier detection with few identified anomalies, in the hope of using limited labels to achieve high detection accuracy. First, we propose a novel detection model, Dual-GAN, which can directly utilize the potential information in identified anomalies to detect discrete outliers and partially identified group anomalies simultaneously. Then, considering that instances with similar output values may not all be similar in a complex data structure, we replace the two MO-GAN components in Dual-GAN with the combination of RCC and M-GAN (RCC-Dual-GAN). In addition, to deal with the evaluation of the Nash equilibrium and the selection of the optimal model, two evaluation indicators are created and introduced into the two models to make the detection process more intelligent. Extensive experiments on both benchmark datasets and two practical tasks demonstrate that our proposed approaches (i.e., Dual-GAN and RCC-Dual-GAN) can significantly improve the accuracy of outlier detection even with only a few identified anomalies. Moreover, compared with the two MO-GAN components in Dual-GAN, the network structure combining RCC and M-GAN has greater stability in various situations.

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2003.03609 [cs.LG]
	(or arXiv:2003.03609v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2003.03609

Submission history

From: Zhe Li
[v1] Sat, 7 Mar 2020 17:13:52 UTC (3,196 KB)