Sequential anomaly identification with observation control under generalized error metrics

Tsopelakos, Aristomenis; Fellouris, Georgios

Mathematics > Statistics Theory

arXiv:2412.04693 (math)

[Submitted on 6 Dec 2024]

Title: Sequential anomaly identification with observation control under generalized error metrics

Authors: Aristomenis Tsopelakos, Georgios Fellouris

Abstract: The problem of sequential anomaly detection and identification is considered, where multiple data sources are simultaneously monitored and the goal is to identify in real time those, if any, that exhibit "anomalous" statistical behavior. An upper bound is postulated on the number of data sources that can be sampled at each sampling instant, but the decision maker selects which ones to sample based on the already collected data. Thus, in this context, a policy consists not only of a stopping rule and a decision rule, which determine when sampling should be terminated and which sources to identify as anomalous upon stopping, but also of a sampling rule, which determines which sources to sample at each time instant subject to the sampling constraint. Two distinct formulations are considered, which require control of different "generalized" error metrics. The first tolerates a user-specified number of errors of any kind, whereas the second tolerates distinct, user-specified numbers of false positives and false negatives. For each formulation, a universal asymptotic lower bound on the expected time for stopping is established as the error probabilities go to 0, and it is shown to be attained by a policy that combines the stopping and decision rules proposed in the full-sampling case with a probabilistic sampling rule that achieves a specific long-run sampling frequency for each source. Moreover, simulation studies compare the first-order asymptotic approximation to the optimal expected time for stopping with the corresponding factor in a finite regime, and assess the impact of the sampling constraint and of the tolerance to errors.
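The two generalized error metrics and the per-instant sampling constraint can be made concrete with a small sketch. It assumes sources are indexed 0..M-1 and that a decision is the set of sources declared anomalous. For the sampling side it uses systematic (Madow) sampling as one way to draw at most K sources per instant while giving each source a prescribed inclusion probability, so that empirical sampling frequencies settle near chosen targets. This is an illustrative choice, not necessarily the sampling rule proposed in the paper, and the target frequencies are made-up inputs. All function names are hypothetical.

```python
import math
import random
from typing import Sequence, Set, Tuple


def count_errors(decided: Set[int], anomalous: Set[int]) -> Tuple[int, int]:
    """Return (false positives, false negatives) of a decision."""
    false_pos = len(decided - anomalous)  # declared anomalous but actually normal
    false_neg = len(anomalous - decided)  # anomalous but not declared
    return false_pos, false_neg


def meets_total_tolerance(decided: Set[int], anomalous: Set[int], k: int) -> bool:
    """First formulation: at most k errors of any kind."""
    fp, fn = count_errors(decided, anomalous)
    return fp + fn <= k


def meets_split_tolerance(decided: Set[int], anomalous: Set[int],
                          k_fp: int, k_fn: int) -> bool:
    """Second formulation: at most k_fp false positives and k_fn false negatives."""
    fp, fn = count_errors(decided, anomalous)
    return fp <= k_fp and fn <= k_fn


def systematic_sample(freqs: Sequence[float], rng: random.Random) -> Set[int]:
    """Pick sources so that source i is included with probability freqs[i].

    Systematic (Madow) sampling, an illustrative choice (hypothetical helper):
    lay the freqs[i] end to end on [0, S) with S = sum(freqs), draw one
    offset u ~ U[0, 1), and select every interval containing a point u + j
    (j = 0, 1, ...). Assumes 0 <= freqs[i] <= 1, so each interval holds at
    most one point and at most ceil(S) sources are chosen; pick S <= K to
    respect a budget of K sources per instant.
    """
    u = rng.random()
    chosen: Set[int] = set()
    lo = 0.0
    for i, c in enumerate(freqs):
        hi = lo + c
        # number of integers j with lo <= u + j < hi is ceil(hi - u) - ceil(lo - u)
        if math.ceil(hi - u) > math.ceil(lo - u):
            chosen.add(i)
        lo = hi
    return chosen


if __name__ == "__main__":
    rng = random.Random(0)
    targets = [0.5, 0.25, 0.75, 0.5]  # hypothetical targets summing to K = 2
    n = 20000
    counts = [0] * len(targets)
    for _ in range(n):
        for i in systematic_sample(targets, rng):
            counts[i] += 1
    # empirical sampling frequencies should be close to the targets
    print([round(c / n, 3) for c in counts])
```

Drawing one offset per instant keeps the number of sampled sources at or below the budget while matching each source's marginal probability; how the paper chooses the target frequencies, and its stopping and decision rules, are not reproduced here.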

Comments:	57 pages, 4 figures, 1 table
Subjects:	Statistics Theory (math.ST)
Cite as:	arXiv:2412.04693 [math.ST]
	(or arXiv:2412.04693v1 [math.ST] for this version)
	https://doi.org/10.48550/arXiv.2412.04693

Submission history

From: Aristomenis Tsopelakos
[v1] Fri, 6 Dec 2024 01:07:49 UTC (143 KB)