Statistical comparison of Hidden Markov Models via Fragment Analysis

Hernandez-Suarez, Carlos M.; Montesinos-López, Osval A.


arXiv:2504.21046 (stat)

[Submitted on 28 Apr 2025]



Authors: Carlos M. Hernandez-Suarez, Osval A. Montesinos-López

Abstract: Standard practice in Hidden Markov Model (HMM) selection favors the candidate with the highest full-sequence likelihood, even though this amounts to basing the decision on a single realization. We introduce a fragment-based framework that recasts model selection as a formal statistical comparison. For an unknown true model $\mathrm{HMM}_0$ and a candidate $\mathrm{HMM}_j$, let $\mu_j(r)$ denote the probability that $\mathrm{HMM}_j$ and $\mathrm{HMM}_0$ generate the same sequence of length $r$. We show that if $\mathrm{HMM}_i$ is closer to $\mathrm{HMM}_0$ than $\mathrm{HMM}_j$ is, there exists a threshold $r^{*}$ (often small) such that $\mu_i(r)>\mu_j(r)$ for all $r\geq r^{*}$. Sampling $k$ independent fragments yields unbiased estimators $\hat{\mu}_j(r)$ whose differences are asymptotically normal, enabling a straightforward $Z$-test of the hypothesis $H_0: \mu_i(r)=\mu_j(r)$. Because it evaluates only short subsequences, the procedure avoids full-sequence likelihood computation and provides valid $p$-values for model comparison.
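The abstract does not spell out how $\hat{\mu}_j(r)$ is computed, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes that $\mu_j(r)=\sum_s P_0(s)\,P_j(s)$ over sequences $s$ of length $r$, that each of the $k$ fragments is an independent draw from $\mathrm{HMM}_0$, and that $\hat{\mu}_j(r)$ is the average likelihood of those fragments under $\mathrm{HMM}_j$ (computed with the standard forward algorithm). The paired form of the $Z$ statistic, the helper names (`fragment_likelihood`, `sample_hmm`, `fragment_z_test`) and all parameter values are illustrative assumptions.

```python
import math
import numpy as np


def fragment_likelihood(obs, pi, A, B):
    """Forward algorithm: probability of the symbol sequence `obs` under an HMM
    with initial distribution pi, transition matrix A (state x state) and
    emission matrix B (state x symbol)."""
    alpha = pi * B[:, obs[0]]
    for symbol in obs[1:]:
        alpha = (alpha @ A) * B[:, symbol]
    return float(alpha.sum())


def sample_hmm(n, pi, A, B, rng):
    """Draw one observation sequence of length n from the HMM (pi, A, B)."""
    state = rng.choice(len(pi), p=pi)
    out = np.empty(n, dtype=int)
    for t in range(n):
        out[t] = rng.choice(B.shape[1], p=B[state])
        state = rng.choice(len(pi), p=A[state])
    return out


def fragment_z_test(fragments, hmm_i, hmm_j):
    """Assumed estimator: mu_hat = mean fragment likelihood under each candidate.
    Assumed test: paired Z statistic on the per-fragment likelihood differences."""
    p_i = np.array([fragment_likelihood(f, *hmm_i) for f in fragments])
    p_j = np.array([fragment_likelihood(f, *hmm_j) for f in fragments])
    d = p_i - p_j
    se = d.std(ddof=1) / math.sqrt(len(d))
    z = d.mean() / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return p_i.mean(), p_j.mean(), z, p_value


# Illustrative (made-up) two-state, two-symbol models.
pi = np.array([0.5, 0.5])
hmm_true = (pi, np.array([[0.9, 0.1], [0.1, 0.9]]), np.array([[0.9, 0.1], [0.1, 0.9]]))
hmm_i = (pi, np.array([[0.85, 0.15], [0.15, 0.85]]), np.array([[0.9, 0.1], [0.1, 0.9]]))
hmm_j = (pi, np.array([[0.6, 0.4], [0.4, 0.6]]), np.array([[0.7, 0.3], [0.3, 0.7]]))

rng = np.random.default_rng(0)
r, k = 5, 200  # fragment length and number of fragments (arbitrary choices)
fragments = [sample_hmm(r, *hmm_true, rng) for _ in range(k)]

mu_i_hat, mu_j_hat, z, p_value = fragment_z_test(fragments, hmm_i, hmm_j)
print(f"mu_i_hat={mu_i_hat:.4f}  mu_j_hat={mu_j_hat:.4f}  z={z:.2f}  p={p_value:.3g}")
```

In this sketch the fragments are simulated directly from a known generator so that independence holds by construction; with real data the fragments would have to be cut from the observed sequence, and how to do that while keeping them independent is a design choice the abstract leaves open.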


Comments:	Eight pages, 1 figure
Subjects:	Methodology (stat.ME); Statistics Theory (math.ST)
MSC classes:	62M05, 62F03, 62F12
Cite as:	arXiv:2504.21046 [stat.ME]
	(or arXiv:2504.21046v1 [stat.ME] for this version)
	https://doi.org/10.48550/arXiv.2504.21046

Submission history

From: Carlos Hernandez-Suarez M
[v1] Mon, 28 Apr 2025 22:08:20 UTC (30 KB)
