Typed Topological Structures Of Datasets

Hu, Wanjun

Computer Science > Machine Learning

arXiv:2508.14008v1 (cs)

[Submitted on 19 Aug 2025]

Title: Typed Topological Structures Of Datasets

Authors: Wanjun Hu

Abstract: A dataset $X$ on $R^2$ is a finite topological space. Current research on datasets focuses on statistical methods and the algebraic topological method \cite{carlsson}. In \cite{hu}, the concept of a typed topological space was introduced and shown to have potential for studying finite topological spaces, such as a dataset. It is a new method from the general topology perspective. A typed topological space is a topological space whose open sets are assigned types. Topological concepts and methods can be redefined using open sets of certain types. In this article, we develop a special set of types and its related typed topology on a dataset $X$. Using it, we can investigate the inner structure of $X$. In particular, $R^2$ has a natural quotient space, in which $X$ is organized into tracks, and each track is split into components. Those components are ordered and can be represented by an integer sequence. Components crossing tracks form branches, and the relationship can be well represented by a type of pseudotree (called typed-II pseudotree). Such structures provide a platform for new algorithms for problems such as computing convex hulls, finding holes, clustering, and anomaly detection.

Comments:	14 pages, 5 figures
Subjects:	Machine Learning (cs.LG); General Topology (math.GN)
ACM classes:	G.0; F.m
Cite as:	arXiv:2508.14008 [cs.LG]
	(or arXiv:2508.14008v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2508.14008

Submission history

From: Wanjun Hu
[v1] Tue, 19 Aug 2025 17:14:13 UTC (344 KB)
