N$^2$: A Unified Python Package and Test Bench for Nearest Neighbor-Based Matrix Completion

Chin, Caleb; Khubchandani, Aashish; Maskara, Harshvardhan; Choi, Kyuseong; Feitelberg, Jacob; Gong, Albert; Paul, Manit; Sadhukhan, Tathagata; Agarwal, Anish; Dwivedi, Raaz

arXiv:2506.04166 (cs)

[Submitted on 4 Jun 2025]

Abstract: Nearest neighbor (NN) methods have re-emerged as competitive tools for matrix completion, offering strong empirical performance and recent theoretical guarantees, including entry-wise error bounds, confidence intervals, and minimax optimality. Despite their simplicity, recent work has shown that NN approaches are robust to a range of missingness patterns and effective across diverse applications. This paper introduces N$^2$, a unified Python package and testbed that consolidates a broad class of NN-based methods through a modular, extensible interface. Built for both researchers and practitioners, N$^2$ supports rapid experimentation and benchmarking. Using this framework, we introduce a new NN variant that achieves state-of-the-art results in several settings. We also release a benchmark suite of real-world datasets, from healthcare and recommender systems to causal inference and LLM evaluation, designed to stress-test matrix completion methods beyond synthetic scenarios. Our experiments demonstrate that while classical methods excel on idealized data, NN-based techniques consistently outperform them in real-world settings.
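For readers unfamiliar with the technique the abstract refers to, the following is a minimal sketch of a generic row-wise nearest neighbor imputation, assuming a threshold-based neighbor rule. It is not the N$^2$ package interface and not the new variant the paper proposes; the function name `row_nn_impute` and the parameter `eta` are hypothetical and chosen only for illustration.

```python
import numpy as np

def row_nn_impute(X, mask, eta):
    """Fill missing entries by averaging over nearest-neighbor rows.

    X    : (n, m) array; entries where mask is False are ignored (may be NaN).
    mask : (n, m) boolean array, True where the entry is observed.
    eta  : distance threshold (hypothetical tuning parameter).
    """
    X = np.asarray(X, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    n, _ = X.shape

    Xz = np.where(mask, X, 0.0)   # zero out unobserved entries
    M = mask.astype(float)
    sq = Xz ** 2

    # overlap[i, k] = number of columns observed in both row i and row k
    overlap = M @ M.T
    # Sum of squared differences over the commonly observed columns.
    sqdist = sq @ M.T + M @ sq.T - 2.0 * (Xz @ Xz.T)
    sqdist = np.maximum(sqdist, 0.0)  # guard against round-off

    # Mean squared distance; rows with no overlap are infinitely far apart.
    dist = np.full((n, n), np.inf)
    ok = overlap > 0
    dist[ok] = sqdist[ok] / overlap[ok]

    # Row k is a neighbor of row i when dist[i, k] <= eta.
    nbr = (dist <= eta).astype(float)
    num = nbr @ Xz                # sum of observed neighbor values per entry
    den = nbr @ M                 # number of neighbors observing each column
    est = np.divide(num, den, out=np.full_like(num, np.nan), where=den > 0)

    # Keep observed entries; fill the rest (NaN if no neighbor observed it).
    return np.where(mask, X, est)

X = np.array([[1.0, 2.0, 3.0],
              [1.0, 2.0, np.nan],
              [5.0, 6.0, 7.0]])
mask = ~np.isnan(X)
filled = row_nn_impute(X, mask, eta=1.0)
print(filled[1, 2])
```

In this toy input, row 1 agrees with row 0 on both commonly observed columns, so with a small threshold only row 0 contributes to the missing entry. A larger `eta` admits more neighbors and trades variance for bias, which is why the threshold is usually tuned on held-out entries.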

Comments:	21 pages, 6 figures
Subjects:	Machine Learning (cs.LG); Computation (stat.CO); Machine Learning (stat.ML)
Cite as:	arXiv:2506.04166 [cs.LG]
	(or arXiv:2506.04166v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2506.04166

Submission history

From: Kyuseong Choi
[v1] Wed, 4 Jun 2025 17:04:34 UTC (1,172 KB)
