A Unifying Algorithm for Hierarchical Queries

Khamis, Mahmoud Abo; Comer, Jesse; Kolaitis, Phokion; Roy, Sudeepa; Tannen, Val

arXiv:2506.10238v2 (cs)

[Submitted on 11 Jun 2025 (v1), last revised 24 Sep 2025 (this version, v2)]

Authors: Mahmoud Abo Khamis, Jesse Comer, Phokion Kolaitis, Sudeepa Roy, Val Tannen

Abstract: The class of hierarchical queries is known to define the boundary of the dichotomy between tractability and intractability for two extensively studied problems about self-join-free Boolean conjunctive queries (SJF-BCQ): (i) evaluating an SJF-BCQ on a tuple-independent probabilistic database; (ii) computing the Shapley value of a fact in a database on which an SJF-BCQ evaluates to true. Here, we establish that hierarchical queries also define the boundary of the dichotomy between tractability and intractability for a different natural algorithmic problem, which we call the "bag-set maximization" problem. The bag-set maximization problem associated with an SJF-BCQ $Q$ asks: given a database $\cal D$, find the largest value that $Q$ takes under bag semantics on a database $\cal D'$ obtained from $\cal D$ by adding at most $\theta$ facts from another given database $\cal D^r$. For non-hierarchical queries, we show that the bag-set maximization problem is an NP-complete optimization problem. More significantly, for hierarchical queries, we show that all three aforementioned problems (probabilistic query evaluation, Shapley value computation, and bag-set maximization) admit a single unifying polynomial-time algorithm that operates on an abstract algebraic structure called a "2-monoid". Each of the three problems requires a different instantiation of the 2-monoid, tailored to the problem at hand.
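
To make the two notions in the abstract concrete, the following Python sketch may help. It is illustrative only and is not the paper's unifying 2-monoid algorithm. It assumes the standard definition of a hierarchical query, which the abstract does not spell out: for any two variables, the sets of atoms containing them are either nested or disjoint. It also states bag-set maximization as a brute-force search that is exponential in $\theta$. All function names (`is_hierarchical`, `count_matches`, `bag_set_max_bruteforce`) and the dictionary encoding of queries and databases are hypothetical choices made for this example.

```python
from itertools import combinations


def is_hierarchical(atoms):
    """atoms maps each relation name to its tuple of variables (self-join free).

    Assumed (standard) definition: for every pair of variables x, y, the sets
    of atoms containing x and y are nested or disjoint.
    """
    variables = set().union(*map(set, atoms.values())) if atoms else set()
    at = {v: {name for name, vs in atoms.items() if v in vs} for v in variables}
    for x, y in combinations(variables, 2):
        ax, ay = at[x], at[y]
        if not (ax <= ay or ay <= ax or ax.isdisjoint(ay)):
            return False
    return True


def count_matches(atoms, db):
    """Value of the Boolean query under bag semantics on a set database:
    the number of variable assignments that satisfy every atom."""
    assignments = [{}]
    for name, vars_ in atoms.items():
        extended = []
        for partial in assignments:
            for fact in db.get(name, ()):
                candidate = dict(partial)
                # Keep the fact only if it agrees with the values bound so far.
                if all(candidate.setdefault(v, c) == c for v, c in zip(vars_, fact)):
                    extended.append(candidate)
        assignments = extended
    return len(assignments)


def bag_set_max_bruteforce(atoms, db, repair, theta):
    """Try every way of adding at most theta facts from the repair database.

    Exponential in theta; only meant to pin down the problem statement.
    """
    candidates = [
        (name, fact)
        for name, facts in repair.items()
        for fact in facts
        if fact not in db.get(name, set())
    ]
    best = count_matches(atoms, db)
    for k in range(1, min(theta, len(candidates)) + 1):
        for chosen in combinations(candidates, k):
            extended = {name: set(facts) for name, facts in db.items()}
            for name, fact in chosen:
                extended.setdefault(name, set()).add(fact)
            best = max(best, count_matches(atoms, extended))
    return best


# Example queries: Q1 = R(x), S(x, y) and Q2 = R(x), S(x, y), T(y).
q1 = {"R": ("x",), "S": ("x", "y")}
q2 = {"R": ("x",), "S": ("x", "y"), "T": ("y",)}
print(is_hierarchical(q1), is_hierarchical(q2))
```

Under the assumed definition, `q1` is hierarchical because the atoms containing `y` are a subset of those containing `x`, while `q2` is the classic non-hierarchical pattern: `x` and `y` share the atom `S`, yet each also occurs in an atom the other does not.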

Subjects:	Databases (cs.DB)
Cite as:	arXiv:2506.10238 [cs.DB]
	(or arXiv:2506.10238v2 [cs.DB] for this version)
	https://doi.org/10.48550/arXiv.2506.10238

Submission history

From: Mahmoud Abo Khamis
[v1] Wed, 11 Jun 2025 23:43:58 UTC (31 KB)
[v2] Wed, 24 Sep 2025 00:03:49 UTC (31 KB)
