
Computer Science > Databases

arXiv:2506.02802 (cs)
[Submitted on 3 Jun 2025]

Title: A Learned Cost Model-based Cross-engine Optimizer for SQL Workloads


Authors: András Strausz, Niels Pardon, Ioana Giurgiu
Abstract: Lakehouse systems enable the same data to be queried with multiple execution engines. However, selecting the engine best suited to run a SQL query still requires a priori knowledge of the query's computational requirements and of each engine's capabilities, a complex and manual task that only becomes more difficult with the emergence of new engines and workloads. In this paper, we address this limitation by proposing a cross-engine optimizer that automates engine selection for diverse SQL queries through a learned cost model. A query plan, optimized with hints, is used for cost prediction and routing. Cost prediction is formulated as a multi-task learning problem, and multiple predictor heads, corresponding to different engines and provisionings, are used in the model architecture. This eliminates the need to train engine-specific models and allows new engines to be added flexibly at a minimal fine-tuning cost. Results on various databases and engines show that using a query-optimized logical plan for cost estimation decreases the average Q-error by as much as 12.6% compared to using unoptimized plans as input. Moreover, the proposed cross-engine optimizer reduces total workload runtime by up to 25.2% in a zero-shot setting and 30.4% in a few-shot setting when compared to random routing.
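
The multi-task architecture described above lends itself to a short illustration. Below is a minimal PyTorch sketch, not the authors' implementation: a shared encoder over a featurized query plan feeds one predictor head per engine/provisioning, so a new engine only adds a new head. The engine names, feature dimensions, and encoder layout are all illustrative assumptions; the sketch also includes the standard Q-error metric mentioned in the abstract.

# Minimal sketch (illustrative, not the paper's code) of a multi-task
# cost model: a shared plan encoder with per-engine predictor heads.
import torch
import torch.nn as nn

class CrossEngineCostModel(nn.Module):
    def __init__(self, plan_feature_dim: int, engines: list[str], hidden_dim: int = 64):
        super().__init__()
        # Shared encoder: learns an engine-agnostic representation of the
        # (hint-optimized) logical query plan, here assumed to be a fixed-length
        # feature vector.
        self.encoder = nn.Sequential(
            nn.Linear(plan_feature_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
        )
        # One head per engine/provisioning; each predicts a scalar cost.
        # Adding a new engine means adding (and fine-tuning) one new head.
        self.heads = nn.ModuleDict({e: nn.Linear(hidden_dim, 1) for e in engines})

    def forward(self, plan_features: torch.Tensor) -> dict[str, torch.Tensor]:
        z = self.encoder(plan_features)
        return {e: head(z).squeeze(-1) for e, head in self.heads.items()}

    def route(self, plan_features: torch.Tensor) -> str:
        # Route the query to the engine with the lowest predicted cost.
        costs = self.forward(plan_features)
        return min(costs, key=lambda e: costs[e].item())

def q_error(pred: torch.Tensor, true: torch.Tensor) -> torch.Tensor:
    # Standard Q-error: max(pred/true, true/pred); 1.0 is a perfect estimate.
    # Assumes strictly positive costs.
    return torch.maximum(pred / true, true / pred)

# Example: score one featurized plan and pick an engine ("presto"/"spark"
# are stand-in engine names, not taken from the paper).
model = CrossEngineCostModel(plan_feature_dim=32, engines=["presto", "spark"])
features = torch.randn(1, 32)  # stand-in for real plan features
print(model.route(features))
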
Comments: 6 pages
Subjects: Databases (cs.DB); Machine Learning (cs.LG)
Cite as: arXiv:2506.02802 [cs.DB]
  (or arXiv:2506.02802v1 [cs.DB] for this version)
  https://doi.org/10.48550/arXiv.2506.02802
arXiv-issued DOI via DataCite

Submission history

From: Ioana Giurgiu
[v1] Tue, 3 Jun 2025 12:32:56 UTC (1,036 KB)

