
Computer Science > Databases

arXiv:2506.02802 (cs)
[Submitted on 3 Jun 2025]

Title: A Learned Cost Model-based Cross-engine Optimizer for SQL Workloads


Authors: András Strausz, Niels Pardon, Ioana Giurgiu
Abstract: Lakehouse systems enable the same data to be queried with multiple execution engines. However, selecting the engine best suited to run a SQL query still requires a priori knowledge of the query's computational requirements and of each engine's capabilities, a complex and manual task that only becomes more difficult with the emergence of new engines and workloads. In this paper, we address this limitation by proposing a cross-engine optimizer that automates engine selection for diverse SQL queries through a learned cost model. A query plan, optimized with hints, is used for cost prediction and routing. Cost prediction is formulated as a multi-task learning problem, and multiple predictor heads, corresponding to different engines and provisionings, are used in the model architecture. This eliminates the need to train engine-specific models and allows new engines to be added flexibly at a minimal fine-tuning cost. Results on various databases and engines show that using a query-optimized logical plan for cost estimation decreases the average Q-error by as much as 12.6% compared to using unoptimized plans as input. Moreover, the proposed cross-engine optimizer reduces total workload runtime by up to 25.2% in a zero-shot setting and 30.4% in a few-shot setting when compared to random routing.
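
The multi-task architecture described above lends itself to a short illustration. Below is a minimal PyTorch sketch, not the authors' implementation: a shared encoder over a featurized query plan feeds one predictor head per engine/provisioning, so a new engine only adds a new head. The engine names, feature dimensions, and encoder layout are all illustrative assumptions; the sketch also includes the standard Q-error metric mentioned in the abstract.

# Minimal sketch (illustrative, not the paper's code) of a multi-task
# cost model: a shared plan encoder with per-engine predictor heads.
import torch
import torch.nn as nn

class CrossEngineCostModel(nn.Module):
    def __init__(self, plan_feature_dim: int, engines: list[str], hidden_dim: int = 64):
        super().__init__()
        # Shared encoder: learns an engine-agnostic representation of the
        # (hint-optimized) logical query plan, here assumed to be a fixed-length
        # feature vector.
        self.encoder = nn.Sequential(
            nn.Linear(plan_feature_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
        )
        # One head per engine/provisioning; each predicts a scalar cost.
        # Adding a new engine means adding (and fine-tuning) one new head.
        self.heads = nn.ModuleDict({e: nn.Linear(hidden_dim, 1) for e in engines})

    def forward(self, plan_features: torch.Tensor) -> dict[str, torch.Tensor]:
        z = self.encoder(plan_features)
        return {e: head(z).squeeze(-1) for e, head in self.heads.items()}

    def route(self, plan_features: torch.Tensor) -> str:
        # Route the query to the engine with the lowest predicted cost.
        costs = self.forward(plan_features)
        return min(costs, key=lambda e: costs[e].item())

def q_error(pred: torch.Tensor, true: torch.Tensor) -> torch.Tensor:
    # Standard Q-error: max(pred/true, true/pred); 1.0 is a perfect estimate.
    # Assumes strictly positive costs.
    return torch.maximum(pred / true, true / pred)

# Example: score one featurized plan and pick an engine ("presto"/"spark"
# are stand-in engine names, not taken from the paper).
model = CrossEngineCostModel(plan_feature_dim=32, engines=["presto", "spark"])
features = torch.randn(1, 32)  # stand-in for real plan features
print(model.route(features))
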
Comments: 6 pages
Subjects: Databases (cs.DB); Machine Learning (cs.LG)
Cite as: arXiv:2506.02802 [cs.DB]
  (or arXiv:2506.02802v1 [cs.DB] for this version)
  https://doi.org/10.48550/arXiv.2506.02802
arXiv-issued DOI via DataCite

Submission history

From: Ioana Giurgiu
[v1] Tue, 3 Jun 2025 12:32:56 UTC (1,036 KB)

