Humans, Machine Learning, and Language Models in Union: A Cognitive Study on Table Unionability

Marimuthu, Sreeram; Klimenkova, Nina; Shraga, Roee

doi:10.1145/3736733.3736740

arXiv:2506.12990 (cs)

[Submitted on 15 Jun 2025]

Authors: Sreeram Marimuthu, Nina Klimenkova, Roee Shraga

Abstract: Data discovery, and table unionability in particular, have become key tasks in modern data science. However, the human perspective on these tasks remains under-explored. This research therefore investigates human behavior in determining table unionability within data discovery. We designed an experimental survey and conducted a comprehensive analysis in which we assess human decision-making for table unionability. We use the observations from this analysis to develop a machine learning framework that boosts the (raw) performance of humans. Furthermore, we perform a preliminary study of how LLM performance compares to that of humans, which indicates that it is typically better to consider a combination of both. We believe that this work lays the foundations for developing future Human-in-the-Loop systems for efficient data discovery.

Comments:	6 pages, 4 figures; accepted at ACM SIGMOD HILDA '25
Subjects:	Databases (cs.DB); Machine Learning (cs.LG)
Cite as:	arXiv:2506.12990 [cs.DB]
	(or arXiv:2506.12990v1 [cs.DB] for this version)
	https://doi.org/10.48550/arXiv.2506.12990
Related DOI:	https://doi.org/10.1145/3736733.3736740

Submission history

From: Sreeram Marimuthu
[v1] Sun, 15 Jun 2025 23:13:20 UTC (674 KB)
