Room Scene Discovery and Grouping in Unstructured Vacation Rental Image Collections

Kappagantula, Vignesh Ram Nithin; Hassantabar, Shayan

Computer Science > Computer Vision and Pattern Recognition

arXiv:2507.00263 (cs)

[Submitted on 30 Jun 2025]

Title: Room Scene Discovery and Grouping in Unstructured Vacation Rental Image Collections

Authors: Vignesh Ram Nithin Kappagantula, Shayan Hassantabar

Abstract: The rapid growth of vacation rental (VR) platforms has led to an increasing volume of property images, often uploaded without structured categorization. This lack of organization poses significant challenges for travelers attempting to understand the spatial layout of a property, particularly when multiple rooms of the same type are present. To address this issue, we introduce an effective approach for solving the room scene discovery and grouping problem, as well as identifying bed types within each bedroom group. This grouping helps travelers understand the spatial organization, layout, and sleeping configuration of the property. We propose a computationally efficient machine learning pipeline characterized by low latency and the ability to perform effectively with sample-efficient learning, making it well-suited for real-time and data-scarce environments. The pipeline integrates a supervised room-type detection model, a supervised overlap detection model that scores the overlap similarity between two images, and a clustering algorithm that uses those similarity scores to group images of the same space together. Additionally, the pipeline uses a Multi-modal Large Language Model (MLLM) to map each bedroom group to the corresponding bed types specified in the property's metadata, based on the visual content of the group's images. We evaluate the aforementioned models individually and also assess the pipeline in its entirety, observing strong performance that significantly outperforms established approaches such as contrastive learning and clustering with pretrained embeddings.
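The grouping stage described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract names neither the clustering algorithm nor a threshold, so this sketch substitutes thresholded connected components (union-find) as a stand-in. The names `group_images`, `room_type`, `overlap_score`, and `threshold`, along with the toy file names and scores, are all hypothetical; the two callables stand in for the supervised room-type and overlap detection models.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable, Dict, List, Sequence


def _find(parent: Dict[str, str], x: str) -> str:
    """Return the representative of x's set, compressing the path as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def group_images(
    images: Sequence[str],
    room_type: Callable[[str], str],             # hypothetical stand-in for the room-type model
    overlap_score: Callable[[str, str], float],  # hypothetical stand-in for the overlap model
    threshold: float = 0.5,                      # assumed cut-off, not taken from the paper
) -> List[List[str]]:
    """Group images that appear to show the same physical space."""
    # Step 1: bucket images by predicted room type, so that only
    # images of the same type are ever compared with each other.
    by_type: Dict[str, List[str]] = defaultdict(list)
    for img in images:
        by_type[room_type(img)].append(img)

    groups: List[List[str]] = []
    for members in by_type.values():
        # Step 2: link every pair whose overlap score clears the threshold.
        parent = {img: img for img in members}
        for a, b in combinations(members, 2):
            if overlap_score(a, b) >= threshold:
                parent[_find(parent, a)] = _find(parent, b)

        # Step 3: each connected component becomes one room group.
        clusters: Dict[str, List[str]] = defaultdict(list)
        for img in members:
            clusters[_find(parent, img)].append(img)
        groups.extend(sorted(c) for c in clusters.values())
    return sorted(groups)


# Toy data: two photos of one bedroom, one photo of another, one kitchen photo.
toy_images = ["bed_a1.jpg", "bed_a2.jpg", "bed_b1.jpg", "kitchen_1.jpg"]
toy_scores = {frozenset(("bed_a1.jpg", "bed_a2.jpg")): 0.9}


def toy_room_type(name: str) -> str:
    return name.split("_")[0]


def toy_overlap(a: str, b: str) -> float:
    return toy_scores.get(frozenset((a, b)), 0.1)


toy_groups = group_images(toy_images, toy_room_type, toy_overlap)
```

On the toy data, the two `bed_a` photos end up in one group while `bed_b1.jpg` and `kitchen_1.jpg` each form their own. Connected components are only one possible choice; any clustering method that accepts pairwise similarity scores could take the place of steps 2 and 3.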

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Cite as:	arXiv:2507.00263 [cs.CV]
	(or arXiv:2507.00263v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2507.00263

Submission history

From: Vignesh Ram Nithin Kappagantula
[v1] Mon, 30 Jun 2025 21:11:35 UTC (7,281 KB)
