Semantic Layering in Room Segmentation via LLMs 


Taehyeon Kim and Byung-Cheol Min

Purdue University

Abstract

We introduce Semantic Layering in Room Segmentation via LLMs (SeLRoS), a method for semantic room segmentation that integrates Large Language Models (LLMs) with traditional 2D map-based segmentation. Unlike previous approaches, which focus solely on the geometric segmentation of indoor environments, our work enriches segmented maps with semantic data, including object identification and spatial relationships, to enhance robotic navigation. By leveraging LLMs, we provide a framework that interprets and organizes complex information about each segmented area, thereby improving the accuracy and contextual relevance of room segmentation. Furthermore, SeLRoS overcomes a limitation of existing algorithms by using a semantic evaluation method to distinguish true room divisions from those erroneously generated by furniture and segmentation inaccuracies. The effectiveness of SeLRoS is verified through its application across 30 different 3D environments.

Framework of SeLRoS

Overview of SeLRoS’s structure: SeLRoS begins with Geometric Room Segmentation, where a 2D map (M) of the Original Environment (E) is transformed into a Segmentation Map (S). Next, the Object Mapping process extracts Object Information (Os) by running an Object Detection algorithm on scenes captured in the Original Environment at the center coordinates of each segmented space (s). The Semantic Integration process then harmonizes s, Os, and the spatial relation data (Rs) through the Room Information Interpreter and generates prompts P(s, Os, Rs) via Hierarchical Query. The final outputs are an Improved Segmentation Map (S') and Semantic Information (I).

This pseudocode illustrates the comprehensive algorithm of SeLRoS, with each function representing the stages of Geometric Room Segmentation, Object Mapping, and Semantic Integration. As illustrated in the figure above, the pseudocode depicts the workflow of accepting the original environment (E) as input and ultimately generating an improved segmentation map (S') along with semantic information (I).
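As a rough companion to the workflow described above, the following Python sketch wires the three stages together. It is not the authors' pseudocode: every function and parameter name here (`selros`, `make_2d_map`, `segment`, `map_objects`, `interpret`, `query`) is a hypothetical placeholder, and each stage is passed in as a callable so that the sketch makes no assumption about how a stage is implemented.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Grid = List[List[int]]  # assumed map encoding: a 2D grid of integer labels


@dataclass
class SelrosResult:
    improved_map: Grid             # S': improved segmentation map
    semantic_info: Dict[int, str]  # I: room label -> semantic description


def selros(
    environment: Any,                    # E: original environment
    make_2d_map: Callable[..., Any],     # hypothetical: E -> M
    segment: Callable[..., Any],         # hypothetical: M -> S
    map_objects: Callable[..., Any],     # hypothetical: (E, S) -> Os
    interpret: Callable[..., Any],       # hypothetical: S -> Rs
    query: Callable[..., Any],           # hypothetical: (S, Os, Rs) -> (S', I)
) -> SelrosResult:
    # Geometric Room Segmentation: 2D map, then segmentation map.
    grid_map = make_2d_map(environment)
    seg_map = segment(grid_map)

    # Object Mapping: object information for each segmented space.
    objects = map_objects(environment, seg_map)

    # Semantic Integration: spatial relations, then LLM queries.
    relations = interpret(seg_map)
    improved_map, semantic_info = query(seg_map, objects, relations)

    return SelrosResult(improved_map, semantic_info)
```

Passing the stages as arguments keeps the data flow (E to M to S, then Os and Rs, then S' and I) visible without committing to any particular segmentation algorithm, detector, or LLM.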

Semantic Integration - Room Information Interpreter

This pseudocode illustrates the first component of the Semantic Integration phase within SeLRoS, the Room Information Interpreter.

The Room Information Interpreter processes a segmentation map and yields detailed outputs about each segmented room, including its area, its dimensions (length and width), and its spatial relationships with adjacent rooms. An example of the Room Information Interpreter's output is shown below.

Entire Map's size is (14.1, 10.1)
Room 001 - Area: 9.53 % in entire map, Approximate length and width: (5.44, 1.78), Adjacent to: Room 007
Room 002 - Area: 3.98 % in entire map, Approximate length and width: (2.38, 1.28), Adjacent to: Room 006
Room 003 - Area: 14.36 % in entire map, Approximate length and width: (5.97, 2.58), Adjacent to: Room 006, Room 007
Room 004 - Area: 4.24 % in entire map, Approximate length and width: (2.72, 1.08), Adjacent to: Room 006
Room 005 - Area: 23.3 % in entire map, Approximate length and width: (5.98, 4.22), Adjacent to: Room 006, Room 012
Room 006 - Area: 21.27 % in entire map, Approximate length and width: (5.34, 3.88), Adjacent to: Room 002, Room 003, Room 004, Room 005, Room 009, Room 012
Room 007 - Area: 5.07 % in entire map, Approximate length and width: (2.44, 1.5), Adjacent to: Room 001, Room 003, Room 008
Room 008 - Area: 8.91 % in entire map, Approximate length and width: (3.93, 2.09), Adjacent to: Room 007, Room 010, Room 011
Room 009 - Area: 1.19 % in entire map, Approximate length and width: (1.28, 0.8), Adjacent to: Room 006
Room 010 - Area: 5.76 % in entire map, Approximate length and width: (3.2, 1.68), Adjacent to: Room 008
Room 011 - Area: 1.01 % in entire map, Approximate length and width: (1.71, 0.36), Adjacent to: Room 008
Room 012 - Area: 1.38 % in entire map, Approximate length and width: (1.68, 0.64), Adjacent to: Room 005, Room 006
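To make the output above more concrete, here is a minimal Python sketch of how such per-room statistics could be derived from a labeled grid. It is an illustration under stated assumptions, not the SeLRoS implementation: the segmentation map is assumed to be a 2D list in which 0 marks non-room cells and positive integers are room labels, length and width are approximated by the axis-aligned bounding box, and two rooms count as adjacent when their cells share an edge. The area is expressed as a share of all labeled cells, which matches the example above, where the percentages sum to 100. The names `interpret_rooms` and `format_room` are hypothetical.

```python
from collections import defaultdict


def interpret_rooms(seg, resolution=1.0):
    """Summarise each room of a labeled grid (0 = not a room)."""
    rows, cols = len(seg), len(seg[0])
    cells = defaultdict(list)    # room label -> list of (row, col)
    adjacent = defaultdict(set)  # room label -> labels of touching rooms

    for r in range(rows):
        for c in range(cols):
            label = seg[r][c]
            if label == 0:
                continue
            cells[label].append((r, c))
            # Looking only down and right visits every shared edge once.
            for nr, nc in ((r + 1, c), (r, c + 1)):
                if nr < rows and nc < cols:
                    other = seg[nr][nc]
                    if other != 0 and other != label:
                        adjacent[label].add(other)
                        adjacent[other].add(label)

    total = sum(len(points) for points in cells.values())
    info = {}
    for label, points in cells.items():
        row_ids = [p[0] for p in points]
        col_ids = [p[1] for p in points]
        # Assumption: bounding-box extent approximates length and width.
        height = (max(row_ids) - min(row_ids) + 1) * resolution
        width = (max(col_ids) - min(col_ids) + 1) * resolution
        info[label] = {
            "area_pct": round(100.0 * len(points) / total, 2),
            "length_width": (max(height, width), min(height, width)),
            "adjacent": sorted(adjacent[label]),
        }
    return info


def format_room(label, room):
    """Render one room in the style of the example output."""
    neighbours = ", ".join(f"Room {n:03d}" for n in room["adjacent"])
    return (
        f"Room {label:03d} - Area: {room['area_pct']} % in entire map, "
        f"Approximate length and width: {room['length_width']}, "
        f"Adjacent to: {neighbours}"
    )
```

A bounding box overestimates the extent of L-shaped or diagonal rooms, so a real interpreter may well use a different estimate; the sketch only shows the kind of geometry that feeds the LLM prompts.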

Semantic Integration - Hierarchical Query

Hierarchical Query consists of two levels: a Room-Level Query and an Environment-Level Query. The red box represents the role component, the yellow box represents the instruction, and the blue box represents the set of Semantic Information.
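The three-part prompt layout described above (role, instruction, semantic information) and the two query levels can be sketched in Python as follows. This is a hedged illustration only: the prompt wording, the constants `ROLE`, `ROOM_INSTRUCTION`, and `ENV_INSTRUCTION`, and the functions `build_prompt` and `hierarchical_query` are all hypothetical, and feeding the room-level answers into the environment-level query is an assumption about how the two levels connect. The LLM is represented by a plain callable so that no particular API is implied.

```python
from typing import Callable, Dict, List

# Hypothetical prompt text; the actual SeLRoS prompts are not reproduced here.
ROLE = "You are an assistant that reasons about indoor floor plans."
ROOM_INSTRUCTION = "Infer the most likely type of this room."
ENV_INSTRUCTION = "Decide which adjacent segments belong to the same room."


def build_prompt(role: str, instruction: str, semantic_info: List[str]) -> str:
    """Join the role, instruction, and semantic information into one prompt."""
    return "\n\n".join(
        [
            f"Role: {role}",
            f"Instruction: {instruction}",
            "Semantic information:\n" + "\n".join(semantic_info),
        ]
    )


def hierarchical_query(
    room_info: Dict[str, List[str]],  # room name -> lines of semantic info
    llm: Callable[[str], str],        # any function mapping prompt -> answer
) -> str:
    # Room-Level Query: one prompt per segmented room.
    room_answers = []
    for name in sorted(room_info):
        prompt = build_prompt(ROLE, ROOM_INSTRUCTION, room_info[name])
        room_answers.append(f"{name}: {llm(prompt)}")

    # Environment-Level Query: a single prompt over all room-level answers
    # (assumed linkage between the two levels).
    env_prompt = build_prompt(ROLE, ENV_INSTRUCTION, room_answers)
    return llm(env_prompt)
```

Because `llm` is just a callable, the same structure can be exercised with a stub function during testing and with a real model client in deployment.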

Experimental Setup and Video

To demonstrate SeLRoS's applicability across diverse settings, we conducted experiments in 30 different environments generated with ProcTHOR.

Although the environments generated with ProcTHOR are unlabeled, the boundaries of each room can generally be distinguished by the color of the floors or walls. In this study, the ground truth was therefore specified using these indicators.

For further insight into our experimental process, videos and additional materials are accessible via the Video and Code buttons at the top of this page.