Semantic Layering in Room Segmentation via LLMs
Taehyeon Kim and Byung-Cheol Min
Purdue University
Abstract
In this research, we introduce Semantic Layering in Room Segmentation via LLMs (SeLRoS), a method for semantic room segmentation that integrates Large Language Models (LLMs) with traditional 2D map-based segmentation. Unlike previous approaches that focus solely on the geometric segmentation of indoor environments, our work enriches segmented maps with semantic data, including object identification and spatial relationships, to enhance robotic navigation. By leveraging LLMs, we provide a novel framework that interprets and organizes complex information about each segmented area, thereby improving the accuracy and contextual relevance of room segmentation. Furthermore, SeLRoS overcomes the limitations of existing algorithms by using a semantic evaluation method to distinguish true room divisions from those erroneously generated by furniture and segmentation inaccuracies. The effectiveness of SeLRoS is verified through its application across 30 different 3D environments.
Framework of SeLRoS
Overview of SeLRoS’s structure: SeLRoS begins with Geometric Room Segmentation, where a 2D map (M) of the Original Environment (E) is transformed into a Segmentation Map (S). Next, the Object Mapping process extracts Object Information (Os) by applying an Object Detection algorithm to scenes captured in the Original Environment at the center coordinates of each segmented space (s). The Semantic Integration process then combines s, Os, and the spatial relation data (Rs) through the Room Information Interpreter and generates prompts P(s, Os, Rs) via the Hierarchical Query. The final outputs are the Improved Segmentation Map (S') and the Semantic Information (I).
This pseudocode illustrates the comprehensive algorithm of SeLRoS, with each function representing the stages of Geometric Room Segmentation, Object Mapping, and Semantic Integration. As illustrated in the figure above, the pseudocode depicts the workflow of accepting the original environment (E) as input and ultimately generating an improved segmentation map (S') along with semantic information (I).
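As a rough illustration of the data flow described above, the three stages can be chained as in the following Python sketch. It is not the authors' implementation: every function name and the choice to represent S as a dictionary of segment ids and center coordinates are assumptions made for this example, and each stage is passed in as a callable so that no particular segmentation algorithm, object detector, or LLM is implied.

```python
def run_selros_sketch(environment, make_map, segment, detect_objects,
                      interpret, hierarchical_query):
    """Chain the three stages; all callables are hypothetical placeholders."""
    grid_map = make_map(environment)        # E -> M
    segments = segment(grid_map)            # M -> S, here {segment id: center}
    # Object Mapping: look at the environment from each segment center (O_s).
    objects = {s: detect_objects(environment, c) for s, c in segments.items()}
    relations = interpret(segments)         # spatial relations R_s
    # Inputs from which the prompts P(s, O_s, R_s) would be built.
    prompt_inputs = {s: (s, objects[s], relations[s]) for s in segments}
    return hierarchical_query(prompt_inputs)  # (S', I)


# Toy stand-ins so the sketch can be executed end to end.
toy_env = {"map": "M", "objects": {(0, 0): ["bed"], (5, 5): ["sofa"]}}
improved_map, semantic_info = run_selros_sketch(
    toy_env,
    make_map=lambda env: env["map"],
    segment=lambda grid: {1: (0, 0), 2: (5, 5)},
    detect_objects=lambda env, center: env["objects"][center],
    interpret=lambda segs: {s: [t for t in segs if t != s] for s in segs},
    hierarchical_query=lambda inputs: (sorted(inputs),
                                       {s: v[1] for s, v in inputs.items()}),
)
```

The toy stand-ins only show how the intermediate results are handed from one stage to the next; in practice each would be replaced by the corresponding SeLRoS component.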
Semantic Integration - Room Information Interpreter
This pseudocode illustrates the first component of the Semantic Integration phase within SeLRoS, the Room Information Interpreter.
The Room Information Interpreter processes a segmentation map and outputs detailed information about each segmented room, including its area, dimensions (length and width), and spatial relationships with adjacent rooms. The figure below shows an example of the Room Information Interpreter's output.
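To make the kind of output described here concrete, the following NumPy sketch derives area, length, width, and adjacency from a labeled grid. It rests on several assumptions that are not taken from the paper: the segmentation map is an integer array in which 0 marks walls or unknown cells, length and width are the extents of the axis-aligned bounding box along rows and columns, and two rooms count as adjacent when their cells touch horizontally or vertically. The function name `interpret_rooms` and the `resolution` parameter are hypothetical.

```python
import numpy as np


def interpret_rooms(seg_map, resolution=1.0):
    """Return area, bounding-box size and neighbors for each room label.

    seg_map: 2D integer array, 0 = wall/unknown (assumed convention).
    resolution: assumed size of one cell, e.g. in meters.
    """
    seg_map = np.asarray(seg_map)
    info = {}
    for label in np.unique(seg_map):
        if label == 0:
            continue
        rows, cols = np.nonzero(seg_map == label)
        info[int(label)] = {
            "area": float(rows.size * resolution ** 2),
            "length": float((rows.max() - rows.min() + 1) * resolution),
            "width": float((cols.max() - cols.min() + 1) * resolution),
            "adjacent": set(),
        }
    # Compare each cell with its right and lower neighbor to find touching rooms.
    shifted_pairs = [(seg_map[:, :-1], seg_map[:, 1:]),
                     (seg_map[:-1, :], seg_map[1:, :])]
    for a, b in shifted_pairs:
        touching = (a != b) & (a != 0) & (b != 0)
        for u, v in zip(a[touching], b[touching]):
            info[int(u)]["adjacent"].add(int(v))
            info[int(v)]["adjacent"].add(int(u))
    return info


example_map = [[1, 1, 2, 2],
               [1, 1, 2, 2],
               [3, 3, 3, 0]]
rooms = interpret_rooms(example_map, resolution=0.5)
```

In this toy map, room 3 is a one-cell-high strip below rooms 1 and 2, so it is reported as adjacent to both. A real map would typically need a tolerance for thin walls between rooms, which this sketch leaves out.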
Semantic Integration - Hierarchical Query
The Hierarchical Query consists of two levels: a Room-Level Query and an Environment-Level Query. In the figure, the red box represents the role component, the yellow box represents the instruction, and the blue box represents the set of Semantic Information.
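The three-part prompt layout (role, instruction, semantic information) and the two query levels could be assembled along the following lines. This is only a sketch: the prompt wording, the function names, and the idea that the Environment-Level Query receives the room-level answers together with a list of adjacent segment pairs are assumptions for illustration, not the prompts used in SeLRoS.

```python
def build_prompt(role, instruction, semantic_info):
    """Join the three prompt parts: role, instruction, semantic information."""
    lines = [f"Role: {role}", f"Instruction: {instruction}",
             "Semantic information:"]
    lines += [f"- {key}: {value}" for key, value in semantic_info.items()]
    return "\n".join(lines)


def room_level_query(segment_id, objects, relations):
    # One prompt per segment, built from its objects and spatial relations.
    return build_prompt(
        role="You label rooms in an indoor floor plan.",
        instruction=f"Name the most likely room type for segment {segment_id}.",
        semantic_info={"objects": ", ".join(objects), **relations},
    )


def environment_level_query(room_answers, adjacency):
    # One prompt for the whole map, built from the room-level answers (assumed).
    info = {f"segment {s}": answer for s, answer in room_answers.items()}
    info["adjacent pairs"] = ", ".join(f"{a}-{b}" for a, b in adjacency)
    return build_prompt(
        role="You review a segmented floor plan.",
        instruction="List adjacent segments that belong to the same room "
                    "and should be merged.",
        semantic_info=info,
    )


room_prompt = room_level_query(1, ["bed", "wardrobe"],
                               {"area": "12 m^2", "adjacent": "2"})
env_prompt = environment_level_query({1: "bedroom", 2: "bedroom"}, [(1, 2)])
```

Keeping the role, instruction, and semantic information as separate arguments mirrors the color-coded boxes and makes it easy to vary one part while holding the others fixed.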
Experimental Setup and Video
To demonstrate SeLRoS's applicability across diverse settings, we conducted experiments in 30 different environments generated with ProcTHOR.
Although the environments generated with ProcTHOR are unlabeled, room boundaries can usually be distinguished by the color of the floors or walls. In this study, the ground truth was therefore specified using these indicators.
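One way such color cues could be turned into a label map automatically is sketched below. The source does not say how the ground truth was actually produced (it may well have been annotated by hand), so this is purely an illustrative assumption: it takes a top-down RGB image in which every room has a single uniform floor color and assigns one integer label per distinct color. The function name is hypothetical.

```python
import numpy as np


def labels_from_floor_colors(image):
    """Assign one integer label (starting at 1) per distinct pixel color."""
    image = np.asarray(image)
    height, width, channels = image.shape
    flat = image.reshape(-1, channels)
    # Unique rows are the distinct colors; inverse maps each pixel to its color.
    colors, inverse = np.unique(flat, axis=0, return_inverse=True)
    labels = np.asarray(inverse).reshape(height, width) + 1
    return labels, colors


toy_image = [[[200, 0, 0], [200, 0, 0], [0, 0, 200]],
             [[200, 0, 0], [0, 0, 200], [0, 0, 200]]]
label_map, palette = labels_from_floor_colors(toy_image)
```

Walls, shadows, and textured floors would also receive their own labels under this scheme, so real renders would need filtering or color quantization before the result could serve as ground truth.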
For further insight into our experimental process, videos and additional materials are available via the Video and Code buttons at the top of this page.