Open-Set Semantic Gaussian Splatting SLAM with Expandable Representation

Overview

Abstract: This work enables everyday devices, e.g., smartphones, to dynamically capture open-ended 3D scenes with rich, expandable semantics for immersive virtual worlds. While 3DGS and foundation models hold promise for semantic scene understanding, existing solutions suffer from unscalable semantic integration, prohibitive memory costs, and cross-view inconsistency. To address these issues, we propose Open-Set Semantic Gaussian Splatting SLAM, a GS-SLAM system augmented by an expandable semantic feature pool that decouples condensed scene-level semantics from individual 3D Gaussians. Each Gaussian references semantics via a lightweight indexing vector, reducing memory overhead by orders of magnitude while supporting dynamic updates. In addition, we introduce a consistency-aware optimization strategy alongside a Semantic Stability Guidance mechanism to enhance long-term, cross-view semantic consistency. Experiments demonstrate that our system achieves high-fidelity rendering with scalable, open-set semantics across both controlled and in-the-wild environments, supporting applications such as 3D localization and scene editing. These results mark an initial yet solid step towards high-quality, expressive, and accessible 3D virtual world modeling.

Methodology

Framework Overview. We enhance existing 3DGS-based SLAM with an expandable semantic representation: a learnable semantic feature pool that stores condensed scene-level semantics and supports dynamic expansion. Each Gaussian retrieves its semantic feature from the shared pool by soft aggregation, using a lightweight per-Gaussian key. To improve cross-view and temporal consistency, we further introduce an Intra-Inter Semantic Consistency Objective and a Semantic Stability Guidance mechanism, enabling stable and coherent open-set semantic reconstruction during SLAM.
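The pool-and-key idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the class `SemanticPool`, the functions `aggregate` and `pad_keys`, the softmax weighting, and all dimensions are assumptions chosen for illustration. Each Gaussian stores only a short key over pool entries, and its semantic feature is a weighted sum of the shared pool features.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class SemanticPool:
    """Hypothetical shared pool of K scene-level features of dimension D."""

    def __init__(self, feat_dim, rng=None):
        self.feat_dim = feat_dim
        self.features = np.zeros((0, feat_dim))
        self.rng = np.random.default_rng(0) if rng is None else rng

    def expand(self, n_new):
        # Append n_new entries (random here; learnable in a real system).
        new = self.rng.standard_normal((n_new, self.feat_dim))
        self.features = np.concatenate([self.features, new], axis=0)

def aggregate(keys, pool):
    # keys: (N, K) per-Gaussian logits over the K pool entries.
    # Returns (N, D) features as a soft (softmax-weighted) aggregation.
    weights = softmax(keys, axis=1)
    return weights @ pool.features

def pad_keys(keys, n_new, fill=-1e9):
    # After the pool grows, pad keys so new entries start with ~zero weight
    # and existing Gaussians keep their current semantics (an assumption).
    pad = np.full((keys.shape[0], n_new), fill)
    return np.concatenate([keys, pad], axis=1)

if __name__ == "__main__":
    pool = SemanticPool(feat_dim=8)
    pool.expand(4)
    keys = np.zeros((3, 4))            # 3 Gaussians, uniform weights
    feats = aggregate(keys, pool)      # shape (3, 8)
    pool.expand(2)                     # pool grows to 6 entries
    keys = pad_keys(keys, 2)           # keys grow to shape (3, 6)
    print(feats.shape, aggregate(keys, pool).shape)
```

Under these assumptions, storage is N x K key values plus K x D pool values instead of N x D per-Gaussian features, which is where a saving would come from when K is much smaller than D.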

Tracking Results

Results on Replica

Appearance Mapping Results

Rendering Results on Replica

Reconstruction Results on Replica

Rendering Comparisons over SplaTAM on Replica (figure: paired panels, Ours vs. SplaTAM)

Rendering Comparisons over Point-SLAM on Replica (figure: paired panels, Ours vs. Point-SLAM)

Rendering Comparisons on ScanNet and TUM (figure: paired panels, Ours vs. Point-SLAM)

Semantic Reconstruction

Comparisons with Closed-Set Semantic SLAM

Comparisons with SfM-based Open-Set Semantic Methods

Open-set Semantic Reconstruction Results on Replica (figure: paired panels, Rasterized Semantic and Rasterized RGB)

Everyday Devices Results on In-the-Wild Data

Comprehensive Visual Presentation

This figure gives a comprehensive visual presentation of Replica room0. The first row shows the full reconstruction results, while the second row displays heatmaps generated from open-set semantic queries, which identify the queried objects in the scene.
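One common way to produce such a query heatmap is to compare a query embedding against the rendered per-pixel semantic features. The sketch below assumes that both live in a shared embedding space and uses cosine similarity followed by min-max normalization. The function name `query_heatmap`, the array shapes, and the normalization are illustrative assumptions, not details confirmed by the source.

```python
import numpy as np

def query_heatmap(feature_map, query_embedding, eps=1e-8):
    """Cosine-similarity heatmap between rendered features and a query.

    feature_map:     (H, W, D) rendered per-pixel semantic features (assumed).
    query_embedding: (D,) embedding of the open-set query (assumed to share
                     the same embedding space as the features).
    Returns an (H, W) array scaled to [0, 1].
    """
    # Normalize features and query to unit length.
    norms = np.linalg.norm(feature_map, axis=-1, keepdims=True)
    f = feature_map / (norms + eps)
    q = query_embedding / (np.linalg.norm(query_embedding) + eps)

    # Cosine similarity per pixel, in [-1, 1].
    sim = f @ q

    # Min-max normalize so the heatmap spans [0, 1] for visualization.
    lo, hi = sim.min(), sim.max()
    return (sim - lo) / (hi - lo + eps)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((4, 5, 16))
    query = feats[2, 3].copy()         # query matches pixel (2, 3)
    heat = query_heatmap(feats, query)
    peak = np.unravel_index(heat.argmax(), heat.shape)
    print(peak)                        # location of the strongest response
```

The pixel whose feature points in the same direction as the query receives the highest response, so thresholding or colormapping the result highlights the queried object.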