Open-Set Semantic Gaussian Splatting SLAM with Expandable Representation

NeurIPS 2025 Submission

Anonymous Authors

Overview

Abstract: This work enables everyday devices, e.g., smartphones, to dynamically capture open-ended 3D scenes with rich, expandable semantics for immersive virtual worlds. While 3D Gaussian Splatting (3DGS) and foundation models hold promise for semantic scene understanding, existing solutions suffer from unscalable semantic integration, prohibitive memory costs, and cross-view inconsistency. To address these issues, we propose Open-Set Semantic Gaussian Splatting SLAM, a GS-SLAM system augmented by an expandable semantic feature pool that decouples condensed scene-level semantics from individual 3D Gaussians. Each Gaussian references semantics via a lightweight indexing vector, reducing memory overhead by orders of magnitude while supporting dynamic updates. In addition, we introduce a consistency-aware optimization strategy alongside a Semantic Stability Guidance mechanism to enhance long-term, cross-view semantic consistency and resolve ambiguities. Experiments demonstrate that our system achieves high-fidelity rendering with scalable, open-set semantics across both controlled and in-the-wild environments, supporting applications such as 3D localization and scene editing. These results mark an initial yet solid step towards high-quality, expressive, and accessible 3D virtual world modeling.

Methodology

Framework Overview. We enhance existing 3DGS-based SLAM with an expandable semantic representation: an expandable semantic feature pool accommodates growing semantics, and each Gaussian is assigned an aggregated semantic feature from the pool. To improve consistency and reduce ambiguity, we further introduce Intra-Inter Semantic Consistency Objectives and Semantic Stability Guidance.
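To make the pool-and-index idea concrete, the following is a minimal sketch in plain Python, not the system's actual implementation. It assumes that each Gaussian stores a small set of (pool index, weight) pairs and that its aggregated semantic is the normalized weighted sum of the referenced pool features. The names `SemanticFeaturePool`, `Gaussian`, `index_weights`, and `aggregated_semantic` are hypothetical, and the toy 3-dimensional features stand in for real foundation-model embeddings.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticFeaturePool:
    """Hypothetical scene-level pool of semantic feature vectors."""
    dim: int
    features: list = field(default_factory=list)

    def add(self, feature):
        """Append a feature and return its index; existing indices stay valid."""
        if len(feature) != self.dim:
            raise ValueError("feature has the wrong dimension")
        self.features.append(list(feature))
        return len(self.features) - 1


@dataclass
class Gaussian:
    """Semantic part of one Gaussian only: sparse {pool index: weight} entries.

    Assumption: the 'indexing vector' is modeled here as a sparse dict.
    """
    index_weights: dict = field(default_factory=dict)


def aggregated_semantic(gaussian, pool):
    """Normalized weighted sum of the pool features a Gaussian references."""
    total = sum(gaussian.index_weights.values())
    out = [0.0] * pool.dim
    if total == 0:
        return out
    for idx, weight in gaussian.index_weights.items():
        for d in range(pool.dim):
            out[d] += (weight / total) * pool.features[idx][d]
    return out


# Toy pool with two entries (assumed to stand for "chair" and "table").
pool = SemanticFeaturePool(dim=3)
chair = pool.add([1.0, 0.0, 0.0])
table = pool.add([0.0, 1.0, 0.0])

# One Gaussian that mostly references "chair".
g = Gaussian(index_weights={chair: 3.0, table: 1.0})
semantic = aggregated_semantic(g, pool)

# The pool grows later; the existing Gaussian is untouched.
plant = pool.add([0.0, 0.0, 1.0])
semantic_after_growth = aggregated_semantic(g, pool)
```

Under these assumptions, each Gaussian carries only a few indices and weights rather than a full feature vector, and appending to the pool does not invalidate references already held by existing Gaussians.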

Tracking Result

Results on Replica

Appearance Mapping Results

Rendering Results on Replica

Reconstruction Results on Replica

Rendering Comparisons over SplaTAM on Replica (panels: Ours, SplaTAM)

Rendering Comparisons over Point-SLAM on Replica (panels: Ours, Point-SLAM)

Rendering Comparisons on ScanNet and TUM (panels: Ours, Point-SLAM)

Semantic Reconstruction

Comparisons with Closed-Set Semantic SLAM

Comparisons with SfM-based Open-Set Semantic Methods

Open-set Semantic Reconstruction Results on Replica (panels: Rasterized Semantic, Rasterized RGB)

Everyday Devices Results on In-the-Wild Data


Comprehensive Visual Presentation

This is a comprehensive visual presentation of Replica room0. The first row shows the full reconstruction results, while the second row displays a heatmap generated from open-set semantic queries to identify objects in the scene.
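One common way to produce such a query heatmap is to score each pixel's rasterized semantic feature against a query embedding with cosine similarity. The sketch below illustrates that idea only and is not the authors' pipeline. It assumes the query has already been embedded into the same feature space by some text encoder, which the source does not specify. The function names and the toy 2-dimensional features are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def query_heatmap(feature_map, query):
    """Score every pixel of an H x W grid of feature vectors against a query.

    Negative similarities are clamped to 0 so the result lies in [0, 1].
    """
    return [
        [max(0.0, cosine_similarity(feature, query)) for feature in row]
        for row in feature_map
    ]


# Toy 2 x 2 "rasterized semantic" image with 2-D features (illustrative values).
feature_map = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [-1.0, 0.0]],
]
query = [1.0, 0.0]  # assumed embedding of the text query

heatmap = query_heatmap(feature_map, query)
```

Pixels whose features align with the query receive scores near 1, while orthogonal or opposing features receive 0. A real system would then map these scores to colors for display.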


3D Editing

"Turn cabinet red." (panels: Result, Origin)

"Make flower burn." (panels: Result, Origin)

Editing

Movement & Translation (panels: Result, Origin)