Update README.md
README.md
CHANGED
@@ -29,12 +29,16 @@ base_model:
<div align="center" style="line-height: 1;">
  <a href="https://huggingface.co/manycore-research/SpatialLM1.1-Qwen-0.5B" target="_blank" style="margin: 2px;"><img alt="Hugging Face"
    src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-SpatialLM-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/></a>
+  <a href="https://huggingface.co/datasets/manycore-research/SpatialLM-Dataset" target="_blank" style="margin: 2px;"><img alt="Dataset"
+    src="https://img.shields.io/badge/%F0%9F%A4%97%20Dataset-Dataset-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/></a>
  <a href="https://huggingface.co/datasets/manycore-research/SpatialLM-Testset" target="_blank" style="margin: 2px;"><img alt="Dataset"
    src="https://img.shields.io/badge/%F0%9F%A4%97%20Dataset-Testset-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/></a>
</div>

## ✨ News

+- [Sept, 2025] [SpatialLM-Dataset](https://huggingface.co/datasets/manycore-research/SpatialLM-Dataset) is now available on Hugging Face.
+- [Sept, 2025] SpatialLM has been accepted at NeurIPS 2025.
- [Jun, 2025] Check out our new models: [SpatialLM1.1-Llama-1B](https://huggingface.co/manycore-research/SpatialLM1.1-Llama-1B) and [SpatialLM1.1-Qwen-0.5B](https://huggingface.co/manycore-research/SpatialLM1.1-Qwen-0.5B), now available on Hugging Face. SpatialLM1.1 doubles the point cloud resolution, incorporates a more powerful point cloud encoder, [Sonata](https://xywu.me/sonata/), and supports detection with user-specified categories.
- [Jun, 2025] The SpatialLM [Technical Report](https://arxiv.org/abs/2506.07491) is now on arXiv.
- [Mar, 2025] We're excited to release [SpatialLM-Llama-1B](https://huggingface.co/manycore-research/SpatialLM-Llama-1B) and [SpatialLM-Qwen-0.5B](https://huggingface.co/manycore-research/SpatialLM-Qwen-0.5B) on Hugging Face.
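
As a hedged aside on the News entries above: the checkpoints they link to can be pulled locally with the standard `huggingface_hub` client. This is only a sketch for verifying the repo IDs resolve; the project's own loading path (its inference scripts) is not shown in this diff.

```python
# Hedged sketch: fetch the SpatialLM1.1 checkpoints named in the News entries.
# snapshot_download is the standard huggingface_hub API; how the project
# actually loads these weights is out of scope here.
from huggingface_hub import snapshot_download

for repo_id in (
    "manycore-research/SpatialLM1.1-Llama-1B",
    "manycore-research/SpatialLM1.1-Qwen-0.5B",
):
    local_dir = snapshot_download(repo_id=repo_id)  # cached under ~/.cache/huggingface
    print(f"{repo_id} -> {local_dir}")
```
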
@@ -160,6 +164,20 @@ python eval.py --metadata SpatialLM-Testset/test.csv --gt_dir SpatialLM-Testset/

We provide an example of how to use our model to estimate scene layout from an RGB video with the newly released [SLAM3R](https://github.com/PKU-VCL-3DV/SLAM3R) in [EXAMPLE.md](EXAMPLE.md). These steps also work for MASt3R-SLAM and other reconstruction methods.

+## SpatialLM Dataset
+
+The SpatialLM dataset is a large-scale, high-quality synthetic dataset designed by professional 3D designers and used in real-world production. It contains point clouds from 12,328 diverse indoor scenes comprising 54,778 rooms, each paired with rich ground-truth 3D annotations. The SpatialLM dataset provides a valuable additional resource for advancing research in indoor scene understanding, 3D perception, and related applications.
+
+For access to the photorealistic RGB/Depth/Normal/Semantic/Instance panoramic renderings and camera trajectories used to generate the SpatialLM point clouds, please refer to the [SpatialGen project](https://manycore-research.github.io/SpatialGen).
+
+<div align="center">
+
+| **Dataset**       | **Download**                                                                       |
+| :---------------: | ---------------------------------------------------------------------------------- |
+| SpatialLM-Dataset | [🤗 Datasets](https://huggingface.co/datasets/manycore-research/SpatialLM-Dataset) |
+
+</div>
+
## SpatialLM Testset

We provide a test set of 107 preprocessed point clouds, reconstructed from RGB videos using [MASt3R-SLAM](https://github.com/rmurai0610/MASt3R-SLAM). SpatialLM-Testset is considerably more challenging than prior clean RGB-D scan datasets due to the noise and occlusions in point clouds reconstructed from monocular RGB videos.
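
A minimal sketch of fetching the new SpatialLM-Dataset added above. The `snapshot_download` call is standard `huggingface_hub`; the assumption that scenes ship as `.ply` point clouds, and the use of `open3d` to read one, are ours, since the diff does not document the repo's file layout.

```python
# Hedged sketch: download SpatialLM-Dataset and inspect one point cloud.
# Assumption (ours): scenes are stored as .ply files; check the dataset
# card for the actual layout.
import glob

import open3d as o3d  # pip install open3d
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="manycore-research/SpatialLM-Dataset",
    repo_type="dataset",
)
ply_files = sorted(glob.glob(f"{local_dir}/**/*.ply", recursive=True))
if ply_files:
    pcd = o3d.io.read_point_cloud(ply_files[0])
    print(ply_files[0], "->", len(pcd.points), "points")
else:
    print("No .ply files found; see the dataset card for the actual layout.")
```
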
@@ -182,8 +200,8 @@ Layout estimation focuses on predicting architectural elements, i.e., walls, doo

| **Method**      | **RoomFormer** | **SceneScript (finetuned)** | **SpatialLM1.1-Qwen-0.5B (finetuned)** |
| :-------------: | :------------: | :-------------------------: | :------------------------------------: |
-| **F1 @.25 IoU** |
-| **F1 @.5 IoU**  |
+| **F1 @.25 IoU** | 83.4           | 90.4                        | 94.3                                   |
+| **F1 @.5 IoU**  | 81.4           | 89.2                        | 93.5                                   |

</div>

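
This table, like the zero-shot table in the next hunk, reports F1 at an IoU threshold. As a reference for what that metric computes, here is a self-contained sketch for axis-aligned 2D boxes with greedy one-to-one matching; the repository's `eval.py` may use class-aware matching and 3D or rotated IoU, so treat this as illustrative only.

```python
# Illustrative sketch: F1 @ IoU for axis-aligned 2D boxes (x1, y1, x2, y2).
# A prediction is a true positive if it greedily matches an unused
# ground-truth box with IoU at or above the threshold.
def box_iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def f1_at_iou(preds, gts, thresh=0.25):
    matched, tp = set(), 0
    for p in preds:
        best, best_iou = None, thresh
        for j, g in enumerate(gts):
            iou = box_iou(p, g)
            if j not in matched and iou >= best_iou:
                best, best_iou = j, iou
        if best is not None:
            matched.add(best)
            tp += 1
    if tp == 0:
        return 0.0
    precision, recall = tp / len(preds), tp / len(gts)
    return 2 * precision * recall / (precision + recall)

# One of two predictions overlaps the single ground-truth box -> F1 = 2/3.
print(f1_at_iou([(0, 0, 1, 1), (2, 2, 3, 3)], [(0, 0, 1, 1.2)]))
```
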
@@ -210,8 +228,8 @@ Zero-shot detection results on the challenging SpatialLM-Testset are reported in
| :-------------: | :-----------------------: | :------------------------: |
| **Layout**      | **F1 @.25 IoU (2D)**      | **F1 @.25 IoU (2D)**       |
| wall            | 68.9                      | 68.2                       |
-| door            |
-| window          |
+| door            | 49.1                      | 47.4                       |
+| window          | 47.0                      | 51.4                       |
|                 |                           |                            |
| **Objects**     | **F1 @.25 IoU (3D)**      | **F1 @.25 IoU (2D)**       |
| curtain         | 34.9                      | 37.0                       |
@@ -262,14 +280,11 @@ SpatialLM1.1 is built upon the Sonata point cloud encoder; its model weights are licensed
If you find this work useful, please consider citing:

```bibtex
-@
-
-
-
-
-  eprint        = {2506.07491},
-  archivePrefix = {arXiv},
-  primaryClass  = {cs.CV}
+@inproceedings{SpatialLM,
+  title     = {SpatialLM: Training Large Language Models for Structured Indoor Modeling},
+  author    = {Mao, Yongsen and Zhong, Junhao and Fang, Chuan and Zheng, Jia and Tang, Rui and Zhu, Hao and Tan, Ping and Zhou, Zihan},
+  booktitle = {Advances in Neural Information Processing Systems},
+  year      = {2025}
}
```
