ysmao committed (verified)
Commit dc6d35f · Parent: 7d94285

Update README.md

Files changed (1): README.md (+27 -12)
README.md CHANGED
@@ -29,12 +29,16 @@ base_model:
  <div align="center" style="line-height: 1;">
  <a href="https://huggingface.co/manycore-research/SpatialLM1.1-Qwen-0.5B" target="_blank" style="margin: 2px;"><img alt="Hugging Face"
  src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-SpatialLM-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/></a>
+ <a href="https://huggingface.co/datasets/manycore-research/SpatialLM-Dataset" target="_blank" style="margin: 2px;"><img alt="Dataset"
+ src="https://img.shields.io/badge/%F0%9F%A4%97%20Dataset-Dataset-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/></a>
  <a href="https://huggingface.co/datasets/manycore-research/SpatialLM-Testset" target="_blank" style="margin: 2px;"><img alt="Dataset"
  src="https://img.shields.io/badge/%F0%9F%A4%97%20Dataset-Testset-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/></a>
  </div>

  ## ✨ News

+ - [Sept, 2025] [SpatialLM-Dataset](https://huggingface.co/datasets/manycore-research/SpatialLM-Dataset) is now available on Hugging Face.
+ - [Sept, 2025] SpatialLM accepted at NeurIPS 2025.
  - [Jun, 2025] Check out our new models: [SpatialLM1.1-Llama-1B](https://huggingface.co/manycore-research/SpatialLM1.1-Llama-1B) and [SpatialLM1.1-Qwen-0.5B](https://huggingface.co/manycore-research/SpatialLM1.1-Qwen-0.5B), now available on Hugging Face. SpatialLM1.1 doubles the point cloud resolution, incorporates a more powerful point cloud encoder [Sonata](https://xywu.me/sonata/) and supports detection with user-specified categories.
  - [Jun, 2025] SpatialLM [Technical Report](https://arxiv.org/abs/2506.07491) is now on arXiv.
  - [Mar, 2025] We're excited to release the [SpatialLM-Llama-1B](https://huggingface.co/manycore-research/SpatialLM-Llama-1B) and [SpatialLM-Qwen-0.5B](https://huggingface.co/manycore-research/SpatialLM-Qwen-0.5B) on Hugging Face.
@@ -160,6 +164,20 @@ python eval.py --metadata SpatialLM-Testset/test.csv --gt_dir SpatialLM-Testset/

  We provide an example of how to use our model to estimate scene layout starting from an RGB video with the newly released [SLAM3R](https://github.com/PKU-VCL-3DV/SLAM3R) in [EXAMPLE.md](EXAMPLE.md). These steps also work for MASt3R-SLAM and other reconstruction methods.

+ ## SpatialLM Dataset
+
+ The SpatialLM dataset is a large-scale, high-quality synthetic dataset designed by professional 3D designers and used for real-world production. It contains point clouds from 12,328 diverse indoor scenes comprising 54,778 rooms, each paired with rich ground-truth 3D annotations. The SpatialLM dataset provides a valuable additional resource for advancing research in indoor scene understanding, 3D perception, and related applications.
+
+ For access to the photorealistic RGB/Depth/Normal/Semantic/Instance panoramic renderings and camera trajectories used to generate the SpatialLM point clouds, please refer to the [SpatialGen project](https://manycore-research.github.io/SpatialGen).
+
+ <div align="center">
+
+ | **Dataset**       | **Download**                                                                       |
+ | :---------------: | ---------------------------------------------------------------------------------- |
+ | SpatialLM-Dataset | [🤗 Datasets](https://huggingface.co/datasets/manycore-research/SpatialLM-Dataset) |
+
+ </div>
+
  ## SpatialLM Testset

  We provide a test set of 107 preprocessed point clouds, reconstructed from RGB videos using [MASt3R-SLAM](https://github.com/rmurai0610/MASt3R-SLAM). SpatialLM-Testset is considerably more challenging than prior clean RGBD-scan datasets due to the noise and occlusions in point clouds reconstructed from monocular RGB videos.
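Both dataset repos referenced above live on the Hugging Face Hub, so a minimal way to fetch them locally is `huggingface_hub.snapshot_download`. Only the repo ids below come from this commit; the on-disk file layout is not described here, so treat this as a rough sketch rather than the documented loading pipeline.

```python
# Sketch: download the SpatialLM dataset repos referenced above.
# Repo ids come from the links in this README; everything else is generic.
from huggingface_hub import snapshot_download

for repo_id in (
    "manycore-research/SpatialLM-Dataset",  # full synthetic dataset added in this commit
    "manycore-research/SpatialLM-Testset",  # 107-scene test set reconstructed with MASt3R-SLAM
):
    local_dir = snapshot_download(repo_id=repo_id, repo_type="dataset")
    print(f"{repo_id} -> {local_dir}")
```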
@@ -182,8 +200,8 @@ Layout estimation focuses on predicting architectural elements, i.e., walls, doo

  | **Method**      | **RoomFormer** | **SceneScript (finetuned)** | **SpatialLM1.1-Qwen-0.5B (finetuned)** |
  | :-------------: | :------------: | :-------------------------: | :------------------------------------: |
- | **F1 @.25 IoU** | 70.4           | 83.1                        | 86.5                                   |
- | **F1 @.5 IoU**  | 67.2           | 80.8                        | 84.6                                   |
+ | **F1 @.25 IoU** | 83.4           | 90.4                        | 94.3                                   |
+ | **F1 @.5 IoU**  | 81.4           | 89.2                        | 93.5                                   |

  </div>

@@ -210,8 +228,8 @@ Zero-shot detection results on the challenging SpatialLM-Testset are reported in
  | :-------------: | :-----------------------: | :------------------------: |
  | **Layout**      | **F1 @.25 IoU (2D)**      | **F1 @.25 IoU (2D)**       |
  | wall            | 68.9                      | 68.2                       |
- | door            | 46.3                      | 43.1                       |
- | window          | 43.8                      | 47.4                       |
+ | door            | 49.1                      | 47.4                       |
+ | window          | 47.0                      | 51.4                       |
  |                 |                           |                            |
  | **Objects**     | **F1 @.25 IoU (3D)**      | **F1 @.25 IoU (2D)**       |
  | curtain         | 34.9                      | 37.0                       |
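The F1 @ IoU scores in the two tables above are detection-style metrics: a prediction counts as a true positive only if it can be matched one-to-one to a ground-truth element with IoU at or above the threshold, and F1 is the harmonic mean of the resulting precision and recall. Below is a minimal sketch of that computation using axis-aligned 2D boxes for simplicity; the repository's own `eval.py` evaluates oriented layout elements and boxes, so its numbers come from a more involved matching than this illustration.

```python
# Illustrative F1 @ IoU for axis-aligned 2D boxes (xmin, ymin, xmax, ymax).
# Greedy one-to-one matching at an IoU threshold; simplified relative to eval.py.
from typing import List, Tuple

Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def f1_at_iou(pred: List[Box], gt: List[Box], thr: float = 0.25) -> float:
    """Match each prediction to its best unmatched ground-truth box, then score F1."""
    matched = set()
    tp = 0
    for p in pred:
        best_j, best_iou = -1, 0.0
        for j, g in enumerate(gt):
            if j in matched:
                continue
            v = iou(p, g)
            if v > best_iou:
                best_j, best_iou = j, v
        if best_j >= 0 and best_iou >= thr:
            matched.add(best_j)
            tp += 1
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gt) if gt else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

Greedy matching keeps the example short; an optimal (Hungarian) assignment is the more careful choice when predictions overlap heavily.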
@@ -262,14 +280,11 @@ SpatialLM1.1 are built upon Sonata point cloud encoder, model weight is licensed
  If you find this work useful, please consider citing:

  ```bibtex
- @article{SpatialLM,
-   title         = {SpatialLM: Training Large Language Models for Structured Indoor Modeling},
-   author        = {Mao, Yongsen and Zhong, Junhao and Fang, Chuan and Zheng, Jia and Tang, Rui and Zhu, Hao and Tan, Ping and Zhou, Zihan},
-   journal       = {arXiv preprint},
-   year          = {2025},
-   eprint        = {2506.07491},
-   archivePrefix = {arXiv},
-   primaryClass  = {cs.CV}
+ @inproceedings{SpatialLM,
+   title     = {SpatialLM: Training Large Language Models for Structured Indoor Modeling},
+   author    = {Mao, Yongsen and Zhong, Junhao and Fang, Chuan and Zheng, Jia and Tang, Rui and Zhu, Hao and Tan, Ping and Zhou, Zihan},
+   booktitle = {Advances in Neural Information Processing Systems},
+   year      = {2025}
  }
  ```
 
 