Galaxea Open-World Dataset and G0 Dual-System VLA Model
Paper: arXiv:2509.00576
G0-VLA architecture and training pipeline: Stage 1 pre-trains a vision-language model on cross-embodiment data in an autoregressive manner. Stage 2 and post-training share the same model structure and are trained on Galaxea open-world data with embodiment-specific views and both high-level and subtask instructions. In these stages, the Action Transformer's action reconstruction is supervised with a flow-matching loss.
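To illustrate what supervising action reconstruction with a flow-matching loss can look like, here is a minimal NumPy sketch. It assumes the common linear-interpolation (rectified-flow style) formulation, which may differ from the exact G0 objective in its time convention, noise schedule, or weighting. The names `flow_matching_loss` and `predict_velocity` are hypothetical; `predict_velocity` stands in for the Action Transformer and is not part of this repo's API.

```python
import numpy as np


def flow_matching_loss(predict_velocity, actions, rng):
    """Monte Carlo estimate of a flow-matching loss for one batch.

    actions: ground-truth action chunk, shape (batch, horizon, action_dim).
    predict_velocity: hypothetical callable (x_t, t) -> array shaped like
        `actions`; a stand-in for the action model, not the real G0 API.
    rng: a numpy.random.Generator used for noise and time sampling.
    """
    batch = actions.shape[0]
    # Sample Gaussian noise as the starting point of the flow.
    noise = rng.standard_normal(actions.shape)
    # Sample one time in [0, 1) per example (assumed uniform schedule).
    t = rng.uniform(size=(batch, 1, 1))
    # Point on the straight path from noise (t=0) to the actions (t=1).
    x_t = (1.0 - t) * noise + t * actions
    # The velocity of that straight path is constant in t.
    target = actions - noise
    # Regress the predicted velocity onto the target velocity.
    pred = predict_velocity(x_t, t)
    return float(np.mean((pred - target) ** 2))
```

At inference time, a model trained this way would typically generate actions by starting from noise and integrating the predicted velocity from t=0 to t=1; the number of integration steps is a design choice not specified here.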
In this repo, you can find:
All data and code in this repo are released under the CC BY-NC-SA 4.0 license. If you use our dataset or models, please cite:
@article{galaxea2025,
title={Galaxea G0: Open-World Dataset and Dual-System VLA Model},
author={Galaxea Team},
journal={arXiv preprint arXiv:2509.00576},
year={2025}
}