---
license: mit
language:
- en
---

# RDT-1B

RDT-1B is a 1B-parameter imitation-learning Diffusion Transformer pre-trained on 1M+ multi-robot episodes. Given a language instruction and RGB image observations from 3 views, RDT predicts the next 64 robot actions. RDT is compatible with almost all kinds of modern mobile manipulators: single-arm or dual-arm, joint-space or end-effector (EEF) control, position or velocity commands, and even platforms with a mobile chassis.

All the [code]() and pretrained model weights are licensed under the MIT license.

Please refer to our [project page](https://rdt-robotics.github.io/rdt-robotics/) and [paper]() for more information.

## Model Details

- **Developed by:** RDT Team from Tsinghua University
- **License:** MIT
- **Language(s) (NLP):** en
- **Model Architecture:** Diffusion Transformer
- **Pretrain dataset:** Curated pre-training dataset collected from 46 datasets. Please see [here]() for details.
- **Repository:** [repo_url]
- **Paper:** [paper_url]
- **Project Page:** https://rdt-robotics.github.io/rdt-robotics/

## Uses

RDT takes a language instruction, image observations, and proprioception as input and predicts the next 64 robot actions, each expressed as a vector in a unified action space. This space covers the main physical quantities of a robot, such as end-effector and joint positions and velocities, and base movement.

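The idea of a unified action space can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration only: the vector size `UNIFIED_DIM`, the slot names and ranges in `SLOTS`, and the helpers `pack` and `unpack` are hypothetical and do not reflect the actual layout or API used in the RDT repository. Only the chunk length of 64 comes from this model card.

```python
from typing import Dict, List, Tuple

CHUNK_LEN = 64    # from the model card: RDT predicts the next 64 actions
UNIFIED_DIM = 16  # hypothetical size, chosen only for this illustration

# Hypothetical slot layout: physical quantity -> (start, stop) indices
# in the unified vector. The real layout is defined by the RDT repository.
SLOTS: Dict[str, Tuple[int, int]] = {
    "joint_pos": (0, 7),
    "joint_vel": (7, 14),
    "base_vel": (14, 16),
}


def pack(quantities: Dict[str, List[float]]) -> Tuple[List[float], List[int]]:
    """Write a robot's native quantities into the unified vector.

    Returns the vector and a 0/1 mask marking which entries are in use,
    so robots with fewer degrees of freedom leave unused entries at zero.
    """
    vec = [0.0] * UNIFIED_DIM
    mask = [0] * UNIFIED_DIM
    for name, values in quantities.items():
        start, stop = SLOTS[name]
        if len(values) > stop - start:
            raise ValueError(f"{name} has {len(values)} values, slot holds {stop - start}")
        for i, value in enumerate(values):
            vec[start + i] = float(value)
            mask[start + i] = 1
    return vec, mask


def unpack(vec: List[float], mask: List[int], name: str) -> List[float]:
    """Read back the in-use entries of one quantity from a unified vector."""
    start, stop = SLOTS[name]
    return [vec[i] for i in range(start, stop) if mask[i]]


# A 6-joint single arm commanded by joint position only.
arm_vec, arm_mask = pack({"joint_pos": [0.1] * 6})

# A predicted action chunk is then CHUNK_LEN such vectors, one per future step.
action_chunk = [list(arm_vec) for _ in range(CHUNK_LEN)]
```

Under these assumptions, a different robot (for example, one with a mobile base) would fill the `base_vel` slot as well, while the vector size stays the same. This is what lets one model consume and produce actions for many embodiments.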
### Getting Started

RDT-1B supports fine-tuning on custom datasets, deployment and inference on real robots, and pre-training of the model.

Please refer to [our repository](https://github.com/GeneralEmbodiedSystem/RoboticsDiffusionTransformer/blob/main/docs/pretrain.md) for guides on all of the above.

## Citation

**BibTeX:**

[More Information Needed]