---
license: apache-2.0
library_name: lerobot
pipeline_tag: robotics
tags:
- robotics
- lerobot
- act
- imitation-learning
- so101
model_name: act
datasets: r2owb0/so101-DS1
---

# ACT Model for SO101 Robot

This is an Action Chunking Transformer (ACT) model for the SO101 robot, trained with LeRobot on demonstration data collected during teleoperation sessions.

## Model Details

### Architecture
- **Model Type**: Action Chunking Transformer (ACT)
- **Vision Backbone**: ResNet18 with ImageNet-pretrained weights
- **Transformer Configuration**:
  - Hidden dimension: 512
  - Number of heads: 8
  - Encoder layers: 4
  - Decoder layers: 1
  - Feedforward dimension: 3200
- **VAE**: Enabled, with a 32-dimensional latent space
- **Chunk Size**: 50 steps
- **Action Steps**: 15 steps per inference call (see the configuration sketch below)
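
For reference, here is a minimal sketch of how these hyperparameters map onto LeRobot's `ACTConfig`. The field names follow a recent lerobot release and may differ slightly between versions:

```python
from lerobot.policies.act.configuration_act import ACTConfig

# Hyperparameters from the list above, expressed as an ACTConfig.
config = ACTConfig(
    vision_backbone="resnet18",
    pretrained_backbone_weights="ResNet18_Weights.IMAGENET1K_V1",
    dim_model=512,        # hidden dimension
    n_heads=8,
    n_encoder_layers=4,
    n_decoder_layers=1,
    dim_feedforward=3200,
    use_vae=True,
    latent_dim=32,
    chunk_size=50,        # actions predicted per forward pass
    n_action_steps=15,    # actions executed before the next forward pass
)
```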

### Camera Setup
The model uses a **dual-camera setup** for robust perception (a capture sketch follows the list):

1. **Wrist Camera** (`observation.images.wrist`):
   - Resolution: 240×320 pixels
   - Position: mounted on the robot's wrist
   - Purpose: provides a close-up, detailed view of manipulation tasks
   - Field of view: narrow, focused on the immediate workspace

2. **Top Camera** (`observation.images.top`):
   - Resolution: 480×640 pixels
   - Position: mounted above the workspace
   - Purpose: provides broader context and an overview of the environment
   - Field of view: wide, captures the entire workspace
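
As a rough illustration, frames at these resolutions can be grabbed with OpenCV. The device indices (`0`, `1`) are hypothetical and depend on your machine:

```python
import cv2

# Hypothetical device indices; adjust for your setup.
wrist_cam = cv2.VideoCapture(0)
top_cam = cv2.VideoCapture(1)
wrist_cam.set(cv2.CAP_PROP_FRAME_WIDTH, 320)
wrist_cam.set(cv2.CAP_PROP_FRAME_HEIGHT, 240)
top_cam.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
top_cam.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

ok_w, wrist_bgr = wrist_cam.read()
ok_t, top_bgr = top_cam.read()
assert ok_w and ok_t, "camera read failed"

# OpenCV returns BGR; the policy expects RGB.
wrist_rgb = cv2.cvtColor(wrist_bgr, cv2.COLOR_BGR2RGB)
top_rgb = cv2.cvtColor(top_bgr, cv2.COLOR_BGR2RGB)
```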

### Input/Output Specifications

**Inputs:**
- **Robot State**: 6-dimensional joint positions
  - `shoulder_pan.pos`
  - `shoulder_lift.pos`
  - `elbow_flex.pos`
  - `wrist_flex.pos`
  - `wrist_roll.pos`
  - `gripper.pos`
- **Wrist Camera**: RGB image (240×320×3)
- **Top Camera**: RGB image (480×640×3)

**Outputs:**
- **Actions**: 6-dimensional joint commands (same structure as the state)
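
In practice, LeRobot policies consume batched, channel-first float tensors rather than raw HxWx3 frames. A minimal conversion sketch (the helper name is ours, not part of the library):

```python
import numpy as np
import torch

def to_policy_image(rgb_hwc: np.ndarray) -> torch.Tensor:
    """Convert an HxWx3 uint8 RGB frame into the 1x3xHxW float tensor
    in [0, 1] that LeRobot policies expect; mean/std normalization is
    handled inside the policy itself."""
    chw = torch.from_numpy(rgb_hwc).permute(2, 0, 1).float() / 255.0
    return chw.unsqueeze(0)
```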

## Training Details

### Dataset
- **Source**: `r2owb0/so101-DS1`
- **Episodes**: 10 demonstration episodes
- **Total Frames**: 5,990 frames
- **Frame Rate**: 30 FPS
- **Robot Type**: SO101 follower robot
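
The dataset can be inspected with LeRobot's dataset class. This is a sketch; the import path and attribute names assume a recent lerobot release:

```python
from lerobot.datasets.lerobot_dataset import LeRobotDataset

# Downloads from the Hub on first use.
dataset = LeRobotDataset("r2owb0/so101-DS1")
print(dataset.num_episodes, dataset.num_frames, dataset.fps)  # expect 10, 5990, 30
```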

### Training Configuration
- **Training Steps**: 25,000
- **Batch Size**: 4
- **Learning Rate**: 1e-5
- **Optimizer**: AdamW with weight decay 1e-4 (see the sketch below)
- **Validation Split**: 10% of episodes
- **Seed**: 1000
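
A minimal sketch of the optimizer described above, assuming `policy` is the loaded ACT policy:

```python
import torch

# AdamW with the learning rate and weight decay listed above.
optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-5, weight_decay=1e-4)
```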

### Data Augmentation
The model was trained with image augmentation over the following ranges (a torchvision sketch follows the list):
- Brightness adjustment (0.8-1.2x)
- Contrast adjustment (0.8-1.2x)
- Saturation adjustment (0.5-1.5x)
- Hue adjustment (±0.05)
- Sharpness adjustment (0.5-1.5x)
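
A hedged sketch of these ranges using torchvision's v2 transforms; LeRobot applies equivalent jitter through its own image-transforms configuration:

```python
import torch
from torchvision.transforms import v2
from torchvision.transforms.v2 import functional as F

# Jitter ranges from the list above.
color_jitter = v2.ColorJitter(
    brightness=(0.8, 1.2),
    contrast=(0.8, 1.2),
    saturation=(0.5, 1.5),
    hue=(-0.05, 0.05),
)

def augment(image: torch.Tensor) -> torch.Tensor:
    """Apply color jitter, then a random sharpness factor in [0.5, 1.5]."""
    image = color_jitter(image)
    factor = torch.empty(1).uniform_(0.5, 1.5).item()
    return F.adjust_sharpness(image, factor)
```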

## Usage

### Installation
```bash
pip install lerobot
```

### Loading the Model
```python
from lerobot.policies.act.modeling_act import ACTPolicy

# Load the pretrained policy from the Hugging Face Hub.
policy = ACTPolicy.from_pretrained("r2owb0/act1")
policy.eval()
```

### Evaluation
```bash
lerobot-eval \
    --policy.path=r2owb0/act1 \
    --env.type=your_env_type \
    --eval.n_episodes=10 \
    --eval.batch_size=10
```

### Inference
```python
import torch

# `policy` is the ACTPolicy loaded above.
device = "cuda" if torch.cuda.is_available() else "cpu"
policy.to(device)
policy.reset()  # clear the internal action queue before a new episode

# Batched, channel-first float tensors; image values in [0, 1].
# Zeros stand in for real sensor data here.
observation = {
    "observation.state": torch.zeros(1, 6, device=device),                   # 6-D joint state
    "observation.images.wrist": torch.zeros(1, 3, 240, 320, device=device),  # wrist RGB
    "observation.images.top": torch.zeros(1, 3, 480, 640, device=device),    # top RGB
}

# Get the next action (shape: (1, 6)).
with torch.no_grad():
    action = policy.select_action(observation)
```

## Hardware Requirements

### Robot Setup
- **Robot**: SO101 follower robot
- **Cameras**:
  - Wrist-mounted camera (240×320 resolution)
  - Top-mounted camera (480×640 resolution)
- **Control**: 5-DOF arm plus gripper (6 actuated joints, matching the 6-D state and action)

### Computing Requirements
- **GPU**: CUDA-compatible GPU recommended
- **Memory**: at least 4 GB of GPU memory
- **Storage**: ~200 MB for model weights

## Performance Notes

- The model uses action chunking: it predicts 50 steps ahead but executes 15 steps per inference call (see the sketch below)
- Temporal ensembling is disabled for real-time inference
- The model expects normalized inputs; mean/std normalization is handled by the policy's own statistics
- The VAE objective is enabled for better representation learning
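
A sketch of what this means in a control loop: with `chunk_size=50` and `n_action_steps=15`, the network only runs every 15 calls, and the intermediate actions are served from an internal queue. Here `get_observation()` is a hypothetical stand-in for your own sensor code:

```python
import torch

policy.reset()
for t in range(150):
    observation = get_observation()  # hypothetical: builds the dict shown above
    with torch.no_grad():
        action = policy.select_action(observation)  # network runs at t = 0, 15, 30, ...
    # send `action` to the robot here
```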

## Limitations

- Trained on a specific robot configuration (SO101)
- Requires the exact camera setup described above
- Performance may vary with different lighting conditions
- Limited to the task domain covered in the training dataset

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{r2owb0_act1,
  author = {Robert},
  title = {ACT Model for SO101 Robot},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/r2owb0/act1}
}
```

## License

This model is licensed under the Apache 2.0 License.