---
license: mit
tags:
- robotics
- pi-zero
- diffusion
- vision-language-action
- aloha
- manipulation
- bolt-nut-sorting
base_model: google/paligemma-3b-pt-224
library_name: openpi
pipeline_tag: robotics
---
# Pi-0 Bolt Nut Sort Model

This is a Pi-0 (Pi-Zero) model fine-tuned with the OpenPI framework for a bolt-and-nut sorting task.
## Model Description

- **Architecture**: Pi-0 (diffusion-based vision-language-action model)
- **Base Model**: PaliGemma 3B with SigLIP vision encoder
- **Task**: Sorting bolts and nuts into separate baskets
- **Robot**: Dual-arm ALOHA setup
- **Action Space**: 14-DoF (7 per arm: 6 joints + 1 gripper); see the shape sketch after this list
- **Training Steps**: 29,999
- **Action Horizon**: 50 steps
- **Image Resolution**: 224x224
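
For concreteness, the expected tensor shapes are sketched below with NumPy placeholders. The left-then-right joint ordering in the comments is an assumption for illustration, not something the dataset guarantees.

```python
import numpy as np

# Proprioceptive state: 14 values. The ordering assumed here
# (left arm first, then right, each as 6 joints + 1 gripper) is
# for illustration only, not a guarantee of the dataset layout.
state = np.zeros(14, dtype=np.float32)

# One RGB camera frame at the model's 224x224 input resolution.
frame = np.zeros((224, 224, 3), dtype=np.uint8)

# The policy predicts a chunk of 50 future actions, one 14-D command per step.
action_chunk = np.zeros((50, 14), dtype=np.float32)
```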
## Dataset

Trained on the `naungth/pi0_bolt_nut_sort` dataset with the task instruction:
"sort the bolts and the nuts into separate baskets"
## Usage

### With OpenPI

```python
from openpi.policies import policy_config
from openpi.training import config

# Load the model configuration
config_name = "pi0_bns"
train_config = config.get_config(config_name)

# Create policy from your local checkpoint
policy = policy_config.create_trained_policy(
    train_config,
    "path/to/checkpoint",
    default_prompt="sort the bolts and the nuts into separate baskets"
)

# Use for inference
observation = {
    "images": {
        "cam_high": image_array,              # [H, W, 3] uint8
        "cam_left_wrist": left_wrist_image,   # [H, W, 3] uint8
        "cam_right_wrist": right_wrist_image, # [H, W, 3] uint8
    },
    "state": joint_positions,  # [14] float32
    "prompt": "sort the bolts and the nuts into separate baskets"
}

actions = policy.infer(observation)["actions"]  # [50, 14]
```
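
On a real robot, the 50-step action chunk is typically executed in a receding-horizon loop: run the first part of each predicted chunk, then re-infer from a fresh observation. The loop below is an illustrative sketch rather than part of the OpenPI API; it reuses the `policy` object created above, `task_done()`, `get_observation()`, and `send_joint_command()` are hypothetical stand-ins for your robot interface, and the replanning interval is a tuning choice.

```python
import numpy as np

REPLAN_EVERY = 25  # execute this many steps of each 50-step chunk before re-inferring

while not task_done():  # hypothetical termination check
    observation = get_observation()  # hypothetical: images + 14-D state + prompt, as above
    action_chunk = np.asarray(policy.infer(observation)["actions"])  # [50, 14]
    for action in action_chunk[:REPLAN_EVERY]:
        send_joint_command(action)  # hypothetical: send one 14-D joint/gripper target
```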
### With Policy Server

```bash
# Start the policy server
uv run scripts/serve_policy.py policy:checkpoint --policy.config=pi0_bns --policy.dir=path/to/checkpoint
```

```python
# Query the running server from a client
from openpi_client import websocket_client_policy

client = websocket_client_policy.WebsocketClientPolicy("localhost", 8000)
actions = client.infer(observation)["actions"]  # same observation format as above
```
## Model Architecture

- **Vision Encoder**: SigLIP-So400m/14
- **Language Model**: Gemma 2B + Gemma 300M (action expert)
- **Training**: Diffusion-based action prediction
- **Input**: Multi-camera RGB + proprioception + language instruction
- **Output**: Future action sequence (50 timesteps)
## Training Details

- **Framework**: JAX/Flax with OpenPI
- **Optimizer**: AdamW
- **Base Checkpoint**: Pi-0 base model from Physical Intelligence
- **Fine-tuning**: Task-specific fine-tuning on the bolt/nut sorting dataset
- **Normalization**: Dataset-specific state/action normalization (see the sketch after this list)
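
To make the last point concrete: states and actions are standardized with per-dimension statistics computed from the training dataset, and OpenPI stores these statistics with the checkpoint and applies them inside the policy at inference time, so you normally do not call this yourself. The sketch below only illustrates the idea with placeholder values.

```python
import numpy as np

# Placeholder per-dimension statistics; the real values come from the training data.
state_mean = np.zeros(14, dtype=np.float32)
state_std = np.ones(14, dtype=np.float32)

def normalize_state(state: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Map a raw 14-D joint state into the normalized space used during training."""
    return (state - state_mean) / (state_std + eps)

def denormalize_actions(actions: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Map predicted actions back to raw joint commands."""
    return actions * std + mean
```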
## License

MIT License
## Citation

If you use this model, please cite:

```bibtex
@article{black2024pi0,
  title={$\pi_0$: A Vision-Language-Action Flow Model for General Robot Control},
  author={Black, Kevin and others},
  journal={arXiv preprint arXiv:2410.24164},
  year={2024}
}
```
## Acknowledgments

- Built using the [OpenPI](https://github.com/Physical-Intelligence/openpi) framework
- Based on the Pi-0 architecture from Physical Intelligence
- Training data from bolt-and-nut sorting demonstrations