Model Card for so101_orange_pick_gr00tn1.5_model
NVIDIA Isaac GR00T N1.5 model fine-tuned on the so101_orange_pick dataset for experimenting with the LeRobot SO-101 robot arm in a dual-camera setup.
The video preview was generated from the model's training data in the dataset.
Model Details
Model Description
This model is a version of NVIDIA's GR00T N1.5 fine-tuned on the so101_orange_pick dataset. The model is relevant in the context of LightwheelAI's LeIsaac, where it models the standard task "LeIsaac-SO101-PickOrange-v0". In this task, an SO-ARM101 robot arm picks up 3 oranges from a table and places them in a bowl, one after another. The robot is equipped with a front camera and a wrist camera.
- Developed by: Florian Roscheck, based on work by wantobcm and the model authors
- License: Non-commercial use (see Section 3.2 of the NVIDIA license)
- Finetuned from model: NVIDIA GR00T-N1.5-3B
Uses
The model is intended for researchers and hobbyists who would like to experiment with LeRobot, GR00T inference, LeIsaac, and NVIDIA's Isaac Sim.
How to Get Started with the Model
To learn more about how to set up the environment for using the model as an inference service, refer to the instructions in the Isaac-GR00T repo.
You can run the inference server for the model with the following command:
```
python scripts/inference_service.py --model-path flrs/so101_orange_pick_gr00tn1.5_model --server --embodiment-tag new_embodiment --data-config so100_dualcam
```
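Once the server is running, a client can query it for actions. The snippet below is a minimal client sketch based on the Isaac-GR00T repo's `ExternalRobotInferenceClient`; the observation key names and shapes are assumptions derived from the so100_dualcam data config (5 arm joint dimensions plus 1 gripper dimension, two camera streams), so adjust them to your actual setup.

```python
# Minimal client sketch -- key names/shapes assume the so100_dualcam config;
# resolution, port, and the task string are illustrative placeholders.
import numpy as np
from gr00t.eval.service import ExternalRobotInferenceClient

policy = ExternalRobotInferenceClient(host="localhost", port=5555)

obs = {
    "video.front": np.zeros((1, 480, 640, 3), dtype=np.uint8),  # front camera frame
    "video.wrist": np.zeros((1, 480, 640, 3), dtype=np.uint8),  # wrist camera frame
    "state.single_arm": np.zeros((1, 5), dtype=np.float32),     # 5 arm joint positions
    "state.gripper": np.zeros((1, 1), dtype=np.float32),        # gripper position
    "annotation.human.task_description": ["Pick up the oranges and place them in the bowl."],
}

action_chunk = policy.get_action(obs)  # returns a chunk of future actions per modality
```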
Training Details
Training Data
See the Dataset Card for more information on the training data.
All data in the dataset was used for training and no test data was withheld.
You can preview the dataset in the LeRobot Dataset Visualizer.
Training Procedure
Training was done on an NVIDIA L4 GPU, driver version 550.54.15, CUDA 12.4.
- Set up the environment as described in the Isaac-GR00T repo.
- Download the dataset from the Hugging Face Hub, e.g. via the Hugging Face CLI:
```
hf download --repo-type dataset --local-dir ./dataset wantobcm/so101_orange_pick_gr00tn1.5
```
(A scripted alternative to this and the next step is sketched after this list.)
- In order to train the model, you need a modality file. Create the file `dataset/meta/modality.json` with the following content:
```json
{
  "state": {
    "single_arm": { "start": 0, "end": 5 },
    "gripper": { "start": 5, "end": 6 }
  },
  "action": {
    "single_arm": { "start": 0, "end": 5 },
    "gripper": { "start": 5, "end": 6 }
  },
  "video": {
    "front": { "original_key": "observation.images.front" },
    "wrist": { "original_key": "observation.images.wrist" }
  },
  "annotation": {
    "human.task_description": { "original_key": "task_index" }
  }
}
```
- Train (fine-tune) the model with the following command:
```
python scripts/gr00t_finetune.py \
  --dataset-path ./dataset \
  --num-gpus 1 \
  --output-dir ./so101_orange_pick_gr00tn1.5_model \
  --max-steps 6000 \
  --data-config so100_dualcam \
  --video-backend torchvision_av \
  --no-tune_diffusion_model \
  --dataloader-num-workers 1 \
  --batch-size 16 \
  --dataloader-prefetch-factor 1
```
Note: The following adjustments were made to accommodate infrastructure limitations:
- `--num-gpus 1`
- `--dataloader-num-workers 1`
- `--batch-size 16`
- `--dataloader-prefetch-factor 1`
- `--no-tune_diffusion_model`
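As referenced above, the download and modality-file steps can also be scripted. This is a sketch, not part of the official workflow; it uses `huggingface_hub.snapshot_download` (a standard Hugging Face API) and writes the same `modality.json` shown above:

```python
# Sketch: download the dataset and write the modality file programmatically.
import json
from pathlib import Path

from huggingface_hub import snapshot_download

# Download the dataset from the Hugging Face Hub into ./dataset
snapshot_download(
    repo_id="wantobcm/so101_orange_pick_gr00tn1.5",
    repo_type="dataset",
    local_dir="./dataset",
)

# Write the modality file required by GR00T fine-tuning (same content as above)
modality = {
    "state": {"single_arm": {"start": 0, "end": 5}, "gripper": {"start": 5, "end": 6}},
    "action": {"single_arm": {"start": 0, "end": 5}, "gripper": {"start": 5, "end": 6}},
    "video": {
        "front": {"original_key": "observation.images.front"},
        "wrist": {"original_key": "observation.images.wrist"},
    },
    "annotation": {"human.task_description": {"original_key": "task_index"}},
}
meta_path = Path("./dataset/meta/modality.json")
meta_path.parent.mkdir(parents=True, exist_ok=True)
meta_path.write_text(json.dumps(modality, indent=2))
```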
Training Hyperparameters
Default hyperparameters for GR00T were used (as of commit 1259d62), except for those set via the command-line arguments above.
Evaluation
Evaluation was run via the following GR00T evaluation script:
```
python scripts/eval_policy.py \
  --model_path ./so101_orange_pick_gr00tn1.5_model \
  --embodiment-tag new_embodiment \
  --data-config so100_dualcam \
  --dataset_path ./dataset \
  --modality-keys single_arm gripper \
  --trajs 40
```
Testing Data, Factors & Metrics
Testing Data
Evaluation of the model was done on the full training set, as no separate test set was withheld. For the training set, see the dataset card linked above.
Metrics
In accordance with the output of the evaluation script, the MSE (Mean Squared Error) across all trajectories was used.
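The script's exact implementation may differ, but conceptually the per-trajectory figure corresponds to a mean squared error over predicted versus ground-truth action values, as in this sketch:

```python
import numpy as np

def trajectory_mse(pred: np.ndarray, gt: np.ndarray) -> float:
    """MSE of one trajectory, averaged over timesteps and action dimensions."""
    return float(np.mean((pred - gt) ** 2))
```

Since the actions are unnormalized joint values, the absolute magnitude of this metric depends on the action scale; it is mainly useful for comparing checkpoints on the same dataset.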
Results
The MSE for all 40 trajectories was 23.752.
An example trajectory is visualized in the following image (created via the evaluation script for trajectory 10):
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: NVIDIA L4, 4 vCPUs, 16 GB memory, 300 GB SSD
- Hours used: 3
- Cloud Provider: Google Cloud Platform (GCP)
- Compute Region: us-east1-c
- Carbon Emitted: 0.08 kg CO2eq (estimate via Machine Learning Impact calculator)
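The figure can be roughly reproduced with the calculator's power × time × carbon-intensity formula. In the sketch below, the GPU power is the L4's 72 W TDP; the carbon-intensity value is an assumed round number chosen for illustration, not the calculator's exact regional figure:

```python
# Back-of-the-envelope check of the emissions estimate (illustrative assumptions).
gpu_power_kw = 0.072       # NVIDIA L4 TDP: 72 W
hours = 3.0                # training time reported above
kg_co2_per_kwh = 0.37      # assumed grid carbon intensity (illustrative)
print(f"{gpu_power_kw * hours * kg_co2_per_kwh:.2f} kg CO2eq")  # -> 0.08
```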