# GR00T-N1.6-G1-PnPAppleToPlate-CW

Community fine-tune of GR00T N1.6 on Unitree G1 – 50% closed-loop success on PnPAppleToPlate.
## Context
Several community members have reported difficulty reproducing the expected results with the published G1 checkpoints (see also #470, #543, #558). We ran into the same issue, so we fine-tuned from nvidia/GR00T-N1.6-3B on the official sim data using NVIDIA's pipeline and are sharing the result here in case it helps others.
## Results
| Metric | Value |
|---|---|
| Success rate | 5/10 (50%) |
| Hands activate | Yes |
| Arm reach | Both arms reach toward apple |
| Grasp + place | Yes (in successful episodes) |
Per-episode results (10 episodes): [F, F, F, T, T, F, F, T, T, T]
Eval time: 370s total (~37s per episode).
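With only 10 episodes, the 5/10 point estimate carries wide uncertainty. A minimal stdlib-only sketch (using the Wilson score interval, a standard choice for small-sample binomial proportions) makes this concrete:

```python
import math

def wilson_interval(successes, n, z=1.96):
    """95% Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

lo, hi = wilson_interval(5, 10)
print(f"5/10 -> 95% CI: [{lo:.2f}, {hi:.2f}]")  # roughly [0.24, 0.76]
```

In other words, 10 episodes only pin the true success rate down to roughly 24–76%, which is why the Limitations section recommends 20+ episodes.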
## Reproducing Our Results
### Prerequisites
- **Isaac-GR00T** – tested at commit `7d5a455` (main branch, March 2026). Community reports the `N1.6 Release` tag also works.

  ```bash
  git clone https://github.com/NVIDIA-Omniverse/Isaac-GR00T.git
  cd Isaac-GR00T
  git checkout 7d5a455  # or use main
  ```
- **Python environment** – follow the official install instructions. Isaac-GR00T uses `uv` for dependency management.
- **Dataset** (required for fine-tuning, not for eval):

  ```bash
  GIT_LFS_SKIP_SMUDGE=1 git clone --no-checkout \
    https://huggingface.co/datasets/nvidia/PhysicalAI-Robotics-GR00T-X-Embodiment-Sim
  cd PhysicalAI-Robotics-GR00T-X-Embodiment-Sim
  git sparse-checkout init --cone
  git sparse-checkout set unitree_g1.LMPnPAppleToPlateDC
  git checkout main
  git lfs pull --include="unitree_g1.LMPnPAppleToPlateDC/**"
  ```
This downloads ~500 MB (102 sim episodes).
### Closed-Loop Evaluation (2 terminals)
**Terminal 1 – MuJoCo simulation + Decoupled WBC:**

```bash
cd Isaac-GR00T/external_dependencies/GR00T-WholeBodyControl
MUJOCO_GL=egl PYOPENGL_PLATFORM=egl \
python -m decoupled_wbc.deployment.run_sim_loop \
  --config decoupled_wbc/control/configs/g1_locomotion_decoupled_wbc.yaml \
  --custom_controller_cfg '{"wbc_type": "gear_wbc"}' \
  --task_name LMPnPAppleToPlateDC_G1
```
**Terminal 2 – GR00T server + eval rollout:**

```bash
cd Isaac-GR00T
# Start the policy server
MUJOCO_GL=egl PYOPENGL_PLATFORM=egl \
uv run --extra=gpu python gr00t/eval/run_gr00t_server.py \
  --model-path pedroset/GR00T-N1.6-G1-PnPAppleToPlate-CW \
  --embodiment-tag UNITREE_G1 \
  --use-sim-policy-wrapper \
  --device cuda:0 \
  --host 0.0.0.0 \
  --port 5555
```
Then, in a third terminal (once the server is ready):

```bash
cd Isaac-GR00T
MUJOCO_GL=egl PYOPENGL_PLATFORM=egl \
uv run python gr00t/eval/rollout_policy.py \
  --n_episodes 10 \
  --max_episode_steps 1440 \
  --env_name gr00tlocomanip_g1_sim/LMPnPAppleToPlateDC_G1_gear_wbc \
  --policy_client_host 127.0.0.1 \
  --policy_client_port 5555 \
  --n_action_steps 20 \
  --n_envs 1
```
**Note:** If running headless (no display), `MUJOCO_GL=egl PYOPENGL_PLATFORM=egl` is required in all terminals. The eval script is `rollout_policy.py`, NOT `run_policy_server.py` (which does not exist).
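Since the rollout client fails if it starts before the policy server is listening, a small stdlib-only helper (a hypothetical convenience script, not part of Isaac-GR00T) can poll the port from Terminal 2's host/port defaults before launching the rollout:

```python
import socket
import time

def wait_for_server(host="127.0.0.1", port=5555, timeout=120.0):
    """Poll until a TCP connection to the policy server succeeds, or time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            time.sleep(1.0)  # server not up yet; retry
    return False

if __name__ == "__main__":
    if wait_for_server():
        print("policy server ready; safe to launch rollout_policy.py")
    else:
        print("server not reachable on :5555; start Terminal 2 first")
```

This only checks TCP reachability, not that the model has finished loading, so the first rollout request may still block briefly.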
## Fine-Tune It Yourself
Full training command with all hyperparameters exactly as used:
```bash
cd Isaac-GR00T
CUDA_VISIBLE_DEVICES=0 uv run python gr00t/experiment/launch_finetune.py \
  --base_model_path nvidia/GR00T-N1.6-3B \
  --dataset_path <PATH_TO>/PhysicalAI-Robotics-GR00T-X-Embodiment-Sim/unitree_g1.LMPnPAppleToPlateDC \
  --embodiment_tag UNITREE_G1 \
  --num_gpus 1 \
  --output_dir output/g1_finetune \
  --max_steps 10000 \
  --save_steps 2000 \
  --save_total_limit 5 \
  --warmup_ratio 0.05 \
  --weight_decay 1e-5 \
  --learning_rate 1e-4 \
  --global_batch_size 32 \
  --dataloader_num_workers 4 \
  --color_jitter_params brightness 0.3 contrast 0.4 saturation 0.5 hue 0.08 \
  --use_wandb
```
Replace `<PATH_TO>` with your actual dataset path.

Do NOT add `--tune_llm` or `--tune_visual`. The defaults (false) are correct – only the diffusion head, projector, VLLN, and top 4 LLM layers are tuned.

If you get OOM, reduce `--global_batch_size` to 16. On a B200 (192 GB), batch size 128 also works.
Training takes ~2 hours on a single B200 or ~3-4 hours on an A100/H100 with batch 32.
## Training Details
| Parameter | Value |
|---|---|
| Base model | nvidia/GR00T-N1.6-3B |
| Dataset | unitree_g1.LMPnPAppleToPlateDC (102 episodes, sim) |
| Embodiment | UNITREE_G1 (29 DOF) |
| Steps | 10,000 |
| Batch size | 32 (global) |
| Learning rate | 1e-4 (cosine schedule) |
| Warmup | 5% of steps (500 steps) |
| Weight decay | 1e-5 |
| Optimizer | AdamW |
| Precision | bf16 |
| DeepSpeed | Stage 2 |
| Max grad norm | 1.0 |
| Action horizon | 16 steps |
| Inference timesteps | 4 (flow matching) |
| Action representation | Relative (arms), Absolute (hands, waist, navigation) |
| Hardware | 1× NVIDIA B200 (~2 hours) |
**What's tuned:** diffusion action head (`tune_diffusion_model: true`), projector (`tune_projector: true`), VLLN (`tune_vlln: true`), top 4 LLM layers (`tune_top_llm_layers: 4`).

**What's frozen:** vision backbone (`tune_visual: false`), full LLM body (`tune_llm: false`).
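The learning-rate schedule in the table above (1e-4 peak, cosine decay, 5% warmup) can be sketched as a pure function of the step. This is an illustrative sketch assuming the standard linear-warmup-then-cosine-to-zero shape, not the exact trainer code:

```python
import math

def lr_at(step, max_steps=10_000, base_lr=1e-4, warmup_ratio=0.05):
    """Linear warmup, then cosine decay to zero (sketch of the schedule)."""
    warmup_steps = int(max_steps * warmup_ratio)  # 500 steps with these values
    if step < warmup_steps:
        return base_lr * step / warmup_steps       # linear ramp up
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    return base_lr * 0.5 * (1 + math.cos(math.pi * progress))  # cosine decay

print(lr_at(250))     # mid-warmup: half of peak
print(lr_at(500))     # peak learning rate
print(lr_at(10_000))  # end of training: ~0
```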
### Loss Curve
| Step | Loss |
|---|---|
| 10 | 1.197 |
| 100 | 1.050 |
| 500 | 0.173 |
| 1000 | 0.070 |
| 2000 | 0.053 |
| 4000 | 0.048 |
| 6000 | 0.031 |
| 8000 | 0.020 |
| 10000 | 0.021 |

Min: 0.014 (step 9930)
57× reduction (1.197 → 0.021). Monotonic descent, no spikes, no early plateau. Full training logs are available in `trainer_state.json` inside this checkpoint.
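`trainer_state.json` follows the Hugging Face Trainer state format, which stores per-logging-step entries under a `log_history` list. A short sketch (assuming that format) for pulling out the loss curve:

```python
import json

def loss_curve(trainer_state_path):
    """Extract (step, train loss) pairs from a HF-Trainer-style state file."""
    with open(trainer_state_path) as f:
        state = json.load(f)
    # log_history mixes train and eval entries; keep only those with "loss"
    return [(entry["step"], entry["loss"])
            for entry in state.get("log_history", []) if "loss" in entry]
```

Usage would look like `loss_curve("checkpoint-10000/trainer_state.json")`, ready to hand to a plotting library.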
## Known Issues
### flash-attn on B200/Blackwell GPUs

Isaac-GR00T pins `flash-attn==2.7.4.post1`, which does not support Blackwell (SM120). If training on B200/B100, install a newer version (the version spec must be quoted so the shell does not treat `>` as a redirect):

```bash
pip install "flash-attn>=2.8.2" --no-build-isolation
```

This is not needed for Ampere (A100) or Hopper (H100) GPUs.
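You can check which case applies via `torch.cuda.get_device_capability()`; the decision itself is a pure function, sketched below as a hypothetical helper (assumption: Blackwell parts report compute capability 10.x or 12.x, while Ampere and Hopper report 8.x and 9.x):

```python
def needs_newer_flash_attn(capability):
    """True when the pinned flash-attn 2.7.4.post1 lacks kernels for this GPU.

    `capability` is the (major, minor) tuple returned by
    torch.cuda.get_device_capability().
    """
    major, _minor = capability
    return major >= 10  # Blackwell and newer

# Ampere (8, 0) and Hopper (9, 0) work with the pinned version;
# Blackwell-class GPUs need flash-attn >= 2.8.2.
```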
### Dataset must be downloaded separately
The training dataset is NOT bundled with this checkpoint. Download it from HuggingFace (see Prerequisites above). The 102 episodes are ~500 MB.
### `embodiment_tag` casing

Use `UNITREE_G1` (uppercase) for the `--embodiment-tag` flag. The dataset uses lowercase `unitree_g1` internally, but the CLI flag expects uppercase. Getting this wrong causes a `KeyError`.
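A tiny guard (a hypothetical helper, not part of the repo) can catch the casing mistake with a useful message before a run is launched:

```python
VALID_TAGS = {"UNITREE_G1"}  # extend with other embodiments as needed

def check_embodiment_tag(tag):
    """Return the tag if valid; otherwise raise with a casing hint."""
    if tag in VALID_TAGS:
        return tag
    if tag.upper() in VALID_TAGS:
        raise ValueError(
            f"Use uppercase {tag.upper()!r} for the CLI flag, not {tag!r}")
    raise ValueError(f"Unknown embodiment tag: {tag!r}")
```

For example, `check_embodiment_tag("unitree_g1")` fails immediately with the corrected spelling, instead of a bare `KeyError` deep inside the pipeline.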
### Eval hyperparameters matter

`--n_action_steps 20` and `--max_episode_steps 1440` are critical. Different values will give different success rates. These match NVIDIA's official eval config.
## Limitations
- Sim-only: Trained and evaluated entirely in MuJoCo simulation. Not tested on real hardware.
- Small dataset: 102 sim episodes β possible overfitting to the specific sim environment. Mask-guided augmentation (PR #521) may improve generalization.
- Single task: PnPAppleToPlate only. No multi-task evaluation.
- 10 eval episodes: Limited statistical significance. We recommend 20+ episodes for robust comparison.
- No video evidence hosted: We have success videos locally but have not uploaded them yet.
## Checkpoint Contents
```
checkpoint-10000/
├── model-00001-of-00002.safetensors   (4.8 GB)
├── model-00002-of-00002.safetensors   (4.5 GB)
├── model.safetensors.index.json
├── config.json
├── processor_config.json
├── statistics.json
├── embodiment_id.json
├── wandb_config.json
├── trainer_state.json
└── experiment_cfg/
    ├── conf.yaml                      # full training config
    ├── config.yaml
    ├── final_processor_config.json
    ├── final_model_config.json
    └── dataset_statistics.json
```
Total size: ~9.2 GB (inference-only, optimizer state excluded).
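For a quick integrity check after download, `model.safetensors.index.json` uses the standard safetensors sharded-index format, recording `total_size` (tensor bytes) under `metadata` and shard filenames under `weight_map`. A sketch comparing that against what is on disk:

```python
import json
import os

def shard_sizes(checkpoint_dir):
    """Return (recorded_total_size, on_disk_bytes) for the safetensors shards."""
    with open(os.path.join(checkpoint_dir, "model.safetensors.index.json")) as f:
        index = json.load(f)
    recorded = index["metadata"]["total_size"]   # tensor bytes per the index
    shards = set(index["weight_map"].values())   # unique shard filenames
    on_disk = sum(os.path.getsize(os.path.join(checkpoint_dir, s))
                  for s in shards)
    return recorded, on_disk
```

Expect `on_disk` to be slightly larger than `recorded`, since each shard file carries a small header on top of the raw tensor bytes; a large gap or a missing-file error indicates an incomplete download.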
## Acknowledgments

- NVIDIA Isaac-GR00T – base model, training pipeline, eval framework
- NVIDIA GEAR-SONIC – whole-body control for Decoupled WBC
- Fine-tuned by CloudWalk as part of the SONIC project
## Citation
If you use this checkpoint, please cite the original GR00T and SONIC papers:
```bibtex
@article{bjorck2025gr00t,
  title={GR00T N1: An Open Foundation Model for Generalist Humanoid Robots},
  author={Bjorck, Johan and others},
  journal={arXiv preprint arXiv:2503.14734},
  year={2025}
}

@article{luo2025sonic,
  title={SONIC: Supersizing Motion Tracking for Natural Humanoid Whole-Body Control},
  author={Luo, Zhengyi and Yuan, Ye and Wang, Tingwu and Li, Chenran and Chen, Sirui and others},
  journal={arXiv preprint arXiv:2511.07820},
  year={2025}
}
```