# GR00T-N1.6-G1-PnPAppleToPlate-CW

Community fine-tune of GR00T N1.6 on Unitree G1 – 50% closed-loop success on PnPAppleToPlate.
## Context
Several community members have reported difficulty reproducing the expected results with the published G1 checkpoints (see also #470, #543, #558). We ran into the same issue, so we fine-tuned from nvidia/GR00T-N1.6-3B on the official sim data using NVIDIA's pipeline and are sharing the result here in case it helps others.
## Results
| Metric | Value |
|---|---|
| Success rate | 5/10 (50%) |
| Hands activate | Yes |
| Arm reach | Both arms reach toward apple |
| Grasp + place | Yes (in successful episodes) |
Per-episode results (10 episodes): [F, F, F, T, T, F, F, T, T, T]
Eval time: 370s total (~37s per episode).
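With only 10 episodes, the 5/10 point estimate carries wide uncertainty. A minimal stdlib-only sketch (using the Wilson score interval, a standard choice for small-sample binomial proportions) makes this concrete:

```python
import math

def wilson_interval(successes, n, z=1.96):
    """95% Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

lo, hi = wilson_interval(5, 10)
print(f"5/10 -> 95% CI: [{lo:.2f}, {hi:.2f}]")  # roughly [0.24, 0.76]
```

In other words, 10 episodes only pin the true success rate down to roughly 24–76%, which is why the Limitations section recommends 20+ episodes.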
## Reproducing Our Results
### Prerequisites
- **Isaac-GR00T** – tested at commit `7d5a455` (main branch, March 2026). Community reports the `N1.6 Release` tag also works.

  ```bash
  git clone https://github.com/NVIDIA-Omniverse/Isaac-GR00T.git
  cd Isaac-GR00T
  git checkout 7d5a455  # or use main
  ```
- **Python environment** – follow the official install instructions. Isaac-GR00T uses `uv` for dependency management.
- **Dataset** (required for fine-tuning, not for eval):

  ```bash
  GIT_LFS_SKIP_SMUDGE=1 git clone --no-checkout \
    https://huggingface.co/datasets/nvidia/PhysicalAI-Robotics-GR00T-X-Embodiment-Sim
  cd PhysicalAI-Robotics-GR00T-X-Embodiment-Sim
  git sparse-checkout init --cone
  git sparse-checkout set unitree_g1.LMPnPAppleToPlateDC
  git checkout main
  git lfs pull --include="unitree_g1.LMPnPAppleToPlateDC/**"
  ```
This downloads ~500 MB (102 sim episodes).
### Closed-Loop Evaluation (2 terminals)
**Terminal 1 – MuJoCo simulation + Decoupled WBC:**

```bash
cd Isaac-GR00T/external_dependencies/GR00T-WholeBodyControl
MUJOCO_GL=egl PYOPENGL_PLATFORM=egl \
python -m decoupled_wbc.deployment.run_sim_loop \
  --config decoupled_wbc/control/configs/g1_locomotion_decoupled_wbc.yaml \
  --custom_controller_cfg '{"wbc_type": "gear_wbc"}' \
  --task_name LMPnPAppleToPlateDC_G1
```
**Terminal 2 – GR00T server + eval rollout:**

```bash
cd Isaac-GR00T
# Start the policy server
MUJOCO_GL=egl PYOPENGL_PLATFORM=egl \
uv run --extra=gpu python gr00t/eval/run_gr00t_server.py \
  --model-path pedroset/GR00T-N1.6-G1-PnPAppleToPlate-CW \
  --embodiment-tag UNITREE_G1 \
  --use-sim-policy-wrapper \
  --device cuda:0 \
  --host 0.0.0.0 \
  --port 5555
```
Then, in a third terminal (once the server is ready):

```bash
cd Isaac-GR00T
MUJOCO_GL=egl PYOPENGL_PLATFORM=egl \
uv run python gr00t/eval/rollout_policy.py \
  --n_episodes 10 \
  --max_episode_steps 1440 \
  --env_name gr00tlocomanip_g1_sim/LMPnPAppleToPlateDC_G1_gear_wbc \
  --policy_client_host 127.0.0.1 \
  --policy_client_port 5555 \
  --n_action_steps 20 \
  --n_envs 1
```
**Note:** If running headless (no display), `MUJOCO_GL=egl PYOPENGL_PLATFORM=egl` is required in all terminals. The eval script is `rollout_policy.py`, NOT `run_policy_server.py` (which does not exist).
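Since the rollout client fails if it starts before the policy server is listening, a small stdlib-only helper (a hypothetical convenience script, not part of Isaac-GR00T) can poll the port from Terminal 2's host/port defaults before launching the rollout:

```python
import socket
import time

def wait_for_server(host="127.0.0.1", port=5555, timeout=120.0):
    """Poll until a TCP connection to the policy server succeeds, or time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            time.sleep(1.0)  # server not up yet; retry
    return False

if __name__ == "__main__":
    if wait_for_server():
        print("policy server ready; safe to launch rollout_policy.py")
    else:
        print("server not reachable on :5555; start Terminal 2 first")
```

This only checks TCP reachability, not that the model has finished loading, so the first rollout request may still block briefly.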
## Fine-Tune It Yourself
Full training command with all hyperparameters exactly as used:
```bash
cd Isaac-GR00T
CUDA_VISIBLE_DEVICES=0 uv run python gr00t/experiment/launch_finetune.py \
  --base_model_path nvidia/GR00T-N1.6-3B \
  --dataset_path <PATH_TO>/PhysicalAI-Robotics-GR00T-X-Embodiment-Sim/unitree_g1.LMPnPAppleToPlateDC \
  --embodiment_tag UNITREE_G1 \
  --num_gpus 1 \
  --output_dir output/g1_finetune \
  --max_steps 10000 \
  --save_steps 2000 \
  --save_total_limit 5 \
  --warmup_ratio 0.05 \
  --weight_decay 1e-5 \
  --learning_rate 1e-4 \
  --global_batch_size 32 \
  --dataloader_num_workers 4 \
  --color_jitter_params brightness 0.3 contrast 0.4 saturation 0.5 hue 0.08 \
  --use_wandb
```
Replace `<PATH_TO>` with your actual dataset path.

Do NOT add `--tune_llm` or `--tune_visual`. The defaults (false) are correct – only the diffusion head, projector, VLLN, and top 4 LLM layers are tuned.

If you get OOM, reduce `--global_batch_size` to 16. On a B200 (192 GB), batch size 128 also works.
Training takes ~2 hours on a single B200 or ~3-4 hours on an A100/H100 with batch 32.
## Training Details
| Parameter | Value |
|---|---|
| Base model | nvidia/GR00T-N1.6-3B |
| Dataset | unitree_g1.LMPnPAppleToPlateDC (102 episodes, sim) |
| Embodiment | UNITREE_G1 (29 DOF) |
| Steps | 10,000 |
| Batch size | 32 (global) |
| Learning rate | 1e-4 (cosine schedule) |
| Warmup | 5% of steps (500 steps) |
| Weight decay | 1e-5 |
| Optimizer | AdamW |
| Precision | bf16 |
| DeepSpeed | Stage 2 |
| Max grad norm | 1.0 |
| Action horizon | 16 steps |
| Inference timesteps | 4 (flow matching) |
| Action representation | Relative (arms), Absolute (hands, waist, navigation) |
| Hardware | 1× NVIDIA B200 (~2 hours) |
**What's tuned:** diffusion action head (`tune_diffusion_model: true`), projector (`tune_projector: true`), VLLN (`tune_vlln: true`), top 4 LLM layers (`tune_top_llm_layers: 4`).

**What's frozen:** vision backbone (`tune_visual: false`), full LLM body (`tune_llm: false`).
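The learning-rate schedule in the table above (1e-4 peak, cosine decay, 5% warmup) can be sketched as a pure function of the step. This is an illustrative sketch assuming the standard linear-warmup-then-cosine-to-zero shape, not the exact trainer code:

```python
import math

def lr_at(step, max_steps=10_000, base_lr=1e-4, warmup_ratio=0.05):
    """Linear warmup, then cosine decay to zero (sketch of the schedule)."""
    warmup_steps = int(max_steps * warmup_ratio)  # 500 steps with these values
    if step < warmup_steps:
        return base_lr * step / warmup_steps       # linear ramp up
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    return base_lr * 0.5 * (1 + math.cos(math.pi * progress))  # cosine decay

print(lr_at(250))     # mid-warmup: half of peak
print(lr_at(500))     # peak learning rate
print(lr_at(10_000))  # end of training: ~0
```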
### Loss Curve
| Step | Loss |
|---|---|
| 10 | 1.197 |
| 100 | 1.050 |
| 500 | 0.173 |
| 1000 | 0.070 |
| 2000 | 0.053 |
| 4000 | 0.048 |
| 6000 | 0.031 |
| 8000 | 0.020 |
| 10000 | 0.021 |

Min: 0.014 (step 9930)
57× reduction (1.197 → 0.021). Monotonic descent, no spikes, no early plateau. Full training logs are available in `trainer_state.json` inside this checkpoint.
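`trainer_state.json` follows the Hugging Face Trainer state format, which stores per-logging-step entries under a `log_history` list. A short sketch (assuming that format) for pulling out the loss curve:

```python
import json

def loss_curve(trainer_state_path):
    """Extract (step, train loss) pairs from a HF-Trainer-style state file."""
    with open(trainer_state_path) as f:
        state = json.load(f)
    # log_history mixes train and eval entries; keep only those with "loss"
    return [(entry["step"], entry["loss"])
            for entry in state.get("log_history", []) if "loss" in entry]
```

Usage would look like `loss_curve("checkpoint-10000/trainer_state.json")`, ready to hand to a plotting library.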
## Known Issues
### flash-attn on B200/Blackwell GPUs

Isaac-GR00T pins `flash-attn==2.7.4.post1`, which does not support Blackwell (SM120). If training on B200/B100, install a newer version (the version spec must be quoted so the shell does not treat `>` as a redirect):

```bash
pip install "flash-attn>=2.8.2" --no-build-isolation
```

This is not needed for Ampere (A100) or Hopper (H100) GPUs.
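You can check which case applies via `torch.cuda.get_device_capability()`; the decision itself is a pure function, sketched below as a hypothetical helper (assumption: Blackwell parts report compute capability 10.x or 12.x, while Ampere and Hopper report 8.x and 9.x):

```python
def needs_newer_flash_attn(capability):
    """True when the pinned flash-attn 2.7.4.post1 lacks kernels for this GPU.

    `capability` is the (major, minor) tuple returned by
    torch.cuda.get_device_capability().
    """
    major, _minor = capability
    return major >= 10  # Blackwell and newer

# Ampere (8, 0) and Hopper (9, 0) work with the pinned version;
# Blackwell-class GPUs need flash-attn >= 2.8.2.
```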
### Dataset must be downloaded separately
The training dataset is NOT bundled with this checkpoint. Download it from HuggingFace (see Prerequisites above). The 102 episodes are ~500 MB.
### `embodiment_tag` casing

Use `UNITREE_G1` (uppercase) for the `--embodiment-tag` flag. The dataset uses lowercase `unitree_g1` internally, but the CLI flag expects uppercase. Getting this wrong causes a `KeyError`.
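A tiny guard (a hypothetical helper, not part of the repo) can catch the casing mistake with a useful message before a run is launched:

```python
VALID_TAGS = {"UNITREE_G1"}  # extend with other embodiments as needed

def check_embodiment_tag(tag):
    """Return the tag if valid; otherwise raise with a casing hint."""
    if tag in VALID_TAGS:
        return tag
    if tag.upper() in VALID_TAGS:
        raise ValueError(
            f"Use uppercase {tag.upper()!r} for the CLI flag, not {tag!r}")
    raise ValueError(f"Unknown embodiment tag: {tag!r}")
```

For example, `check_embodiment_tag("unitree_g1")` fails immediately with the corrected spelling, instead of a bare `KeyError` deep inside the pipeline.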
### Eval hyperparameters matter

`--n_action_steps 20` and `--max_episode_steps 1440` are critical. Different values will give different success rates. These match NVIDIA's official eval config.
## Limitations
- Sim-only: Trained and evaluated entirely in MuJoCo simulation. Not tested on real hardware.
- Small dataset: 102 sim episodes β possible overfitting to the specific sim environment. Mask-guided augmentation (PR #521) may improve generalization.
- Single task: PnPAppleToPlate only. No multi-task evaluation.
- 10 eval episodes: Limited statistical significance. We recommend 20+ episodes for robust comparison.
- No video evidence hosted: We have success videos locally but have not uploaded them yet.
## Checkpoint Contents
```
checkpoint-10000/
├── model-00001-of-00002.safetensors   (4.8 GB)
├── model-00002-of-00002.safetensors   (4.5 GB)
├── model.safetensors.index.json
├── config.json
├── processor_config.json
├── statistics.json
├── embodiment_id.json
├── wandb_config.json
├── trainer_state.json
└── experiment_cfg/
    ├── conf.yaml                      # full training config
    ├── config.yaml
    ├── final_processor_config.json
    ├── final_model_config.json
    └── dataset_statistics.json
```
Total size: ~9.2 GB (inference-only, optimizer state excluded).
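For a quick integrity check after download, `model.safetensors.index.json` uses the standard safetensors sharded-index format, recording `total_size` (tensor bytes) under `metadata` and shard filenames under `weight_map`. A sketch comparing that against what is on disk:

```python
import json
import os

def shard_sizes(checkpoint_dir):
    """Return (recorded_total_size, on_disk_bytes) for the safetensors shards."""
    with open(os.path.join(checkpoint_dir, "model.safetensors.index.json")) as f:
        index = json.load(f)
    recorded = index["metadata"]["total_size"]   # tensor bytes per the index
    shards = set(index["weight_map"].values())   # unique shard filenames
    on_disk = sum(os.path.getsize(os.path.join(checkpoint_dir, s))
                  for s in shards)
    return recorded, on_disk
```

Expect `on_disk` to be slightly larger than `recorded`, since each shard file carries a small header on top of the raw tensor bytes; a large gap or a missing-file error indicates an incomplete download.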
## Acknowledgments

- NVIDIA Isaac-GR00T – base model, training pipeline, eval framework
- NVIDIA GEAR-SONIC – whole-body control for Decoupled WBC
- Fine-tuned by CloudWalk as part of the SONIC project
## Citation
If you use this checkpoint, please cite the original GR00T and SONIC papers:
```bibtex
@article{bjorck2025gr00t,
  title={GR00T N1: An Open Foundation Model for Generalist Humanoid Robots},
  author={Bjorck, Johan and others},
  journal={arXiv preprint arXiv:2503.14734},
  year={2025}
}

@article{luo2025sonic,
  title={SONIC: Supersizing Motion Tracking for Natural Humanoid Whole-Body Control},
  author={Luo, Zhengyi and Yuan, Ye and Wang, Tingwu and Li, Chenran and Chen, Sirui and others},
  journal={arXiv preprint arXiv:2511.07820},
  year={2025}
}
```