openvla-libero-spatial-epoch-05-step-000685

A checkpoint of the OpenVLA model fine-tuned on the LIBERO-Spatial dataset.

Model Information

  • Checkpoint: epoch-05-step-000685
  • Base Model: OpenVLA (Prismatic + DinoSigLIP-224px)
  • Training Dataset: LIBERO-Spatial (no noops)
  • Framework: Transformers

Usage

from transformers import AutoModelForVision2Seq, AutoProcessor
import torch

# Load the model
model = AutoModelForVision2Seq.from_pretrained(
    "yihannwang/openvla-libero-spatial-epoch-05-step-000685",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16
).to("cuda")

# Load the processor
processor = AutoProcessor.from_pretrained(
    "yihannwang/openvla-libero-spatial-epoch-05-step-000685",
    trust_remote_code=True
)

# Predict an action
from PIL import Image

image = Image.open("observation.jpg").convert("RGB")
prompt = "In: What action should the robot take to pick up the object?\nOut:"
inputs = processor(prompt, image).to("cuda", dtype=torch.bfloat16)

action = model.predict_action(**inputs, unnorm_key="libero_spatial_no_noops", do_sample=False)
print(action)  # 7-DoF action vector
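The returned action is a 7-element vector in the un-normalized LIBERO-Spatial action space (typically end-effector pose deltas plus a gripper command). As a sketch of how the single-step call above could be used in closed loop, the snippet below wraps it in a rollout function. The env object and its reset/render/step interface are hypothetical placeholders for your simulator or robot stack, not part of this repository; model and processor are the objects loaded above.

import numpy as np

def run_episode(env, instruction, max_steps=300):
    # Closed-loop rollout sketch. `env` is a hypothetical interface with
    # reset()/render()/step(action); swap in your LIBERO or robot wrapper.
    env.reset()
    prompt = f"In: What action should the robot take to {instruction.lower()}?\nOut:"
    for _ in range(max_steps):
        # Current camera view as a PIL image.
        frame = Image.fromarray(env.render()).convert("RGB")
        inputs = processor(prompt, frame).to("cuda", dtype=torch.bfloat16)
        # Greedy decoding; unnorm_key must match the fine-tuning dataset.
        action = model.predict_action(
            **inputs, unnorm_key="libero_spatial_no_noops", do_sample=False
        )
        _, _, done, info = env.step(np.asarray(action))
        if done:
            break
    return info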

Evaluation

To evaluate on the LIBERO-Spatial task suite, use the evaluation script from the OpenVLA repository:

python experiments/robot/libero/run_libero_eval.py \
    --model_family openvla \
    --pretrained_checkpoint yihannwang/openvla-libero-spatial-epoch-05-step-000685 \
    --task_suite_name libero_spatial_no_noops \
    --center_crop False \
    --num_trials_per_task 50
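If you prefer to point --pretrained_checkpoint at a local directory instead of the Hub ID, the checkpoint can first be fetched with huggingface_hub. This is an optional convenience step, not something the evaluation script requires:

from huggingface_hub import snapshot_download

# Download all checkpoint files from the Hub and print the local path,
# which can then be passed as --pretrained_checkpoint.
local_dir = snapshot_download(
    repo_id="yihannwang/openvla-libero-spatial-epoch-05-step-000685"
)
print(local_dir)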

Citation

@article{kim2024openvla,
  title={OpenVLA: An Open-Source Vision-Language-Action Model},
  author={Kim, Moo Jin and Pertsch, Karl and Karamcheti, Siddharth and Xiao, Ted and Balakrishna, Ashwin and Nair, Suraj and Rafailov, Rafael and Foster, Ethan and Lam, Grace and Sanketi, Pannag and Vuong, Quan and Kollar, Thomas and Burchfiel, Benjamin and Tedrake, Russ and Sadigh, Dorsa and Levine, Sergey and Liang, Percy},
  journal={arXiv preprint arXiv:2406.09246},
  year={2024}
}

License

MIT License
