Update submission.json with dpo

#39
by robbiemu - opened

train:

hf jobs uv run \
    --flavor a100-large \
    --timeout 3h \
    --secrets HF_TOKEN \
    dpo_training.py

eval:

hf jobs uv run \
    --flavor a10g-large \
    --timeout 2h \
    --with "lighteval[vllm]@git+https://github.com/huggingface/lighteval,emoji" \
    --secrets HF_TOKEN \
    lighteval vllm "model_name=robbiemu/smollm3-dpo-aligned" \
    "lighteval|gsm8k|0|0,leaderboard|truthfulqa:mc|0|0,leaderboard|hellaswag|0|0,leaderboard|arc:challenge|0|0" \
    --push-to-hub --results-org robbiemu
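
For readers unfamiliar with the task string passed to lighteval above, here is a minimal Python sketch that splits it into its parts. It assumes each comma-separated entry follows the `suite|task|few-shot count|truncation flag` layout; the `TaskSpec` type, its field names, and `parse_task_specs` are hypothetical helpers for illustration and are not part of lighteval.

```python
from typing import NamedTuple


class TaskSpec(NamedTuple):
    """One evaluation task entry (field names are illustrative assumptions)."""
    suite: str
    task: str
    num_fewshot: int
    truncate_fewshot: bool


def parse_task_specs(spec: str) -> list[TaskSpec]:
    """Split a comma-separated task string into its four pipe-separated fields."""
    parsed = []
    for entry in spec.split(","):
        suite, task, shots, truncate = entry.strip().split("|")
        parsed.append(TaskSpec(suite, task, int(shots), truncate == "1"))
    return parsed


# The same task string used in the eval command above.
TASKS = (
    "lighteval|gsm8k|0|0,leaderboard|truthfulqa:mc|0|0,"
    "leaderboard|hellaswag|0|0,leaderboard|arc:challenge|0|0"
)

specs = parse_task_specs(TASKS)
for item in specs:
    print(f"{item.suite:12s} {item.task:16s} few-shot={item.num_fewshot}")
```

Under that assumption, all four benchmarks in this submission run zero-shot with the truncation flag off.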

dpo_training.py is published with the model; it is nearly identical to the script from the exercise.

Ready to merge
This branch is ready to get merged automatically.
