Update submission.json with dpo #39, opened by robbiemu
training:
hf jobs uv run \
    --flavor a100-large \
    --timeout 3h \
    --secrets HF_TOKEN \
    dpo_training.py
eval:
hf jobs uv run \
    --flavor a10g-large \
    --timeout 2h \
    --with "lighteval[vllm]@git+https://github.com/huggingface/lighteval,emoji" \
    --secrets HF_TOKEN \
    lighteval vllm "model_name=robbiemu/smollm3-dpo-aligned" \
    "lighteval|gsm8k|0|0,leaderboard|truthfulqa:mc|0|0,leaderboard|hellaswag|0|0,leaderboard|arc:challenge|0|0" \
    --push-to-hub --results-org robbiemu
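To change the evaluation suite, you can build the task argument from a list instead of editing one long string. Below is a minimal bash sketch. The `tasks` array and `TASKS` variable are illustrative names and are not part of lighteval. Each entry keeps the same `suite|task|num_fewshot|truncate_fewshots` pattern used in the command above.

```shell
#!/usr/bin/env bash
# Illustrative helper (hypothetical names): join lighteval task specs with commas.
# Each entry mirrors the suite|task|num_fewshot|truncate_fewshots pattern above.
tasks=(
  "lighteval|gsm8k|0|0"
  "leaderboard|truthfulqa:mc|0|0"
  "leaderboard|hellaswag|0|0"
  "leaderboard|arc:challenge|0|0"
)
# In a subshell, set IFS to a comma so "${tasks[*]}" joins the entries with commas.
TASKS=$(IFS=,; echo "${tasks[*]}")
echo "$TASKS"
```

You can then pass `"$TASKS"` in place of the quoted task string in the eval command.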
dpo_training.py is published alongside the model. It is nearly identical to the script from the exercise.

