tsamtsam/outputs
Transformers
Safetensors
Generated from Trainer
unsloth
trl
grpo
arxiv: 2402.03300
56a9506
outputs/training_args.bin
Commit History
56a9506 (verified): tsamtsam/gemma-grpo-math, tsamtsam committed on Mar 21
8e82671 (verified): End of training, tsamtsam committed on Mar 20
f50d25f (verified): End of training, tsamtsam committed on Mar 20
68dbf6d (verified): End of training, tsamtsam committed on Mar 20