This is a GRPO-trained version of my Control Nanuq model, made to fuck around with GRPO training. This model is highly experimental: it's *supposed* to do reasoning in XML tags, but for some reason it doesn't. Possibly I need to train for more epochs.
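For context, here is a minimal sketch of the two GRPO pieces that matter for "reasoning in XML tags". The first is a format reward that checks whether a completion uses the tags. The second is the group-relative advantage that turns a group of rewards into a training signal. This is not the reward setup actually used for this model. The card only says "XML tags", so the `<reasoning>`/`<answer>` layout, the all-or-nothing reward, and the helper names `format_reward` and `group_relative_advantages` are all hypothetical.

```python
import re
from statistics import mean, pstdev

# ASSUMPTION: the card only says "XML tags"; this <reasoning>/<answer>
# layout is a hypothetical example, not the tags used in training.
FORMAT_RE = re.compile(
    r"^\s*<reasoning>\s*.+?\s*</reasoning>\s*<answer>\s*.+?\s*</answer>\s*$",
    re.DOTALL,
)


def format_reward(completion: str) -> float:
    """Hypothetical reward: 1.0 if the completion uses the assumed XML layout, else 0.0."""
    return 1.0 if FORMAT_RE.match(completion) else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """GRPO-style advantage: each reward minus the group mean, divided by the
    group's standard deviation (population std here; implementations differ).
    eps is an arbitrary small constant that avoids division by zero."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, a group of four sampled completions.
completions = [
    "<reasoning>2 + 2 is 4.</reasoning>\n<answer>4</answer>",
    "The answer is 4.",
    "<answer>4</answer>",
    "<reasoning>Adding gives 4.</reasoning><answer>4</answer>",
]
rewards = [format_reward(c) for c in completions]
advantages = group_relative_advantages(rewards)
print(rewards)
print([round(a, 3) for a in advantages])
```

Because advantages are computed relative to the group, completions that use the tags are pushed up and the rest are pushed down. If every completion in a group gets the same format reward (for example, none of them use the tags), every advantage is zero, and that reward contributes no learning signal for that group.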

Trained on 1x A100 80GB provided by Lucyknada, using Unsloth. If you're trying to replicate the model: one, don't. Two, swap out the default L3.1 8B model in the Unsloth Colab for Control Nanuq.

Safetensors · Model size: 8B params · Tensor type: BF16

Model tree for NewEden-Forge/Control-Nanuq-GRPO: finetuned from Control Nanuq; 1 quantized model available.