---
license: apache-2.0
datasets:
- ai2-adapt-dev/rlvr_gsm8k_zs
metrics:
- accuracy
base_model:
- swiss-ai/Apertus-8B-Instruct-2509
pipeline_tag: reinforcement-learning
tags:
- rlvr
- grpo
- gsm8k
- apertus
results:
- task:
    type: text-generation
  dataset:
    name: gsm8k
    type: mathematical
    split: test
  metrics:
  - name: GSM8K (Validation, 0-shot, T=0) Accuracy
    type: GSM8K (0-shot, T=0)
    value: 66.23
---

# RLVR Training of Apertus 8B with GRPO on the GSM8K Dataset

## Results

GSM8K validation accuracy (0-shot, temperature 0) improved from 46.41% before training to 66.23% after training.
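
For reference, the absolute and relative gains implied by these two accuracy figures are:

$$
\Delta_{\text{abs}} = 66.23 - 46.41 = 19.82 \text{ percentage points}, \qquad
\Delta_{\text{rel}} = \frac{19.82}{46.41} \approx 0.427 \;(\approx 42.7\%)
$$
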
Training was performed on a single GPU node with 4× NVIDIA H100 (95 GB) GPUs and took approximately 5 hours.

## Training configuration

| Parameter | Value |
|---|---|
| **Rollouts** | |
| num_unique_prompts_rollout | 32 |
| num_samples_per_prompt_rollout | 8 |
| temperature | 0.8 |
| **Optimization** | |
| learning_rate | 3.0e-7 |
| beta | 0.01 |

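
The rollout settings above combine as follows in GRPO: each step samples 32 prompts × 8 completions = 256 completions. Every completion receives a verifiable reward, and each reward is normalized against the other 7 completions for the same prompt. The sketch below is a minimal illustration under stated assumptions, not the training code used for this model. It assumes a binary exact-match reward on the last number in the completion, the group mean/std advantage normalization from the GRPO paper, and that paper's per-token KL estimator weighted by `beta`. The helper names (`extract_final_number`, `verifiable_reward`, `group_advantages`, `k3_kl`) are hypothetical. It also omits the clipped policy-ratio surrogate and all batching, so the actual reward parsing and loss may differ.

```python
import math
import re
import statistics
from typing import List, Optional

# Rollout and KL settings copied from the configuration table.
NUM_UNIQUE_PROMPTS = 32  # num_unique_prompts_rollout
SAMPLES_PER_PROMPT = 8   # num_samples_per_prompt_rollout, i.e. the GRPO group size
BETA = 0.01              # beta: weight of the KL penalty toward the reference model

# Each rollout step therefore scores 32 * 8 = 256 sampled completions.
COMPLETIONS_PER_ROLLOUT = NUM_UNIQUE_PROMPTS * SAMPLES_PER_PROMPT


def extract_final_number(text: str) -> Optional[str]:
    """Hypothetical parser: return the last number in the text, with commas removed."""
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    return matches[-1].replace(",", "") if matches else None


def verifiable_reward(completion: str, gold: str) -> float:
    """Assumed binary reward: 1.0 if the final number equals the reference answer, else 0.0."""
    pred = extract_final_number(completion)
    if pred is None:
        return 0.0
    return 1.0 if float(pred) == float(gold) else 0.0


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: subtract the group mean and divide by the group std.

    If every completion in the group gets the same reward, all advantages are 0,
    so that prompt contributes no policy-gradient signal.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def k3_kl(logp_policy: float, logp_ref: float) -> float:
    """Per-token KL estimate pi_ref/pi - log(pi_ref/pi) - 1, which is always >= 0."""
    log_ratio = logp_ref - logp_policy
    return math.exp(log_ratio) - log_ratio - 1.0


if __name__ == "__main__":
    gold = "18"  # toy reference answer for a single prompt
    # One group of SAMPLES_PER_PROMPT completions: 3 correct, 5 incorrect.
    completions = ["The answer is 18."] * 3 + ["The answer is 16."] * 5
    rewards = [verifiable_reward(c, gold) for c in completions]
    advantages = group_advantages(rewards)
    print("completions per rollout:", COMPLETIONS_PER_ROLLOUT)
    print("rewards:   ", rewards)
    print("advantages:", [round(a, 3) for a in advantages])
    # The KL estimate enters the per-token loss scaled by beta.
    print("beta * KL for one token:", BETA * k3_kl(logp_policy=-1.2, logp_ref=-1.0))
```

With 3 of 8 completions correct, the correct answers receive advantages of roughly +1.29 and the incorrect ones roughly -0.77. A small `beta` such as 0.01 keeps the KL term a gentle regularizer relative to these advantage-weighted terms.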
This work builds upon and was inspired by the following contributions: