# Gemma 4 31B Claude Opus Reasoning
Full-parameter fine-tune of google/gemma-4-31B-it on 12,680 Claude Opus 4.6 reasoning traces. The first full-parameter fine-tune of Gemma 4 31B.
## Highlights
- 89.7% token accuracy after 4 epochs
- Full parameter SFT on 8x NVIDIA H200 — all 31B parameters updated, not LoRA
- 12,680 pure Claude Opus 4.6 traces — consistent reasoning style, no mixed-model data
- Native Gemma 4 thinking format — uses built-in thinking tokens
- Runs on a 4090 at Q4_K_M (~17GB VRAM)
## Training

| Setting | Value |
|---|---|
| Base | google/gemma-4-31B-it |
| Method | Full-parameter SFT (not LoRA) |
| Framework | TRL SFTTrainer + PyTorch FSDP |
| Hardware | 8x NVIDIA H200 (141GB each) |
| Precision | bf16 |
| Total epochs | 4 (2 at lr=1e-5, then 2 more at lr=5e-6) |
| Sequence length | 8,192 |
| Batch size (effective) | 10 |
## Training Schedule

Training used a two-phase learning-rate schedule:
| Phase | Epochs | Learning rate | Result |
|---|---|---|---|
| Initial | 2 | 1e-5 (cosine) | 80.8% accuracy |
| Continued | 2 | 5e-6 (cosine) | 89.7% accuracy |
Continuing from the warm phase-1 checkpoint at a lower learning rate improved token accuracy by roughly 9 percentage points.
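The schedule above amounts to two independent cosine decays, the second restarting from a lower peak on the warm checkpoint. A minimal sketch (step counts here are illustrative, not the actual training config):

```python
import math

def cosine_lr(step, total_steps, peak_lr, min_lr=0.0):
    """Cosine decay from peak_lr to min_lr over total_steps."""
    progress = min(step / total_steps, 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Phase 1: 2 epochs decaying from 1e-5; phase 2: 2 more epochs restarting at 5e-6.
# steps_per_epoch is a placeholder; the real value depends on batch size.
steps_per_epoch = 1268
phase1 = [cosine_lr(s, 2 * steps_per_epoch, 1e-5) for s in range(2 * steps_per_epoch)]
phase2 = [cosine_lr(s, 2 * steps_per_epoch, 5e-6) for s in range(2 * steps_per_epoch)]

print(f"phase 1: {phase1[0]:.1e} -> {phase1[-1]:.1e}")
print(f"phase 2: {phase2[0]:.1e} -> {phase2[-1]:.1e}")
```

The key point is that phase 2 does not resume the phase-1 decay; it starts a fresh cosine from its own peak of 5e-6.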
## Training Metrics
| Metric | After phase 1 | After phase 2 (final) |
|---|---|---|
| Loss | 27.5 | 13.6 |
| Token accuracy | 80.8% | 89.7% |
| Grad norm | 15.3 | 15.3 |
| Entropy | 0.69 | 0.34 |
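Token accuracy in the table is the standard teacher-forced metric: the fraction of loss-bearing positions where the model's argmax next-token prediction matches the label. A toy sketch of the computation (plain lists, not the real training loop):

```python
def token_accuracy(logits, labels, ignore_index=-100):
    """Fraction of non-ignored positions where argmax(logits) == label.

    logits: per-position lists of vocabulary scores; labels: token ids.
    """
    correct = total = 0
    for scores, label in zip(logits, labels):
        if label == ignore_index:  # e.g. prompt tokens masked out of the loss
            continue
        pred = max(range(len(scores)), key=scores.__getitem__)
        correct += pred == label
        total += 1
    return correct / total if total else 0.0

# Toy batch: four positions, the first masked as prompt.
logits = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
labels = [-100, 0, 1, 1]
print(token_accuracy(logits, labels))  # 2 of 3 scored tokens correct
```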
## Training Data (12,680 samples)
All Claude Opus 4.6. No mixed-model data.
| Dataset | Samples | Description |
|---|---|---|
| Crownelius/Opus-4.6-Reasoning-3300x | 2,160 | Cleaned Claude Opus 4.6 reasoning — math, code, diverse |
| TeichAI/Claude-Opus-4.6-Reasoning-887x | 887 | Tool-use reasoning + vague prompt handling |
| Roman1111111/claude-opus-4.6-10000x | 9,633 | Math/logic reasoning with verified solutions |
## Usage
```python
from transformers import AutoProcessor, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "EganAI/gemma4-31b-opus-reasoning",
    torch_dtype="auto",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("EganAI/gemma4-31b-opus-reasoning")

messages = [
    {"role": "user", "content": "Prove that the square root of 2 is irrational."},
]
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
inputs = processor(text=text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs, max_new_tokens=2048, temperature=1.0, top_p=0.95, top_k=64
)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=False))
```
## Hardware Requirements
| Format | VRAM | Device |
|---|---|---|
| bf16 | ~62GB | 1x A100/H100 80GB |
| Q8 | ~31GB | 2x RTX 4090 |
| Q4_K_M | ~17GB | RTX 4090 |
| Q3_K_M | ~14GB | RTX 4080 |
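The VRAM figures above follow directly from 31B parameters times bits per weight (activation and KV-cache overhead ignored). A back-of-envelope check, where ~4.5 effective bits for Q4_K_M and ~3.7 for Q3_K_M are typical llama.cpp averages, not measured values:

```python
PARAMS = 31e9  # parameter count of the model

def weight_gb(bits_per_weight):
    """Approximate weight-only memory in GB (1 GB = 1e9 bytes)."""
    return PARAMS * bits_per_weight / 8 / 1e9

# Effective bits per weight are assumptions for the K-quants, exact for bf16/Q8.
for name, bits in [("bf16", 16), ("Q8", 8), ("Q4_K_M", 4.5), ("Q3_K_M", 3.7)]:
    print(f"{name}: ~{weight_gb(bits):.0f} GB")
```

The results line up with the table: ~62 GB for bf16, ~31 GB for Q8, and roughly 17 GB and 14 GB for the two K-quants.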
## Implementation Notes
- Gemma 4 requires mm_token_type_ids even for text-only training — custom data collator injects zeros
- SDPA attention only — flash attention is incompatible with Gemma's soft-capping
- FSDP over DeepSpeed — simpler config for day-zero model support
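The first note can be sketched as a thin collator: after padding, inject an all-zero `mm_token_type_ids` field shaped like `input_ids`. This is a simplified sketch with plain lists; the actual collator wraps the TRL/transformers data collator and emits tensors:

```python
def collate_with_mm_token_type_ids(features, pad_token_id=0):
    """Pad a batch and inject all-zero mm_token_type_ids for text-only data."""
    max_len = max(len(f["input_ids"]) for f in features)
    batch = {"input_ids": [], "attention_mask": [], "mm_token_type_ids": []}
    for f in features:
        ids = f["input_ids"]
        pad = max_len - len(ids)
        batch["input_ids"].append(ids + [pad_token_id] * pad)
        batch["attention_mask"].append([1] * len(ids) + [0] * pad)
        # Text-only training: every position gets multimodal token type 0.
        batch["mm_token_type_ids"].append([0] * max_len)
    return batch

batch = collate_with_mm_token_type_ids([{"input_ids": [5, 6, 7]}, {"input_ids": [8]}])
print(batch["mm_token_type_ids"])  # [[0, 0, 0], [0, 0, 0]]
```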
## Related Models
- EganAI/gemma-4-31B-Terminal-Agent — Stage 2 model: terminal/coding agent built on this checkpoint (coming soon)
- google/gemma-4-31B-it — base model
- Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled — similar approach on Qwen3.5
## License
Apache 2.0 (same as Gemma 4)