---
license: apache-2.0
library_name: transformers
tags:
- language
- unsloth
- granite-4.0
base_model:
- ibm-granite/granite-4.0-h-1b
---
Unsloth Dynamic 2.0 achieves superior accuracy & outperforms other leading quants.

| Benchmarks | Metric | 350M Dense | H 350M Dense | 1B Dense | H 1B Dense |
|---|---|---|---|---|---|
| General Tasks | |||||
| MMLU | 5-shot | 35.01 | 36.21 | 59.39 | 59.74 | 
| MMLU-Pro | 5-shot, CoT | 12.13 | 14.38 | 34.02 | 32.86 | 
| BBH | 3-shot, CoT | 33.07 | 33.28 | 60.37 | 59.68 | 
| AGI EVAL | 0-shot, CoT | 26.22 | 29.61 | 49.22 | 52.44 | 
| GPQA | 0-shot, CoT | 24.11 | 26.12 | 29.91 | 29.69 | 
| Alignment Tasks | |||||
| IFEval | Instruct, Strict | 61.63 | 67.63 | 80.82 | 82.37 | 
| IFEval | Prompt, Strict | 49.17 | 55.64 | 73.94 | 74.68 | 
| IFEval | Average | 55.4 | 61.63 | 77.38 | 78.53 | 
| Math Tasks | |||||
| GSM8K | 8-shot | 30.71 | 39.27 | 76.35 | 69.83 | 
| GSM Symbolic | 8-shot | 26.76 | 33.7 | 72.3 | 65.72 | 
| Minerva Math | 0-shot, CoT | 13.04 | 5.76 | 45.28 | 49.4 | 
| DeepMind Math | 0-shot, CoT | 8.45 | 6.2 | 34 | 34.98 | 
| Code Tasks | |||||
| HumanEval | pass@1 | 39 | 38 | 74 | 73 | 
| HumanEval+ | pass@1 | 37 | 35 | 69 | 68 | 
| MBPP | pass@1 | 48 | 49 | 65 | 69 | 
| MBPP+ | pass@1 | 38 | 44 | 57 | 60 | 
| CRUXEval-O | pass@1 | 23.75 | 25.5 | 33.13 | 36 | 
| BigCodeBench | pass@1 | 11.14 | 11.23 | 30.18 | 29.12 | 
| Tool Calling Tasks | |||||
| BFCL v3 | | 39.32 | 43.32 | 54.82 | 50.21 |
| Multilingual Tasks | |||||
| MULTIPLE | pass@1 | 15.99 | 14.31 | 32.24 | 36.11 | 
| MMMLU | 5-shot | 28.23 | 27.95 | 45 | 49.43 | 
| INCLUDE | 5-shot | 27.74 | 27.09 | 42.12 | 43.35 | 
| MGSM | 8-shot | 14.72 | 16.16 | 37.84 | 27.52 | 
| Safety | |||||
| SALAD-Bench | | 97.12 | 96.55 | 93.44 | 96.4 |
| AttaQ | | 82.53 | 81.76 | 85.26 | 82.85 |
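
The IFEval "Average" row appears to be the arithmetic mean of the "Instruct, Strict" and "Prompt, Strict" rows. The sketch below checks that reading against the table values; the helper name `mean_matches` and the 0.005 rounding tolerance are illustrative assumptions, not part of any evaluation harness.

```python
from decimal import Decimal

# Values copied from the three IFEval rows of the table above:
# (Instruct Strict, Prompt Strict, reported Average)
ifeval = {
    "350M Dense":   ("61.63", "49.17", "55.4"),
    "H 350M Dense": ("67.63", "55.64", "61.63"),
    "1B Dense":     ("80.82", "73.94", "77.38"),
    "H 1B Dense":   ("82.37", "74.68", "78.53"),
}

def mean_matches(instruct, prompt, reported, tol=Decimal("0.005")):
    """Return True if the mean of the two strict scores equals the
    reported average to within a two-decimal rounding tolerance."""
    mean = (Decimal(instruct) + Decimal(prompt)) / 2
    return abs(mean - Decimal(reported)) <= tol

# Map each model to whether its reported average is consistent.
results = {model: mean_matches(*scores) for model, scores in ifeval.items()}
print(results)
```

`Decimal` is used instead of floats so that the half-way cases (for example 61.635 against a reported 61.63) compare exactly at the tolerance boundary.
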

Multilingual benchmarks and the languages they include:

| Benchmarks | # Langs | Languages |
|---|---|---|
| MMMLU | 11 | ar, de, en, es, fr, ja, ko, pt, zh, bn, hi | 
| INCLUDE | 14 | hi, bn, ta, te, ar, de, es, fr, it, ja, ko, nl, pt, zh | 
| MGSM | 5 | en, es, fr, ja, zh | 

Model architecture:

| Model | 350M Dense | H 350M Dense | 1B Dense | H 1B Dense |
|---|---|---|---|---|
| Embedding size | 1024 | 768 | 2048 | 1536 | 
| Number of layers | 28 attention | 4 attention / 28 Mamba2 | 40 attention | 4 attention / 36 Mamba2 | 
| Attention head size | 64 | 64 | 128 | 128 | 
| Number of attention heads | 16 | 12 | 16 | 12 | 
| Number of KV heads | 4 | 4 | 4 | 4 | 
| Mamba2 state size | - | 128 | - | 128 | 
| Number of Mamba2 heads | - | 48 | - | 48 | 
| MLP / Shared expert hidden size | 2048 | 2048 | 4096 | 4096 | 
| Num. Experts | - | - | - | - | 
| Num. active Experts | - | - | - | - | 
| Expert hidden size | - | - | - | - | 
| MLP activation | SwiGLU | SwiGLU | SwiGLU | SwiGLU | 
| Sequence length | 32K | 32K | 128K | 128K | 
| Position embedding | RoPE | NoPE | RoPE | NoPE | 
| # Parameters | 350M | 340M | 1.6B | 1.5B | 
| # Active parameters | 350M | 340M | 1.6B | 1.5B |
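
The attention rows of the architecture table can be related with a little arithmetic. The sketch below checks that attention heads times head size equals the embedding size, computes the grouped-query ratio (attention heads per KV head), and estimates the per-token KV cache. It rests on labeled assumptions: only attention layers hold a per-token KV cache, Mamba2 layers are excluded because their state does not grow with sequence length, and cache entries are 16-bit. The dictionary keys and the function name are hypothetical, chosen for illustration only.

```python
# Attention-related values copied from the architecture table above.
MODELS = {
    "350M Dense":   dict(attn_layers=28, heads=16, kv_heads=4, head_size=64,  embed=1024),
    "H 350M Dense": dict(attn_layers=4,  heads=12, kv_heads=4, head_size=64,  embed=768),
    "1B Dense":     dict(attn_layers=40, heads=16, kv_heads=4, head_size=128, embed=2048),
    "H 1B Dense":   dict(attn_layers=4,  heads=12, kv_heads=4, head_size=128, embed=1536),
}

BYTES_PER_VALUE = 2  # assumption: 16-bit (e.g. bf16) cache entries

def kv_cache_bytes_per_token(cfg, bytes_per_value=BYTES_PER_VALUE):
    """Estimate KV cache bytes per token: one key and one value vector
    per KV head in every attention layer (Mamba2 layers not counted)."""
    return 2 * cfg["attn_layers"] * cfg["kv_heads"] * cfg["head_size"] * bytes_per_value

for name, cfg in MODELS.items():
    q_width = cfg["heads"] * cfg["head_size"]   # total query projection width
    group = cfg["heads"] // cfg["kv_heads"]     # query heads sharing one KV head
    print(f"{name}: q_width == embed: {q_width == cfg['embed']}, "
          f"GQA group: {group}, KV bytes/token: {kv_cache_bytes_per_token(cfg)}")
```

Under these assumptions, and taking 128K to mean 131,072 tokens, the estimate for a full-length context is roughly 10 GiB for 1B Dense versus roughly 1 GiB for H 1B Dense, before counting the fixed-size Mamba2 state. Actual memory use depends on the runtime and cache precision.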