Update README.md
README.md (CHANGED)
@@ -16,6 +16,8 @@ This is a merge of pre-trained language models created using [mergekit](https://

# First benchmarks

+**Interpretation:** Significant gains on language understanding & pragmatic reasoning (ARC-C/E, Wino, BoolQ, HellaSwag, TriviaQA) with stability on other skills. Math/code are not the optimization target; GSM8K stays essentially stable relative to the BitNet 1.58-bit baseline.
+
| Benchmark (metric)   | microsoft/bitnet-b1.58-2B-4T-bf16 | bitnet-dpo-merged-modelstock7 |
|----------------------|-----------------------------------|-------------------------------|
| arc_challenge 0 shot | 47.95                             | **51.62**                     |
@@ -41,6 +43,7 @@ This is a merge of pre-trained language models created using [mergekit](https://
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B      | 34.9      |
| openbmb/MiniCPM-2B-dpo-bf16                    | 44.28     |
| microsoft/bitnet-b1.58-2B-4T-bf16 (base model) | 47.95     |
+| microsoft/bitnet-b1.58-2B-4T                  | 49.91     |
| jpacifico/bitnet-dpo-merged-modelstock7        | **51.62** |
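The card does not state which evaluation tool produced these scores. As a minimal reproduction sketch, assuming the zero-shot `arc_challenge` task was run with EleutherAI's lm-evaluation-harness (an assumption, not confirmed by the commit), the ARC-Challenge number could be checked roughly like this:

```python
# Hedged reproduction sketch: assumes EleutherAI's lm-evaluation-harness
# (lm_eval >= 0.4). The actual evaluation setup behind the table above is
# not stated on the card, so treat this as illustrative only.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=jpacifico/bitnet-dpo-merged-modelstock7,dtype=bfloat16",
    tasks=["arc_challenge"],  # benchmark named in the table above
    num_fewshot=0,            # "0 shot", as in the table
)
print(results["results"]["arc_challenge"])
```

The harness reports both `acc` and `acc_norm` for `arc_challenge`; the card does not say which of the two corresponds to the 47.95 / 51.62 figures.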