jpacifico / Aramis-2B-BitNet-bf16

Text Generation


jpacifico committed on Aug 25

Commit d3d1e24 · verified · 1 Parent(s): 1d9a6c4

Update README.md

Files changed (1)

README.md +2 -2

README.md CHANGED

@@ -20,7 +20,7 @@ language:
 **Aramis-2B-BitNet** *(2.41B params / Context Length: Maximum sequence length of 4096 tokens)*
 A compact, agent-oriented small language model focused on contextual reasoning, language understanding and multi-turn instruction following.
-Built with an iterative post-training recipe: bilingual DPO (FR+EN) + model merging of FR-centric and EN-centric variants.
+Built with an iterative post-training recipe: bilingual DPO (FR+EN) + model merging of FR-centric and EN-centric variants.
 Runs natively as BitNet 1.58-bit (ternary) and is available in GGUF 1.58-bit, lossless to the BF16 checkpoint.
 **Why BitNet (and why this model)**
@@ -69,7 +69,7 @@ All scores are reported in comparison with the original [microsoft/bitnet-b1.58-
 | mmlu 5 shot acc                    | 52.96                             | **53.39**                      |
 | commonsense_qa 10 shot acc         | **71.17**                         | 70.76                          |
-**ARC-Challenge:** 51.62 (First-ever ≥50 score for a model in the 2B category, i.e., >1.5B and <2.5B params)
+**ARC-Challenge (zero-shot):** 51.62 — first-ever ≥50 reported for a 2B-class model (>1.5B, <2.5B) *based on publicly available results*.
 | Model                                              | arc_challenge (0 shot) |
 |----------------------------------------------------|------------------------|