jpacifico committed on
Commit d16c030 · verified · 1 Parent(s): 279bb9e

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -21,11 +21,11 @@ language:
  **Aramis-2B-BitNet** *(2.41B params / Context Length: Maximum sequence length of 4096 tokens)*
  A compact, agent-oriented small language model focused on language understanding and contextual decision-making.
  Built with an iterative post-training recipe: bilingual DPO (FR+EN) + model merging of FR-centric and EN-centric variants.
- Runs natively as BitNet 1.58-bit (ternary) and is available in GGUF 1.58-bit, lossless to the BF16 checkpoints.
+ Runs natively as BitNet 1.58-bit (ternary) and is available in GGUF 1.58-bit, lossless to the BF16 checkpoint.
 
  **Why BitNet (and why this model)**
  - BitNet b1.58 uses ternary weights (−1,0,+1) with abs-mean scaling : very low memory & energy, great CPU/edge throughput, unlike classic FP/INT SLMs. For more details on the underlying architecture and efficiency of BitNet, please refer to the official Microsoft Research publication: [BitNet b1.58 2B4T Technical Report](https://arxiv.org/abs/2504.12285)
- - ModelStock7 demonstrates that a 2B BitNet can deliver SOTA language understanding in its class without sacrificing efficiency.
+ - Aramis demonstrates that a 2B BitNet can deliver SOTA language understanding in its class without sacrificing efficiency.
 
  **Model Variants**
 
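For readers unfamiliar with the "ternary weights (−1,0,+1) with abs-mean scaling" mentioned in the README text above, the idea can be sketched in a few lines of plain Python. This is only an illustration of the general BitNet b1.58 weight quantization rule (divide by the mean absolute weight, round, clip to [−1, 1]); it is not code from the Aramis repository or from any BitNet runtime. The function names, the per-matrix scale, and the `eps` value are assumptions made for the example.

```python
def absmean_ternary_quantize(weights, eps=1e-8):
    """Quantize a 2-D list of floats to {-1, 0, +1} with abs-mean scaling.

    Hypothetical helper for illustration; one scale per matrix is assumed.
    """
    flat = [w for row in weights for w in row]
    if not flat:
        raise ValueError("weights must be non-empty")

    # Scale factor: mean absolute value over the whole matrix.
    scale = sum(abs(w) for w in flat) / len(flat)

    # Divide by the scale, round to the nearest integer, clip to [-1, 1].
    # eps (an assumed small constant) avoids division by zero.
    quantized = [
        [max(-1, min(1, round(w / (scale + eps)))) for w in row]
        for row in weights
    ]
    return quantized, scale


def dequantize(quantized, scale):
    """Approximate the original weights from ternary values and the scale."""
    return [[q * scale for q in row] for row in quantized]


weights = [[0.9, -0.05, -1.2], [0.3, 0.0, 2.0]]
ternary, scale = absmean_ternary_quantize(weights)
print(ternary)  # [[1, 0, -1], [0, 0, 1]]
```

Each weight takes one of three values, which carries log2(3) ≈ 1.58 bits of information, hence the "1.58-bit" name. Only the ternary matrix and a single scale need to be stored, which is where the memory savings described in the README come from.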