Update README.md
---
# Model Summary

**Aramis-2B-BitNet** *(2.41B params / maximum sequence length: 4096 tokens)*
A compact, agent-oriented small language model focused on language understanding and contextual decision-making.
Built with an iterative post-training recipe: bilingual DPO (FR+EN) + model merging of FR-centric and EN-centric variants.
Runs natively as BitNet 1.58-bit (ternary) and is available in GGUF 1.58-bit, lossless with respect to the BF16 checkpoints.
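To give a concrete (if simplified) picture of the 1.58-bit ternary format mentioned above: BitNet b1.58 maps each weight to {-1, 0, +1} with a per-tensor "absmean" scale. The sketch below is illustrative only (the function name and the per-list granularity are assumptions); the real model applies this per weight matrix with fused low-bit kernels:

```python
def absmean_ternary(weights, eps=1e-8):
    """Quantize a flat list of weights to {-1, 0, +1} with an absmean scale.

    Simplified sketch of BitNet b1.58-style ternary quantization:
    scale by the mean absolute value, round, then clip to [-1, 1].
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale

q, scale = absmean_ternary([0.9, -1.1, 0.02, 0.4])
print(q)  # [1, -1, 0, 1]
```

Each weight then costs log2(3) ≈ 1.58 bits, which is where the "1.58-bit" name comes from.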
**Model Variants**

- jpacifico/Aramis-2B-BitNet-bf16 (this repo): the retrainable weights in BF16 format
- [jpacifico/Aramis-2B-BitNet-b1.58-i2s-GGUF](https://huggingface.co/jpacifico/Aramis-2B-BitNet-b1.58-i2s-GGUF): quantized 1.58-bit GGUF version, usable with [bitnet.cpp](https://github.com/microsoft/BitNet)
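The FR-centric/EN-centric merging step in the recipe above can be illustrated as a weighted average of two checkpoints' parameters. A toy sketch with hypothetical checkpoints; the actual merge method and weights used for Aramis are not specified here, and real merges operate on torch tensors (possibly with per-layer weights or schemes such as SLERP/TIES):

```python
def linear_merge(state_a, state_b, alpha=0.5):
    """Weighted average of two checkpoints' parameters, key by key."""
    assert state_a.keys() == state_b.keys(), "checkpoints must share an architecture"
    return {
        name: [(1 - alpha) * a + alpha * b for a, b in zip(state_a[name], state_b[name])]
        for name in state_a
    }

# Hypothetical FR-centric and EN-centric fine-tunes, one tiny "layer" each:
fr_variant = {"layer.weight": [1.0, 2.0]}
en_variant = {"layer.weight": [3.0, 4.0]}
merged = linear_merge(fr_variant, en_variant, alpha=0.5)
print(merged)  # {'layer.weight': [2.0, 3.0]}
```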
# First benchmarks

**Interpretation:** Significant gains on language understanding & pragmatic reasoning (ARC-C/E, Wino, BoolQ, HellaSwag, TriviaQA) with stability on other skills. Math/code are not the optimization target; GSM8K stays essentially stable relative to the bitnet-b1.58-2B-4T quantized baseline (58.38).
All scores are reported in comparison with the original [microsoft/bitnet-b1.58-2B-4T-bf16](https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-bf16) model.

| Benchmark (metric) | microsoft/bitnet-b1.58-2B-4T-bf16 | jpacifico/Aramis-2B-BitNet-bf16 |
|------------------------------------|-----------------------------------|--------------------------------|
| arc_challenge 0 shot | 47.95 | **51.62** |
| arc_easy 0 shot | 73.44 | **75.25** |
| hellaswag 0 shot | 68.27 | **68.52** |
| openbookqa 0 shot | **41.6** | 41.4 |
| boolq 0 shot | **79.39** | 79.33 |
| piqa 0 shot | **77.86** | 77.53 |
| openbmb/MiniCPM-2B-dpo-bf16 | 44.28 |
| microsoft/bitnet-b1.58-2B-4T-bf16 (base model) | 47.95 |
| microsoft/bitnet-b1.58-2B-4T | 49.91 |
| jpacifico/Aramis-2B-BitNet-bf16 | **51.62** |

### Reproducibility
The following example reproduces the **ARC-Challenge (0-shot)** evaluation (the `lm-eval` CLI comes from EleutherAI's lm-evaluation-harness):

```bash
HF_ALLOW_CODE_EVAL=1 lm-eval --model hf \
  --model_args pretrained=jpacifico/Aramis-2B-BitNet-bf16,dtype=bfloat16 \
  --tasks arc_challenge \
  --device cuda:0 --batch_size 8 \
  --seed 42
```
|