jpacifico committed · verified
Commit 279bb9e · 1 Parent(s): 2fbeb74

Update README.md

Files changed (1): README.md (+8 -8)
README.md CHANGED
@@ -18,7 +18,7 @@ language:
 ---
 # Model Summary
 
- **bitnet-dpo-merged-modelstock7** *(2.41B params / Context Length: Maximum sequence length of 4096 tokens)*
+ **Aramis-2B-BitNet** *(2.41B params / Context Length: Maximum sequence length of 4096 tokens)*
 A compact, agent-oriented small language model focused on language understanding and contextual decision-making.
 Built with an iterative post-training recipe: bilingual DPO (FR+EN) + model merging of FR-centric and EN-centric variants.
 Runs natively as BitNet 1.58-bit (ternary) and is available in GGUF 1.58-bit, lossless to the BF16 checkpoints.
@@ -29,8 +29,8 @@ Runs natively as BitNet 1.58-bit (ternary) and is available in GGUF 1.58-bit, lo
 
 **Model Variants**
 
- - jpacifico/bitnet-dpo-merged-modelstock7 (this repo): Contains the retrainable weights in BF16 format
- - [jpacifico/bitnet-dpo-fr-i2s-2](https://huggingface.co/jpacifico/bitnet-dpo-fr-i2s-2) : Quantized 1.58-bit GGUF version, you can use with [bitnet.cpp](https://github.com/microsoft/BitNet)
+ - jpacifico/Aramis-2B-BitNet-bf16 (this repo): Contains the retrainable weights in BF16 format
+ - [jpacifico/Aramis-2B-BitNet-b1.58-i2s-GGUF](https://huggingface.co/jpacifico/Aramis-2B-BitNet-b1.58-i2s-GGUF) : Quantized 1.58-bit GGUF version, you can use with [bitnet.cpp](https://github.com/microsoft/BitNet)
 
 
 
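Aside from the diff itself: a minimal sketch of running the quantized GGUF variant with [bitnet.cpp](https://github.com/microsoft/BitNet), following that repository's documented setup_env.py / run_inference.py flow. The local directory layout and the ggml-model-i2_s.gguf filename are assumed conventions here, not part of this commit.

```bash
# Sketch (assumed paths): set up bitnet.cpp and chat with the 1.58-bit GGUF model.
git clone --recursive https://github.com/microsoft/BitNet.git
cd BitNet
pip install -r requirements.txt

# Fetch the quantized weights from the Hub (repo name taken from the diff above).
huggingface-cli download jpacifico/Aramis-2B-BitNet-b1.58-i2s-GGUF \
  --local-dir models/Aramis-2B-BitNet

# Prepare the i2_s kernels, then start an interactive session.
python setup_env.py -md models/Aramis-2B-BitNet -q i2_s
python run_inference.py -m models/Aramis-2B-BitNet/ggml-model-i2_s.gguf \
  -p "You are a helpful assistant" -cnv
```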
@@ -49,14 +49,14 @@ Iterative DPO + Model merging :
 
 # First benchmarks
 
- **Interpretation:** Significant gains on language understanding & pragmatic reasoning (ARC-C/E, Wino, BoolQ, HellaSwag, TriviaQA) with stability on other skills. Math/code are not the optimization target; GSM8K stays essentially stable relative to the BitNet 1.58-bit quantized baseline (58,38).
+ **Interpretation:** Significant gains on language understanding & pragmatic reasoning (ARC-C/E, Wino, BoolQ, HellaSwag, TriviaQA) with stability on other skills. Math/code are not the optimization target; GSM8K stays essentially stable relative to the bitnet-b1.58-2B-4T quantized baseline (58,38).
 All scores are reported in comparison with the original [microsoft/bitnet-b1.58-2B-4T-bf16](https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-bf16) model.
 
- | Benchmark (metric) | microsoft/bitnet-b1.58-2B-4T-bf16 | bitnet-dpo-merged-modelstock7 |
+ | Benchmark (metric) | microsoft/bitnet-b1.58-2B-4T-bf16 | jpacifico/Aramis-2B-BitNet-bf16 |
 |------------------------------------|-----------------------------------|--------------------------------|
 | arc_challenge 0 shot | 47.95 | **51.62** |
 | arc_easy 0 shot | 73.44 | **75.25** |
- | hellaswag 0 shot | 68.27 | **68.52** |
+ | hellaswag 0 shot | 68.27 | **68.52** |
 | openbookqa 0 shot | **41.6** | 41.4 |
 | boolq 0 shot | **79.39** | 79.33 |
 | piqa 0 shot | **77.86** | 77.53 |
@@ -79,7 +79,7 @@ All scores are reported in comparison with the original [microsoft/bitnet-b1.58-
 | openbmb/MiniCPM-2B-dpo-bf16 | 44,28 |
 | microsoft/bitnet-b1.58-2B-4T-bf16 (base model) | 47,95 |
 | microsoft/bitnet-b1.58-2B-4T | 49,91 |
- | jpacifico/bitnet-dpo-merged-modelstock7 | **51,62** |
+ | jpacifico/Aramis-2B-BitNet-bf16 | **51,62** |
 
 
 ### Reproducibility
@@ -89,7 +89,7 @@ The following example reproduces the **ARC-Challenge (0-shot)** evaluation for t
 
 ```bash
 HF_ALLOW_CODE_EVAL=1 lm-eval --model hf \
- --model_args pretrained=jpacifico/modelstock7,dtype=bfloat16 \
+ --model_args pretrained=jpacifico/Aramis-2B-BitNet-bf16,dtype=bfloat16 \
 --tasks arc_challenge \
 --device cuda:0 --batch_size 8 \
 --seed 42 \
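# (The hunk above is cut off mid-command by the diff context limit.)
# Hedged reconstruction of the complete invocation: the closing
# --num_fewshot 0 flag is an assumption matching the 0-shot results
# reported in the benchmark table, not text taken from this commit.
HF_ALLOW_CODE_EVAL=1 lm-eval --model hf \
    --model_args pretrained=jpacifico/Aramis-2B-BitNet-bf16,dtype=bfloat16 \
    --tasks arc_challenge \
    --device cuda:0 --batch_size 8 \
    --seed 42 \
    --num_fewshot 0
```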
 