Update README.md
---
# Model Summary

**Aramis-2B-BitNet** *(2.41B params / maximum sequence length: 4096 tokens)*
A compact, agent-oriented small language model focused on language understanding and contextual decision-making.
Built with an iterative post-training recipe: bilingual DPO (FR+EN) + model merging of FR-centric and EN-centric variants.
Runs natively as BitNet 1.58-bit (ternary) and is available in GGUF 1.58-bit, lossless with respect to the BF16 checkpoints.
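To give a concrete (if simplified) picture of the 1.58-bit ternary format mentioned above: BitNet b1.58 maps each weight to {-1, 0, +1} with a per-tensor "absmean" scale. The sketch below is illustrative only (the function name and the per-list granularity are assumptions); the real model applies this per weight matrix with fused low-bit kernels:

```python
def absmean_ternary(weights, eps=1e-8):
    """Quantize a flat list of weights to {-1, 0, +1} with an absmean scale.

    Simplified sketch of BitNet b1.58-style ternary quantization:
    scale by the mean absolute value, round, then clip to [-1, 1].
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale

q, scale = absmean_ternary([0.9, -1.1, 0.02, 0.4])
print(q)  # [1, -1, 0, 1]
```

Each weight then costs log2(3) ≈ 1.58 bits, which is where the "1.58-bit" name comes from.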
**Model Variants**

- jpacifico/Aramis-2B-BitNet-bf16 (this repo): the retrainable weights in BF16 format
- [jpacifico/Aramis-2B-BitNet-b1.58-i2s-GGUF](https://huggingface.co/jpacifico/Aramis-2B-BitNet-b1.58-i2s-GGUF): quantized 1.58-bit GGUF version, usable with [bitnet.cpp](https://github.com/microsoft/BitNet)
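The FR-centric/EN-centric merging step in the recipe above can be illustrated as a weighted average of two checkpoints' parameters. A toy sketch with hypothetical checkpoints; the actual merge method and weights used for Aramis are not specified here, and real merges operate on torch tensors (possibly with per-layer weights or schemes such as SLERP/TIES):

```python
def linear_merge(state_a, state_b, alpha=0.5):
    """Weighted average of two checkpoints' parameters, key by key."""
    assert state_a.keys() == state_b.keys(), "checkpoints must share an architecture"
    return {
        name: [(1 - alpha) * a + alpha * b for a, b in zip(state_a[name], state_b[name])]
        for name in state_a
    }

# Hypothetical FR-centric and EN-centric fine-tunes, one tiny "layer" each:
fr_variant = {"layer.weight": [1.0, 2.0]}
en_variant = {"layer.weight": [3.0, 4.0]}
merged = linear_merge(fr_variant, en_variant, alpha=0.5)
print(merged)  # {'layer.weight': [2.0, 3.0]}
```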
# First benchmarks

**Interpretation:** Significant gains on language understanding & pragmatic reasoning (ARC-C/E, Wino, BoolQ, HellaSwag, TriviaQA) with stability on other skills. Math/code are not the optimization target; GSM8K stays essentially stable relative to the bitnet-b1.58-2B-4T quantized baseline (58.38).
All scores are reported in comparison with the original [microsoft/bitnet-b1.58-2B-4T-bf16](https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-bf16) model.

| Benchmark (metric) | microsoft/bitnet-b1.58-2B-4T-bf16 | jpacifico/Aramis-2B-BitNet-bf16 |
|------------------------------------|-----------------------------------|--------------------------------|
| arc_challenge 0 shot | 47.95 | **51.62** |
| arc_easy 0 shot | 73.44 | **75.25** |
| hellaswag 0 shot | 68.27 | **68.52** |
| openbookqa 0 shot | **41.6** | 41.4 |
| boolq 0 shot | **79.39** | 79.33 |
| piqa 0 shot | **77.86** | 77.53 |
| openbmb/MiniCPM-2B-dpo-bf16 | 44.28 |
| microsoft/bitnet-b1.58-2B-4T-bf16 (base model) | 47.95 |
| microsoft/bitnet-b1.58-2B-4T | 49.91 |
| jpacifico/Aramis-2B-BitNet-bf16 | **51.62** |

### Reproducibility
The following example reproduces the **ARC-Challenge (0-shot)** evaluation (the `lm-eval` CLI comes from EleutherAI's lm-evaluation-harness):

```bash
HF_ALLOW_CODE_EVAL=1 lm-eval --model hf \
  --model_args pretrained=jpacifico/Aramis-2B-BitNet-bf16,dtype=bfloat16 \
  --tasks arc_challenge \
  --device cuda:0 --batch_size 8 \
  --seed 42
```
|