## Model Summary
Aramis-2B-BitNet (2.41B parameters; maximum context length of 4096 tokens)
A compact, agent-oriented small language model focused on contextual reasoning, language understanding and multi-turn instruction following.
Built with an iterative post-training recipe: bilingual DPO (FR+EN) + model merging of FR-centric and EN-centric variants.
Runs natively as BitNet 1.58-bit (ternary) and is available in 1.58-bit GGUF, converted losslessly from the BF16 checkpoint.
- Family: BitNet b1.58 (ternary weights {-1, 0, +1} with abs-mean scaling; see the sketch after this list)
- Post-training recipe: bilingual DPO (FR+EN) + ModelStock/TIES merges to combine FR-centric and EN-centric variants (agent-oriented behaviors; pragmatic reasoning)
- This repo: GGUF weights for efficient local inference with bitnet.cpp.
- Training & provenance: see the BF16 model card for full details of datasets, merges, and configuration.
## Upstream references
- Technical Report: BitNet b1.58 2B4T Technical Report (Microsoft Research, 2025). Contains the official description of the GGUF variant “used for bitnet.cpp” and the lossless-inference note.
- Official GGUF base model (Microsoft): microsoft/bitnet-b1.58-2B-4T-gguf
- bitnet.cpp (official inference framework): microsoft/BitNet on GitHub
## Benchmarks (from the BF16 version)
Interpretation: significant gains on language understanding and pragmatic reasoning (ARC-C/E, WinoGrande, BoolQ, HellaSwag, TriviaQA) with stability on other skills. Math/code are not the optimization target; GSM8K stays essentially stable relative to the bitnet-b1.58-2B-4T quantized baseline (58.38). All scores are reported in comparison with the original microsoft/bitnet-b1.58-2B-4T-bf16 model.
| Benchmark (metric) | microsoft/bitnet-b1.58-2B-4T-bf16 | jpacifico/Aramis-2B-BitNet-bf16 |
|---|---|---|
| arc_challenge 0 shot | 47.95 | 51.62 |
| arc_easy 0 shot | 73.44 | 75.25 |
| hellaswag 0 shot | 68.27 | 68.52 |
| openbookqa 0 shot | 41.6 | 41.4 |
| boolq 0 shot | 79.39 | 79.33 |
| piqa 0 shot | 77.86 | 77.53 |
| winogrande 0 shot | 70.64 | 72.06 |
| ifeval 0 shot | 41.85 | 44.12 |
| triviaqa 0 shot | 11.95 | 15.06 |
| triviaqa 5 shot EM | 33.51 | 33.51 |
| truthfulqa_mc2 10 shot | 45.89 | 46.52 |
| gsm8k 4 shot EM | 62.4 | 59.67 |
| mmlu 5 shot acc | 52.96 | 53.39 |
| commonsense_qa 10 shot acc | 71.17 | 70.76 |
ARC-Challenge (zero-shot): 51.62, the first reported score at or above 50 for a 2B-class model (>1.5B, <2.5B), based on publicly available results.
| Model | arc_challenge (0-shot) |
|---|---|
| Qwen/Qwen3-1.7B | 43 |
| ibm-granite/granite-3.3-2b-base | 44.54 |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | 34.9 |
| openbmb/MiniCPM-2B-dpo-bf16 | 44.28 |
| microsoft/bitnet-b1.58-2B-4T-bf16 (base model) | 47.95 |
| microsoft/bitnet-b1.58-2B-4T | 49.91 |
| jpacifico/Aramis-2B-BitNet-bf16 | 51.62 |
## Reproducibility
All benchmark results reported here were obtained using LM Eval Harness.
The following example reproduces the ARC-Challenge (0-shot) evaluation for this model:
```bash
HF_ALLOW_CODE_EVAL=1 lm-eval --model hf \
  --model_args pretrained=jpacifico/Aramis-2B-BitNet-bf16,dtype=bfloat16 \
  --tasks arc_challenge \
  --device cuda:0 --batch_size 8 \
  --seed 42 \
  --num_fewshot 0 \
  --confirm_run_unsafe_code \
  --trust_remote_code
```
- All results were computed with LM Eval Harness v0.4.9 (a version-pin command is shown after this list)
- Randomness (e.g. seeds, batch sizes) may cause slight variations in results
- The same procedure was used to evaluate all tasks presented in the benchmark tables
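To install the exact harness version used above (assuming a pip-based environment; `lm_eval` is the PyPI package name of LM Eval Harness):

```bash
pip install "lm_eval==0.4.9"
```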
## About “lossless” (what it means here)
Microsoft’s report states that the CPU reference implementation “ensur[es] numerical accuracy (lossless inference relative to the training procedure)” when running BitNet b1.58 models via bitnet.cpp.
- In practice, this means the 1.58-bit packed weights used at train time are executed as-is by the specialized kernels; the GGUF container is simply the delivery format consumed by `bitnet.cpp` for these kernels.
- Microsoft’s GGUF model card also explicitly presents the GGUF variant as the format “compatible with the `bitnet.cpp` library”.

Note: efficiency claims (memory/latency/energy) and the “lossless” inference property apply when using `bitnet.cpp`. Running the model through generic paths (e.g., vanilla Transformers) doesn’t unlock those kernel-level advantages. See Microsoft’s GGUF page and the `bitnet.cpp` README.
## Intended Use
- Great for: agent-oriented assistants, bilingual instruction following, pragmatic reasoning, and everyday knowledge tasks on CPUs or modest GPUs using `bitnet.cpp`.
- Not optimized for: formal math or code generation (see the BF16 card for details and alternatives).
## How to run (bitnet.cpp)
You can run this model using my demo Colab Notebook.
Please refer to the bitnet.cpp GitHub repository for detailed compilation steps, usage examples, and command-line options; a minimal sketch is shown below.
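As an illustrative sketch only, based on the microsoft/BitNet README at the time of writing (script names, flags, and the exact GGUF filename, assumed here to be `ggml-model-i2_s.gguf`, may differ from the current repository):

```bash
# Clone the official inference framework and install its Python dependencies
git clone --recursive https://github.com/microsoft/BitNet.git
cd BitNet
pip install -r requirements.txt

# Download this repo's GGUF weights to a local directory
huggingface-cli download jpacifico/Aramis-2B-BitNet-b1.58-i2s-GGUF \
  --local-dir models/Aramis-2B-BitNet

# Prepare the environment/kernels for the i2_s quantization type
python setup_env.py -md models/Aramis-2B-BitNet -q i2_s

# Start an interactive chat session with the model
python run_inference.py \
  -m models/Aramis-2B-BitNet/ggml-model-i2_s.gguf \
  -p "You are a helpful assistant" \
  -cnv
```

Per the bitnet.cpp README, the `-cnv` flag runs the model in conversation (chat) mode; omit it for raw completion.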
## Disclaimer
This model is intended for research and development purposes only and should not be used in commercial or real-world applications without further testing. While the Microsoft Research team has applied SFT and DPO to align the BitNet base model, it may still produce unexpected, biased, or inaccurate outputs. Please use responsibly.
- Developed by: Jonathan Pacifico, 2025
- Model type: LLM
- Language(s) (NLP): French, English
- License: MIT
Made with ❤️ in France