Model Summary

Aramis-2B-BitNet (2.41B params, maximum context length of 4096 tokens)
A compact, agent-oriented small language model focused on contextual reasoning, language understanding, and multi-turn instruction following. Built with an iterative post-training recipe: bilingual DPO (FR+EN) plus model merging of FR-centric and EN-centric variants. Runs natively as BitNet 1.58-bit (ternary) and is available in GGUF 1.58-bit, losslessly converted from the BF16 checkpoint.

  • Family: BitNet b1.58 (ternary weights {-1, 0, +1} with abs-mean scaling; a short sketch follows this list)
  • Post-training recipe: bilingual DPO (FR+EN) + ModelStock/TIES merges to combine FR-centric and EN-centric variants (agent-oriented behaviors; pragmatic reasoning).
  • This repo: GGUF weights for efficient local inference with bitnet.cpp.
  • Training & provenance: see the BF16 model card for full details of datasets, merges, and configuration.
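
For intuition, here is a minimal numpy sketch of abs-mean ternary quantization as used by the b1.58 family. It assumes a single per-tensor scale for simplicity; the actual BitNet training and the bitnet.cpp kernels handle scaling and packing internally, so this is illustrative only.

```python
import numpy as np

def absmean_ternary(w: np.ndarray, eps: float = 1e-5):
    """Illustrative BitNet b1.58-style quantization: scale by the mean
    absolute value, then round-and-clip weights to {-1, 0, +1}."""
    gamma = np.mean(np.abs(w)) + eps             # abs-mean scale
    codes = np.clip(np.round(w / gamma), -1, 1)  # ternary codes
    return codes.astype(np.int8), gamma          # dequantize as codes * gamma

w = np.random.randn(4, 8).astype(np.float32)     # toy weight matrix
codes, scale = absmean_ternary(w)
print(np.unique(codes))                          # subset of {-1, 0, 1}
print(np.abs(w - codes * scale).mean())          # mean quantization error
```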

Upstream references

  • Base model: microsoft/bitnet-b1.58-2B-4T (BF16 checkpoint: microsoft/bitnet-b1.58-2B-4T-bf16)
  • Post-trained BF16 model: jpacifico/Aramis-2B-BitNet-bf16 (full training details and benchmarks)
  • Inference framework: bitnet.cpp (microsoft/BitNet)

Benchmarks (from the BF16 version)

Interpretation: significant gains on language understanding and pragmatic reasoning (ARC-C/E, Winogrande, BoolQ, HellaSwag, TriviaQA) with stability on other skills. Math and code are not the optimization target; GSM8K stays essentially stable relative to the quantized bitnet-b1.58-2B-4T baseline (58.38). All scores are reported in comparison with the original microsoft/bitnet-b1.58-2B-4T-bf16 model.

| Benchmark (metric) | microsoft/bitnet-b1.58-2B-4T-bf16 | jpacifico/Aramis-2B-BitNet-bf16 |
|---|---|---|
| arc_challenge (0-shot) | 47.95 | 51.62 |
| arc_easy (0-shot) | 73.44 | 75.25 |
| hellaswag (0-shot) | 68.27 | 68.52 |
| openbookqa (0-shot) | 41.6 | 41.4 |
| boolq (0-shot) | 79.39 | 79.33 |
| piqa (0-shot) | 77.86 | 77.53 |
| winogrande (0-shot) | 70.64 | 72.06 |
| ifeval (0-shot) | 41.85 | 44.12 |
| triviaqa (0-shot) | 11.95 | 15.06 |
| triviaqa (5-shot, EM) | 33.51 | 33.51 |
| truthfulqa_mc2 (10-shot) | 45.89 | 46.52 |
| gsm8k (4-shot, EM) | 62.4 | 59.67 |
| mmlu (5-shot, acc) | 52.96 | 53.39 |
| commonsense_qa (10-shot, acc) | 71.17 | 70.76 |

ARC-Challenge (zero-shot): 51.62 — first-ever ≥50 reported for a 2B-class model (>1.5B, <2.5B) based on publicly available results.

| Model | arc_challenge (0-shot) |
|---|---|
| Qwen/Qwen3-1.7B | 43.00 |
| ibm-granite/granite-3.3-2b-base | 44.54 |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | 34.90 |
| openbmb/MiniCPM-2B-dpo-bf16 | 44.28 |
| microsoft/bitnet-b1.58-2B-4T-bf16 (base model) | 47.95 |
| microsoft/bitnet-b1.58-2B-4T | 49.91 |
| jpacifico/Aramis-2B-BitNet-bf16 | 51.62 |

Reproducibility

All benchmark results reported here were obtained using LM Eval Harness.
The following example reproduces the ARC-Challenge (0-shot) evaluation for this model:

HF_ALLOW_CODE_EVAL=1 lm-eval --model hf \
  --model_args pretrained=jpacifico/Aramis-2B-BitNet-bf16,dtype=bfloat16 \
  --tasks arc_challenge \
  --device cuda:0 --batch_size 8 \
  --seed 42 \
  --num_fewshot 0 \
  --confirm_run_unsafe_code \
  --trust_remote_code
  • All results were computed with LM Eval Harness v0.4.9
  • Randomness (e.g. seeds, batch sizes) may cause slight variations in results
  • The same procedure was used to evaluate all tasks presented in the benchmark tables
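
The same run can also be driven from Python through the harness's programmatic API. The sketch below mirrors the CLI call above; argument names follow lm-eval v0.4.x and may differ in other versions.

```python
import lm_eval  # pip install lm-eval (v0.4.x)

# Programmatic equivalent of the CLI call above (arc_challenge, 0-shot).
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=jpacifico/Aramis-2B-BitNet-bf16,dtype=bfloat16",
    tasks=["arc_challenge"],
    num_fewshot=0,
    batch_size=8,
    device="cuda:0",
)
print(results["results"]["arc_challenge"])
```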

About “lossless” (what it means here)

Microsoft’s report states that the CPU reference implementation “ensur[es] numerical accuracy (lossless inference relative to the training procedure)” when running BitNet b1.58 models via bitnet.cpp.

  • In practice, this means the 1.58-bit packed weights used at train time are executed as-is by the specialized kernels; the GGUF container is simply the delivery format consumed by bitnet.cpp for these kernels.
  • Microsoft’s GGUF model card also explicitly presents the GGUF variant as the format “compatible with the bitnet.cpp library”.

Note: Efficiency claims (memory/latency/energy) and the “lossless” inference property apply when using bitnet.cpp. Running the model through generic paths (e.g., vanilla Transformers) doesn’t unlock those kernel-level advantages. See Microsoft’s GGUF page and bitnet.cpp README.
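
To make the "executed as-is" point concrete, here is a toy Python sketch of packing ternary codes into two-bit fields and unpacking them exactly. The layout shown is an assumption for illustration only, not the actual i2_s format used by bitnet.cpp; the point is simply that packing and unpacking ternary codes is an exact, lossless round trip.

```python
import numpy as np

def pack_ternary(codes: np.ndarray) -> np.ndarray:
    """Pack ternary codes {-1, 0, +1} into 2-bit fields, 4 codes per byte.
    Purely illustrative -- the real i2_s layout in bitnet.cpp may differ."""
    u = (codes.astype(np.int8) + 1).astype(np.uint8)   # map {-1,0,1} -> {0,1,2}
    u = u.reshape(-1, 4)                               # 4 codes per output byte
    return (u[:, 0] | (u[:, 1] << 2) | (u[:, 2] << 4) | (u[:, 3] << 6)).astype(np.uint8)

def unpack_ternary(packed: np.ndarray) -> np.ndarray:
    """Inverse of pack_ternary: recover the exact ternary codes (lossless)."""
    u = np.stack([(packed >> s) & 0b11 for s in (0, 2, 4, 6)], axis=1)
    return u.reshape(-1).astype(np.int8) - 1

codes = np.random.randint(-1, 2, size=16).astype(np.int8)
assert np.array_equal(codes, unpack_ternary(pack_ternary(codes)))  # round-trip is exact
```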


Intended Use

  • Great for: agent-oriented assistants, bilingual instruction following, pragmatic reasoning, and everyday knowledge tasks — on CPUs or modest GPUs using bitnet.cpp.
  • Not optimized for: formal math or code generation (see BF16 card for details and alternatives).

How to run (bitnet.cpp)

You can run this model using my demo Colab notebook.

Please refer to the bitnet.cpp GitHub repository for detailed compilation steps, usage examples, and command-line options.
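
As a starting point, the snippet below shows one way to drive the repository's run_inference.py script on the downloaded GGUF file from Python. The clone directory, model path, prompt, and flag names are assumptions based on the bitnet.cpp README at the time of writing; check the repository for the current interface and build steps.

```python
import subprocess

# Assumed paths: bitnet.cpp cloned and built in ./BitNet, GGUF file downloaded locally.
MODEL_PATH = "models/Aramis-2B-BitNet-b1.58-i2s-GGUF/ggml-model-i2_s.gguf"

subprocess.run(
    [
        "python", "run_inference.py",
        "-m", MODEL_PATH,                            # path to the 1.58-bit GGUF weights
        "-p", "You are a helpful FR/EN assistant.",  # prompt text
        "-n", "128",                                 # max tokens to generate
        "-cnv",                                      # conversational (chat) mode
    ],
    check=True,
    cwd="BitNet",                                    # assumed clone directory of bitnet.cpp
)
```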

Disclaimer
This model is intended for research and development purposes only and should not be used in commercial or real-world applications without further testing. While the Microsoft Research team has applied SFT and DPO to align the BitNet base model, it may still produce unexpected, biased, or inaccurate outputs. Please use responsibly.

  • Developed by: Jonathan Pacifico, 2025
  • Model type: LLM
  • Language(s) (NLP): French, English
  • License: MIT

Made with ❤️ in France
