## Model Summary
Aramis-2B-BitNet (2.41B parameters; maximum context length of 4096 tokens)
A compact, agent-oriented small language model focused on contextual reasoning, language understanding and multi-turn instruction following.
Built with an iterative post-training recipe: bilingual DPO (FR+EN) + model merging of FR-centric and EN-centric variants.
Runs natively as BitNet 1.58-bit (ternary) and is available in 1.58-bit GGUF, converted losslessly from the BF16 checkpoint.
- Family: BitNet b1.58 (ternary weights {-1, 0, +1} with abs-mean scaling; see the sketch after this list)
- Post-training recipe: bilingual DPO (FR+EN) + ModelStock/TIES merges to combine FR-centric and EN-centric variants (agent-oriented behaviors; pragmatic reasoning)
- This repo: GGUF weights for efficient local inference with bitnet.cpp.
- Training & provenance: see the BF16 model card for full details of datasets, merges, and configuration.
## Upstream references
- Technical Report: BitNet b1.58 2B4T Technical Report (Microsoft Research, 2025). Contains the official description of the GGUF variant “used for bitnet.cpp” and the lossless-inference note.
- Official GGUF base model (Microsoft): microsoft/bitnet-b1.58-2B-4T-gguf
- bitnet.cpp (official inference framework): microsoft/BitNet on GitHub
## Benchmarks (from the BF16 version)
Interpretation: significant gains on language understanding and pragmatic reasoning (ARC-C/E, WinoGrande, BoolQ, HellaSwag, TriviaQA) with stability on other skills. Math/code are not the optimization target; GSM8K stays essentially stable relative to the bitnet-b1.58-2B-4T quantized baseline (58.38). All scores are reported in comparison with the original microsoft/bitnet-b1.58-2B-4T-bf16 model.
| Benchmark (metric) | microsoft/bitnet-b1.58-2B-4T-bf16 | jpacifico/Aramis-2B-BitNet-bf16 |
|---|---|---|
| arc_challenge 0 shot | 47.95 | 51.62 |
| arc_easy 0 shot | 73.44 | 75.25 |
| hellaswag 0 shot | 68.27 | 68.52 |
| openbookqa 0 shot | 41.6 | 41.4 |
| boolq 0 shot | 79.39 | 79.33 |
| piqa 0 shot | 77.86 | 77.53 |
| winogrande 0 shot | 70.64 | 72.06 |
| ifeval 0 shot | 41.85 | 44.12 |
| triviaqa 0 shot | 11.95 | 15.06 |
| triviaqa 5 shot EM | 33.51 | 33.51 |
| truthfulqa_mc2 10 shot | 45.89 | 46.52 |
| gsm8k 4 shot EM | 62.4 | 59.67 |
| mmlu 5 shot acc | 52.96 | 53.39 |
| commonsense_qa 10 shot acc | 71.17 | 70.76 |
ARC-Challenge (zero-shot): 51.62, the first reported score at or above 50 for a 2B-class model (>1.5B, <2.5B), based on publicly available results.
| Model | arc_challenge (0-shot) |
|---|---|
| Qwen/Qwen3-1.7B | 43 |
| ibm-granite/granite-3.3-2b-base | 44.54 |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | 34.9 |
| openbmb/MiniCPM-2B-dpo-bf16 | 44.28 |
| microsoft/bitnet-b1.58-2B-4T-bf16 (base model) | 47.95 |
| microsoft/bitnet-b1.58-2B-4T | 49.91 |
| jpacifico/Aramis-2B-BitNet-bf16 | 51.62 |
## Reproducibility
All benchmark results reported here were obtained using LM Eval Harness.
The following example reproduces the ARC-Challenge (0-shot) evaluation for this model:
```bash
HF_ALLOW_CODE_EVAL=1 lm-eval --model hf \
  --model_args pretrained=jpacifico/Aramis-2B-BitNet-bf16,dtype=bfloat16 \
  --tasks arc_challenge \
  --device cuda:0 --batch_size 8 \
  --seed 42 \
  --num_fewshot 0 \
  --confirm_run_unsafe_code \
  --trust_remote_code
```
- All results were computed with LM Eval Harness v0.4.9 (a version-pin command is shown after this list)
- Randomness (e.g. seeds, batch sizes) may cause slight variations in results
- The same procedure was used to evaluate all tasks presented in the benchmark tables
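To install the exact harness version used above (assuming a pip-based environment; `lm_eval` is the PyPI package name of LM Eval Harness):

```bash
pip install "lm_eval==0.4.9"
```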
## About “lossless” (what it means here)
Microsoft’s report states that the CPU reference implementation “ensur[es] numerical accuracy (lossless inference relative to the training procedure)” when running BitNet b1.58 models via bitnet.cpp.
- In practice, this means the 1.58-bit packed weights used at train time are executed as-is by the specialized kernels; the GGUF container is simply the delivery format consumed by `bitnet.cpp` for these kernels.
- Microsoft’s GGUF model card also explicitly presents the GGUF variant as the format “compatible with the `bitnet.cpp` library”.

Note: efficiency claims (memory/latency/energy) and the “lossless” inference property apply when using `bitnet.cpp`. Running the model through generic paths (e.g., vanilla Transformers) doesn’t unlock those kernel-level advantages. See Microsoft’s GGUF page and the `bitnet.cpp` README.
## Intended Use
- Great for: agent-oriented assistants, bilingual instruction following, pragmatic reasoning, and everyday knowledge tasks on CPUs or modest GPUs using `bitnet.cpp`.
- Not optimized for: formal math or code generation (see the BF16 card for details and alternatives).
## How to run (bitnet.cpp)
You can run this model using my demo Colab Notebook.
Please refer to the bitnet.cpp GitHub repository for detailed compilation steps, usage examples, and command-line options; a minimal sketch is shown below.
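As an illustrative sketch only, based on the microsoft/BitNet README at the time of writing (script names, flags, and the exact GGUF filename, assumed here to be `ggml-model-i2_s.gguf`, may differ from the current repository):

```bash
# Clone the official inference framework and install its Python dependencies
git clone --recursive https://github.com/microsoft/BitNet.git
cd BitNet
pip install -r requirements.txt

# Download this repo's GGUF weights to a local directory
huggingface-cli download jpacifico/Aramis-2B-BitNet-b1.58-i2s-GGUF \
  --local-dir models/Aramis-2B-BitNet

# Prepare the environment/kernels for the i2_s quantization type
python setup_env.py -md models/Aramis-2B-BitNet -q i2_s

# Start an interactive chat session with the model
python run_inference.py \
  -m models/Aramis-2B-BitNet/ggml-model-i2_s.gguf \
  -p "You are a helpful assistant" \
  -cnv
```

Per the bitnet.cpp README, the `-cnv` flag runs the model in conversation (chat) mode; omit it for raw completion.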
## Disclaimer
This model is intended for research and development purposes only and should not be used in commercial or real-world applications without further testing. While the Microsoft Research team has applied SFT and DPO to align the BitNet base model, it may still produce unexpected, biased, or inaccurate outputs. Please use responsibly.
- Developed by: Jonathan Pacifico, 2025
- Model type: LLM
- Language(s) (NLP): French, English
- License: MIT
Made with ❤️ in France