Commit 4495094 (verified), committed by jpacifico · 1 parent: 135a742

Update README.md

  1. README.md +24 -5
README.md CHANGED
@@ -79,15 +79,34 @@ Evaluations were performed using [LM Eval Harness](https://github.com/EleutherAI
  | jpacifico/bitnet-dpo-merged-modelstock7 | **51,62** |
 
 
 
- ## Usage
 
- You can run this model using my [Colab notebook](https://github.com/jpacifico/Chocolatine-LLM/blob/main/Chocolatine_14B_inference_test_colab.ipynb)
 
- You can also run this model using the following code:
 
 
- ## Last checkpoint
  ### Merge Method
 
  This model was merged using the [Model Stock](https://arxiv.org/abs/2403.19522) merge method using [jpacifico/bitnet-dpo-merged-modelstock-retrain](https://huggingface.co/jpacifico/bitnet-dpo-merged-modelstock-retrain) as a base.
@@ -119,7 +138,7 @@ tokenizer_source: jpacifico/bitnet-dpo-merged-modelstock-retrain
 
  ```
 
- ## Limitations
 
  Not tuned for coding or formal math; prefer specialized variants if those are critical.
  No explicit chain-of-thought training; improvements come from bilingual DPO + merging.
 
  | jpacifico/bitnet-dpo-merged-modelstock7 | **51,62** |
 
 
+ ### Reproducibility
+
+ All benchmark results reported here were obtained using [LM Eval Harness](https://github.com/EleutherAI/lm-evaluation-harness).
+ The following example reproduces the **ARC-Challenge (0-shot)** evaluation for this model:
+
+ ```bash
+ HF_ALLOW_CODE_EVAL=1 lm-eval --model hf \
+ --model_args pretrained=jpacifico/modelstock7,dtype=bfloat16 \
+ --tasks arc_challenge \
+ --device cuda:0 --batch_size 8 \
+ --seed 42 \
+ --num_fewshot 0 \
+ --confirm_run_unsafe_code \
+ --trust_remote_code
+ ```
+
+ - All results were computed with LM Eval Harness v0.4.9
+ - Randomness (e.g. seeds, batch sizes) may cause slight variations in results
+ - The same procedure was used to evaluate all tasks presented in the benchmark tables
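
Editorial sketch (not part of the commit): the last bullet says the same procedure was applied to every task, so the command above could be parameterised by task name. The helper `print_eval_cmd` and the `TASKS` variable below are hypothetical names introduced for illustration. Only `arc_challenge` is named in the README, so any other task names and their few-shot settings are left for the reader to supply. The script prints the commands rather than running them, so nothing is evaluated until a printed line is pasted into a shell.

```bash
#!/usr/bin/env bash
# Dry-run sketch: prints one lm-eval command per task instead of executing it.
# The model id and flags are copied from the ARC-Challenge example above;
# print_eval_cmd and TASKS are hypothetical names, not part of the README.
MODEL="jpacifico/modelstock7"

print_eval_cmd() {
  # $1 = task name, $2 = number of few-shot examples (defaults to 0)
  local task="$1" shots="${2:-0}"
  printf 'HF_ALLOW_CODE_EVAL=1 lm-eval --model hf --model_args pretrained=%s,dtype=bfloat16 --tasks %s --device cuda:0 --batch_size 8 --seed 42 --num_fewshot %s --confirm_run_unsafe_code --trust_remote_code\n' \
    "$MODEL" "$task" "$shots"
}

# Only arc_challenge is named in the text; set TASKS to a space-separated
# list to print commands for additional tasks.
TASKS="${TASKS:-arc_challenge}"
for task in $TASKS; do
  print_eval_cmd "$task"
done
```

With `TASKS` unset, the script prints a single line equivalent to the ARC-Challenge command shown in the README.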
 
+ # Usage with `bitnet.cpp`
 
+ You can run this model using my demo [Colab notebook](https://github.com/jpacifico/) TBD
 
+ Please refer to the [bitnet.cpp](https://github.com/microsoft/BitNet) GitHub repository for detailed compilation steps, usage examples, and command-line options.
 
 
+ # Last checkpoint
  ### Merge Method
 
  This model was merged using the [Model Stock](https://arxiv.org/abs/2403.19522) merge method using [jpacifico/bitnet-dpo-merged-modelstock-retrain](https://huggingface.co/jpacifico/bitnet-dpo-merged-modelstock-retrain) as a base.
 
 
  ```
 
+ # Limitations
 
  Not tuned for coding or formal math; prefer specialized variants if those are critical.
  No explicit chain-of-thought training; improvements come from bilingual DPO + merging.