Commit 4648b78 (verified) by RichardErkhov, parent 51e4a57: uploaded readme (README.md, +311 lines)
Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)

# MagpieLM-8B-Chat-v0.1 - AWQ
- Model creator: https://huggingface.co/Magpie-Align/
- Original model: https://huggingface.co/Magpie-Align/MagpieLM-8B-Chat-v0.1/

Original model description:
---
library_name: transformers
license: llama3.1
base_model: Magpie-Align/MagpieLM-8B-SFT-v0.1
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- Magpie-Align/MagpieLM-SFT-Data-v0.1
- Magpie-Align/MagpieLM-DPO-Data-v0.1
model-index:
- name: MagpieLM-8B-Chat-v0.1
  results: []
---

![Magpie](https://cdn-uploads.huggingface.co/production/uploads/653df1323479e9ebbe3eb6cc/FWWILXrAGNwWr52aghV0S.png)

# 🐦 MagpieLM-8B-Chat-v0.1

[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://api.wandb.ai/links/uw-nsl/0s1eegy2)

## 🧐 About This Model

*Model full name: Llama3.1-MagpieLM-8B-Chat-v0.1*

This model is an aligned version of [meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) that achieves state-of-the-art performance among open aligned small language models (SLMs). It even outperforms larger open-weight models, including Llama-3-8B-Instruct, Llama-3.1-8B-Instruct, Qwen-2-7B-Instruct, and Gemma-2-9B-it.

We apply the following standard alignment pipeline with two carefully crafted synthetic datasets.

We first perform SFT using [Magpie-Align/MagpieLM-SFT-Data-v0.1](https://huggingface.co/datasets/Magpie-Align/MagpieLM-SFT-Data-v0.1).
* **SFT Model Checkpoint:** [Magpie-Align/MagpieLM-8B-SFT-v0.1](https://huggingface.co/Magpie-Align/MagpieLM-8B-SFT-v0.1)

We then perform DPO on the [Magpie-Align/MagpieLM-DPO-Data-v0.1](https://huggingface.co/datasets/Magpie-Align/MagpieLM-DPO-Data-v0.1) dataset.

## 🔥 Benchmark Performance

Greedy Decoding

- **Alpaca Eval 2: 58.18 (LC), 62.38 (WR)**
- **Arena Hard: 48.4**
- **WildBench WB Score (v2.0625): 44.72**

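All scores above were obtained with greedy decoding: at each generation step the single highest-scoring token is selected instead of being sampled (with the Transformers `generate()`/`pipeline` APIs this corresponds to `do_sample=False`). A toy sketch of the selection rule, where `step_logits` is a hypothetical list of per-step score vectors:

```python
def greedy_decode(step_logits):
    """Greedy decoding: pick the argmax token id at every step (deterministic, no sampling)."""
    return [max(range(len(scores)), key=scores.__getitem__) for scores in step_logits]
```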
**Benchmark Performance Compared to Other SOTA SLMs**

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/653df1323479e9ebbe3eb6cc/q1Rasy66h6lmaUP1KQ407.jpeg)

## 👀 Other Information

**License**: Please follow the [Meta Llama 3.1 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE).

**Conversation Template**: Please use the Llama 3 chat template for the best performance.

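For intuition, the Llama 3 chat template wraps each message in header and end-of-turn markers. The helper below is a hand-rolled sketch of that format, for illustration only; in practice, let `tokenizer.apply_chat_template` render the prompt:

```python
def llama3_prompt(messages):
    """Render a list of {role, content} dicts in the Llama 3 chat format,
    ending with an open assistant turn (as apply_chat_template produces
    with add_generation_prompt=True)."""
    out = "<|begin_of_text|>"
    for m in messages:
        out += f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
    return out + "<|start_header_id|>assistant<|end_header_id|>\n\n"
```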
**Limitations**: This model primarily understands and generates content in English. Its outputs may contain factual errors, logical inconsistencies, or biases present in the training data. While the model aims to improve instruction following and helpfulness, it is not specifically designed for complex reasoning tasks, which may lead to suboptimal performance in these areas. Additionally, the model may produce unsafe or inappropriate content, as no specific safety training was implemented during the alignment process.

## 🧐 How to use it?

[![Spaces](https://img.shields.io/badge/🤗-Open%20in%20Spaces-blue)](https://huggingface.co/spaces/flydust/MagpieLM-8B)

Please update `transformers` to the latest version with `pip install git+https://github.com/huggingface/transformers`.

You can then run conversational inference using the Transformers `pipeline` abstraction or by leveraging the Auto classes with the `generate()` function.

```python
import transformers
import torch

# Hub repo id of the original model (see the "Original model" link above)
model_id = "Magpie-Align/MagpieLM-8B-Chat-v0.1"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Magpie, a friendly AI assistant."},
    {"role": "user", "content": "Who are you?"},
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
```
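Note that this repository hosts an AWQ (4-bit, weight-only) quantization of the model above. As rough intuition, weight-only quantization maps each group of float weights to small integers sharing one scale. The toy sketch below shows plain group-wise symmetric quantization; the actual AWQ algorithm additionally rescales salient weight channels using activation statistics, and these function names are illustrative, not a library API:

```python
def quantize_group(weights, bits=4):
    """Toy symmetric quantization of one weight group to signed `bits`-bit ints."""
    qmax = 2 ** (bits - 1) - 1                          # 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0  # one shared scale per group
    quants = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return quants, scale

def dequantize_group(quants, scale):
    """Recover approximate float weights from the ints and the shared scale."""
    return [q * scale for q in quants]
```

Round-tripping a group through these two functions bounds the per-weight error by roughly half the scale, which is why quantizing in small groups (AWQ commonly uses group size 128) keeps the error low.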

---
# Alignment Pipeline

The detailed alignment pipeline is as follows.

## Stage 1: Supervised Fine-tuning

We use [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) for SFT. Please refer to the model card of the [SFT checkpoint](https://huggingface.co/Magpie-Align/MagpieLM-8B-SFT-v0.1) and below for detailed configurations.

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`
```yaml
base_model: meta-llama/Meta-Llama-3.1-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
chat_template: llama3

load_in_8bit: false
load_in_4bit: false
strict: false
main_process_port: 0

datasets:
  - path: Magpie-Align/MagpieLM-SFT-Data-v0.1
    type: sharegpt
    conversation: llama3

dataset_prepared_path: last_run_prepared
val_set_size: 0.001
output_dir: axolotl_out/MagpieLM-8B-SFT-v0.1

sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

wandb_project: SynDa
wandb_entity:
wandb_watch:
wandb_name: MagpieLM-8B-SFT-v0.1
wandb_log_model:
hub_model_id: Magpie-Align/MagpieLM-8B-SFT-v0.1

gradient_accumulation_steps: 32
micro_batch_size: 1
num_epochs: 2
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 2e-5

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_ratio: 0.1
evals_per_epoch: 5
eval_table_size:
saves_per_epoch:
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>
```
</details><br>

## Stage 2: Direct Preference Optimization

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-07
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 16
- total_train_batch_size: 128 (2 per device × 4 devices × 16 accumulation steps)
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

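The DPO objective optimized in this stage scores each preference pair by how much more the policy prefers the chosen response over the rejected one, relative to the frozen SFT reference model. A minimal pure-Python illustration (not the training code; argument names are illustrative, and `beta=0.01` matches the alignment handbook config later in this card):

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.01):
    """DPO loss for one pair, given sequence log-probs under policy and reference."""
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)): shrinks as the policy widens its preference margin
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```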
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.686 | 0.0653 | 100 | 0.6856 | -0.0491 | -0.0616 | 0.6480 | 0.0125 | -471.3315 | -478.8181 | -0.7034 | -0.7427 |
| 0.6218 | 0.1306 | 200 | 0.6277 | -0.6128 | -0.7720 | 0.6960 | 0.1591 | -542.3653 | -535.1920 | -0.7771 | -0.8125 |
| 0.5705 | 0.1959 | 300 | 0.5545 | -2.4738 | -3.0052 | 0.7270 | 0.5314 | -765.6894 | -721.2881 | -0.7894 | -0.8230 |
| 0.4606 | 0.2612 | 400 | 0.5081 | -2.6780 | -3.3782 | 0.7560 | 0.7002 | -802.9893 | -741.7116 | -0.6813 | -0.7247 |
| 0.4314 | 0.3266 | 500 | 0.4787 | -3.6697 | -4.6026 | 0.7630 | 0.9329 | -925.4283 | -840.8740 | -0.6189 | -0.6691 |
| 0.449 | 0.3919 | 600 | 0.4533 | -3.7414 | -4.8019 | 0.7820 | 1.0604 | -945.3563 | -848.0514 | -0.6157 | -0.6681 |
| 0.4538 | 0.4572 | 700 | 0.4350 | -4.3858 | -5.6549 | 0.7890 | 1.2690 | -1030.6561 | -912.4920 | -0.5789 | -0.6331 |
| 0.35 | 0.5225 | 800 | 0.4186 | -4.7129 | -6.1662 | 0.8010 | 1.4533 | -1081.7843 | -945.1964 | -0.5778 | -0.6347 |
| 0.4153 | 0.5878 | 900 | 0.4108 | -4.9836 | -6.5320 | 0.7970 | 1.5484 | -1118.3677 | -972.2631 | -0.5895 | -0.6474 |
| 0.3935 | 0.6531 | 1000 | 0.3999 | -4.4303 | -5.9370 | 0.8110 | 1.5067 | -1058.8646 | -916.9379 | -0.6016 | -0.6598 |
| 0.3205 | 0.7184 | 1100 | 0.3950 | -5.1884 | -6.8827 | 0.8010 | 1.6943 | -1153.4371 | -992.7452 | -0.5846 | -0.6452 |
| 0.3612 | 0.7837 | 1200 | 0.3901 | -5.0426 | -6.7179 | 0.8040 | 1.6753 | -1136.9619 | -978.1701 | -0.6046 | -0.6637 |
| 0.3058 | 0.8490 | 1300 | 0.3877 | -5.1224 | -6.8428 | 0.8040 | 1.7204 | -1149.4465 | -986.1475 | -0.6087 | -0.6690 |
| 0.3467 | 0.9144 | 1400 | 0.3871 | -5.2335 | -6.9809 | 0.8090 | 1.7474 | -1163.2629 | -997.2610 | -0.6071 | -0.6672 |
| 0.3197 | 0.9797 | 1500 | 0.3867 | -5.1502 | -6.8793 | 0.8080 | 1.7291 | -1153.0979 | -988.9237 | -0.6120 | -0.6722 |

### Framework versions

- Transformers 4.44.2
- Pytorch 2.4.1+cu121
- Datasets 3.0.0
- Tokenizers 0.19.1

<details><summary>See alignment handbook configs</summary>

```yaml
# Customized Configs
model_name_or_path: Magpie-Align/MagpieLM-8B-SFT-v0.1
hub_model_id: Magpie-Align/MagpieLM-8B-Chat-v0.1
output_dir: alignment_handbook_out/MagpieLM-8B-Chat-v0.1
run_name: MagpieLM-8B-Chat-v0.1

dataset_mixer:
  Magpie-Align/MagpieLM-DPO-Data-v0.1: 1.0
dataset_splits:
- train
- test
preprocessing_num_workers: 24

# DPOTrainer arguments
bf16: true
beta: 0.01
learning_rate: 2.0e-7
gradient_accumulation_steps: 16
per_device_train_batch_size: 2
per_device_eval_batch_size: 4
num_train_epochs: 1
max_length: 2048
max_prompt_length: 1800
warmup_ratio: 0.1
logging_steps: 1
lr_scheduler_type: cosine
optim: adamw_torch

torch_dtype: null
# use_flash_attention_2: true
do_eval: true
evaluation_strategy: steps
eval_steps: 100
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: False
log_level: info
push_to_hub: true
save_total_limit: 0
seed: 42
report_to:
- wandb
```
</details><br>

## 📚 Citation

If you find the model, data, or code useful, please cite:
```
@article{xu2024magpie,
  title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
  author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
  year={2024},
  eprint={2406.08464},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

@article{xu2024stronger,
  title={Stronger Models are NOT Stronger Teachers for Instruction Tuning},
  author={Xu, Zhangchen and Jiang, Fengqing and Niu, Luyao and Lin, Bill Yuchen and Poovendran, Radha},
  journal={arXiv preprint arXiv:2411.07133},
  year={2024}
}
```

**Contact**

Questions? Contact:
- [Zhangchen Xu](https://zhangchenxu.com/) [zxu9 at uw dot edu], and
- [Bill Yuchen Lin](https://yuchenlin.xyz/) [yuchenlin1995 at gmail dot com]