Commit 4648b78 (verified) by RichardErkhov, parent 51e4a57: uploaded readme (README.md, +311 lines)
Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)

# MagpieLM-8B-Chat-v0.1 - AWQ
- Model creator: https://huggingface.co/Magpie-Align/
- Original model: https://huggingface.co/Magpie-Align/MagpieLM-8B-Chat-v0.1/

Original model description:
---
library_name: transformers
license: llama3.1
base_model: Magpie-Align/MagpieLM-8B-SFT-v0.1
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- Magpie-Align/MagpieLM-SFT-Data-v0.1
- Magpie-Align/MagpieLM-DPO-Data-v0.1
model-index:
- name: MagpieLM-8B-Chat-v0.1
  results: []
---

![Magpie](https://cdn-uploads.huggingface.co/production/uploads/653df1323479e9ebbe3eb6cc/FWWILXrAGNwWr52aghV0S.png)

# 🐦 MagpieLM-8B-Chat-v0.1

[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://api.wandb.ai/links/uw-nsl/0s1eegy2)

## 🧐 About This Model

*Model full name: Llama3.1-MagpieLM-8B-Chat-v0.1*

This model is an aligned version of [meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) that achieves state-of-the-art performance among open aligned small language models (SLMs). It even outperforms larger open-weight models, including Llama-3-8B-Instruct, Llama-3.1-8B-Instruct, Qwen-2-7B-Instruct, and Gemma-2-9B-it.

We apply the following standard alignment pipeline with two carefully crafted synthetic datasets.

We first perform SFT using [Magpie-Align/MagpieLM-SFT-Data-v0.1](https://huggingface.co/datasets/Magpie-Align/MagpieLM-SFT-Data-v0.1).
* **SFT Model Checkpoint:** [Magpie-Align/MagpieLM-8B-SFT-v0.1](https://huggingface.co/Magpie-Align/MagpieLM-8B-SFT-v0.1)

We then perform DPO on the [Magpie-Align/MagpieLM-DPO-Data-v0.1](https://huggingface.co/datasets/Magpie-Align/MagpieLM-DPO-Data-v0.1) dataset.

## 🔥 Benchmark Performance

Greedy Decoding

- **Alpaca Eval 2: 58.18 (LC), 62.38 (WR)**
- **Arena Hard: 48.4**
- **WildBench WB Score (v2.0625): 44.72**

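All scores above were obtained with greedy decoding: at each generation step the single highest-scoring token is selected instead of being sampled (with the Transformers `generate()`/`pipeline` APIs this corresponds to `do_sample=False`). A toy sketch of the selection rule, where `step_logits` is a hypothetical list of per-step score vectors:

```python
def greedy_decode(step_logits):
    """Greedy decoding: pick the argmax token id at every step (deterministic, no sampling)."""
    return [max(range(len(scores)), key=scores.__getitem__) for scores in step_logits]
```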
**Benchmark Performance Compared to Other SOTA SLMs**

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/653df1323479e9ebbe3eb6cc/q1Rasy66h6lmaUP1KQ407.jpeg)

## 👀 Other Information

**License**: Please follow the [Meta Llama 3.1 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE).

**Conversation Template**: Please use the Llama 3 chat template for the best performance.

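For intuition, the Llama 3 chat template wraps each message in header and end-of-turn markers. The helper below is a hand-rolled sketch of that format, for illustration only; in practice, let `tokenizer.apply_chat_template` render the prompt:

```python
def llama3_prompt(messages):
    """Render a list of {role, content} dicts in the Llama 3 chat format,
    ending with an open assistant turn (as apply_chat_template produces
    with add_generation_prompt=True)."""
    out = "<|begin_of_text|>"
    for m in messages:
        out += f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
    return out + "<|start_header_id|>assistant<|end_header_id|>\n\n"
```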
**Limitations**: This model primarily understands and generates content in English. Its outputs may contain factual errors, logical inconsistencies, or biases present in the training data. While the model aims to improve instruction following and helpfulness, it is not specifically designed for complex reasoning tasks, which may lead to suboptimal performance in these areas. Additionally, the model may produce unsafe or inappropriate content, as no specific safety training was implemented during the alignment process.

## 🧐 How to use it?

[![Spaces](https://img.shields.io/badge/🤗-Open%20in%20Spaces-blue)](https://huggingface.co/spaces/flydust/MagpieLM-8B)

Please update `transformers` to the latest version with `pip install git+https://github.com/huggingface/transformers`.

You can then run conversational inference using the Transformers `pipeline` abstraction or by leveraging the Auto classes with the `generate()` function.

```python
import transformers
import torch

# Hub repo id of the original model (see the "Original model" link above)
model_id = "Magpie-Align/MagpieLM-8B-Chat-v0.1"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Magpie, a friendly AI assistant."},
    {"role": "user", "content": "Who are you?"},
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
```
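Note that this repository hosts an AWQ (4-bit, weight-only) quantization of the model above. As rough intuition, weight-only quantization maps each group of float weights to small integers sharing one scale. The toy sketch below shows plain group-wise symmetric quantization; the actual AWQ algorithm additionally rescales salient weight channels using activation statistics, and these function names are illustrative, not a library API:

```python
def quantize_group(weights, bits=4):
    """Toy symmetric quantization of one weight group to signed `bits`-bit ints."""
    qmax = 2 ** (bits - 1) - 1                          # 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0  # one shared scale per group
    quants = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return quants, scale

def dequantize_group(quants, scale):
    """Recover approximate float weights from the ints and the shared scale."""
    return [q * scale for q in quants]
```

Round-tripping a group through these two functions bounds the per-weight error by roughly half the scale, which is why quantizing in small groups (AWQ commonly uses group size 128) keeps the error low.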

---
# Alignment Pipeline

The detailed alignment pipeline is as follows.

## Stage 1: Supervised Fine-tuning

We use [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) for SFT. Please refer to the model card of the [SFT checkpoint](https://huggingface.co/Magpie-Align/MagpieLM-8B-SFT-v0.1) and below for detailed configurations.

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`
```yaml
base_model: meta-llama/Meta-Llama-3.1-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
chat_template: llama3

load_in_8bit: false
load_in_4bit: false
strict: false
main_process_port: 0

datasets:
  - path: Magpie-Align/MagpieLM-SFT-Data-v0.1
    type: sharegpt
    conversation: llama3

dataset_prepared_path: last_run_prepared
val_set_size: 0.001
output_dir: axolotl_out/MagpieLM-8B-SFT-v0.1

sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

wandb_project: SynDa
wandb_entity:
wandb_watch:
wandb_name: MagpieLM-8B-SFT-v0.1
wandb_log_model:
hub_model_id: Magpie-Align/MagpieLM-8B-SFT-v0.1

gradient_accumulation_steps: 32
micro_batch_size: 1
num_epochs: 2
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 2e-5

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_ratio: 0.1
evals_per_epoch: 5
eval_table_size:
saves_per_epoch:
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>
```
</details><br>

## Stage 2: Direct Preference Optimization

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-07
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 16
- total_train_batch_size: 128 (2 per device × 4 devices × 16 accumulation steps)
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

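The DPO objective optimized in this stage scores each preference pair by how much more the policy prefers the chosen response over the rejected one, relative to the frozen SFT reference model. A minimal pure-Python illustration (not the training code; argument names are illustrative, and `beta=0.01` matches the alignment handbook config later in this card):

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.01):
    """DPO loss for one pair, given sequence log-probs under policy and reference."""
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)): shrinks as the policy widens its preference margin
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```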
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.686 | 0.0653 | 100 | 0.6856 | -0.0491 | -0.0616 | 0.6480 | 0.0125 | -471.3315 | -478.8181 | -0.7034 | -0.7427 |
| 0.6218 | 0.1306 | 200 | 0.6277 | -0.6128 | -0.7720 | 0.6960 | 0.1591 | -542.3653 | -535.1920 | -0.7771 | -0.8125 |
| 0.5705 | 0.1959 | 300 | 0.5545 | -2.4738 | -3.0052 | 0.7270 | 0.5314 | -765.6894 | -721.2881 | -0.7894 | -0.8230 |
| 0.4606 | 0.2612 | 400 | 0.5081 | -2.6780 | -3.3782 | 0.7560 | 0.7002 | -802.9893 | -741.7116 | -0.6813 | -0.7247 |
| 0.4314 | 0.3266 | 500 | 0.4787 | -3.6697 | -4.6026 | 0.7630 | 0.9329 | -925.4283 | -840.8740 | -0.6189 | -0.6691 |
| 0.449 | 0.3919 | 600 | 0.4533 | -3.7414 | -4.8019 | 0.7820 | 1.0604 | -945.3563 | -848.0514 | -0.6157 | -0.6681 |
| 0.4538 | 0.4572 | 700 | 0.4350 | -4.3858 | -5.6549 | 0.7890 | 1.2690 | -1030.6561 | -912.4920 | -0.5789 | -0.6331 |
| 0.35 | 0.5225 | 800 | 0.4186 | -4.7129 | -6.1662 | 0.8010 | 1.4533 | -1081.7843 | -945.1964 | -0.5778 | -0.6347 |
| 0.4153 | 0.5878 | 900 | 0.4108 | -4.9836 | -6.5320 | 0.7970 | 1.5484 | -1118.3677 | -972.2631 | -0.5895 | -0.6474 |
| 0.3935 | 0.6531 | 1000 | 0.3999 | -4.4303 | -5.9370 | 0.8110 | 1.5067 | -1058.8646 | -916.9379 | -0.6016 | -0.6598 |
| 0.3205 | 0.7184 | 1100 | 0.3950 | -5.1884 | -6.8827 | 0.8010 | 1.6943 | -1153.4371 | -992.7452 | -0.5846 | -0.6452 |
| 0.3612 | 0.7837 | 1200 | 0.3901 | -5.0426 | -6.7179 | 0.8040 | 1.6753 | -1136.9619 | -978.1701 | -0.6046 | -0.6637 |
| 0.3058 | 0.8490 | 1300 | 0.3877 | -5.1224 | -6.8428 | 0.8040 | 1.7204 | -1149.4465 | -986.1475 | -0.6087 | -0.6690 |
| 0.3467 | 0.9144 | 1400 | 0.3871 | -5.2335 | -6.9809 | 0.8090 | 1.7474 | -1163.2629 | -997.2610 | -0.6071 | -0.6672 |
| 0.3197 | 0.9797 | 1500 | 0.3867 | -5.1502 | -6.8793 | 0.8080 | 1.7291 | -1153.0979 | -988.9237 | -0.6120 | -0.6722 |

### Framework versions

- Transformers 4.44.2
- Pytorch 2.4.1+cu121
- Datasets 3.0.0
- Tokenizers 0.19.1

<details><summary>See alignment handbook configs</summary>

```yaml
# Customized Configs
model_name_or_path: Magpie-Align/MagpieLM-8B-SFT-v0.1
hub_model_id: Magpie-Align/MagpieLM-8B-Chat-v0.1
output_dir: alignment_handbook_out/MagpieLM-8B-Chat-v0.1
run_name: MagpieLM-8B-Chat-v0.1

dataset_mixer:
  Magpie-Align/MagpieLM-DPO-Data-v0.1: 1.0
dataset_splits:
- train
- test
preprocessing_num_workers: 24

# DPOTrainer arguments
bf16: true
beta: 0.01
learning_rate: 2.0e-7
gradient_accumulation_steps: 16
per_device_train_batch_size: 2
per_device_eval_batch_size: 4
num_train_epochs: 1
max_length: 2048
max_prompt_length: 1800
warmup_ratio: 0.1
logging_steps: 1
lr_scheduler_type: cosine
optim: adamw_torch

torch_dtype: null
# use_flash_attention_2: true
do_eval: true
evaluation_strategy: steps
eval_steps: 100
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: False
log_level: info
push_to_hub: true
save_total_limit: 0
seed: 42
report_to:
- wandb
```
</details><br>

## 📚 Citation

If you find the model, data, or code useful, please cite:
```
@article{xu2024magpie,
  title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
  author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
  year={2024},
  eprint={2406.08464},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

@article{xu2024stronger,
  title={Stronger Models are NOT Stronger Teachers for Instruction Tuning},
  author={Xu, Zhangchen and Jiang, Fengqing and Niu, Luyao and Lin, Bill Yuchen and Poovendran, Radha},
  journal={arXiv preprint arXiv:2411.07133},
  year={2024}
}
```

**Contact**

Questions? Contact:
- [Zhangchen Xu](https://zhangchenxu.com/) [zxu9 at uw dot edu], and
- [Bill Yuchen Lin](https://yuchenlin.xyz/) [yuchenlin1995 at gmail dot com]