---
license: apache-2.0
language:
- en
- zh
pipeline_tag: text-generation
tags:
- ' TransNormerLLM'
---
<div align="center">
<h1>
TransNormerLLM3 -- A Faster and Better LLM
</h1>
</div>
# Introduction

This official repository releases the TransNormerLLM3 model together with open-source weights saved every 50 billion tokens during pre-training.

[TransNormerLLM](https://arxiv.org/abs/2307.14995) evolved from [TransNormer](https://arxiv.org/abs/2210.10340) and stands out as the first LLM built on the linear transformer architecture. It is also the first non-Transformer LLM to exceed both traditional Transformers and other efficient Transformer models (such as RetNet and Mamba) in speed and performance.
# TransNormerLLM3

- **TransNormerLLM3-15B** has **14.83 billion** parameters, organized as **42 layers** with **40 attention heads** and an **embedding size of 5120**.
- **TransNormerLLM3-15B** is built entirely on **[Lightning Attention-2](http://arxiv.org/abs/2401.04658)**, which maintains a **stable TGS** (tokens per GPU per second) during training as **sequence length grows**, until hard limits such as GPU memory are reached.
- The **tiktoken** tokenizer is used, with a total **vocabulary size** of about **100,000**.
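To illustrate why a linear-attention model can keep throughput stable as sequences grow, the sketch below compares the quadratic causal form with a blockwise form that carries a fixed-size `K^T V` state between blocks. It is a minimal pure-Python illustration of generic causal linear attention, not the authors' Lightning Attention-2 kernel: it omits any decay factor or normalization, and every function name and toy size in it is an assumption made for the example.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def causal_quadratic(q, k, v):
    """Reference form: (Q K^T masked to j <= i) V, quadratic in sequence length."""
    scores = matmul(q, transpose(k))
    masked = [[s if j <= i else 0.0 for j, s in enumerate(row)]
              for i, row in enumerate(scores)]
    return matmul(masked, v)


def causal_blockwise(q, k, v, block_size):
    """Blockwise form: masked product inside a block, running K^T V state across blocks."""
    kv_state = [[0.0] * len(v[0]) for _ in range(len(k[0]))]  # d x e, fixed size
    out = []
    for start in range(0, len(q), block_size):
        qb, kb, vb = (m[start:start + block_size] for m in (q, k, v))
        intra = causal_quadratic(qb, kb, vb)  # contribution of tokens in this block
        inter = matmul(qb, kv_state)          # contribution of all earlier blocks
        out.extend(add(intra, inter))
        kv_state = add(kv_state, matmul(transpose(kb), vb))
    return out


random.seed(0)
n, d, e = 10, 4, 3  # toy sizes, unrelated to the real model
q, k, v = ([[random.uniform(-1, 1) for _ in range(w)] for _ in range(n)]
           for w in (d, d, e))
reference = causal_quadratic(q, k, v)
blockwise = causal_blockwise(q, k, v, block_size=4)
max_diff = max(abs(x - y)
               for ra, rb in zip(reference, blockwise)
               for x, y in zip(ra, rb))
print(f"max difference: {max_diff:.2e}")
```

The two forms agree up to floating-point rounding, while the blockwise form only ever stores a `d x e` state plus one block of scores, so its per-token cost does not depend on how many tokens came before.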
<p align="center">
<img src="./images/TransNormer3.jpg" width="65%" />
</p>
### Pre-training Logbook

* Real-time tracking: https://api.wandb.ai/links/opennlplab/kip314lq
* Join the discussion: [discord](https://discord.gg/JEU3nTcWKC) <<<>>> [wechat group](https://github.com/OpenNLPLab/TransnormerLLM/blob/main/images/contact_me_qr.png)

> --23.12.25-- startup: [WeChat - Pre-training Sets Sail](https://mp.weixin.qq.com/s/YjUY-uy89WkF75_-rBTuKw) <<<>>> [Twitter - Pre-training Commences](https://twitter.com/opennlplab/status/1739568669502611825) <<<>>> [YouTube Recording](https://t.co/wk7svS4o5r) <<<>>> [bilibili Replay](https://www.bilibili.com/video/BV11j411J7Dy)
> --24.01.02-- first week review: [WeChat - Week 1 Overview](https://mp.weixin.qq.com/s/zwGnZZI3itNPoxzzXkuU2w) <<<>>> [Twitter - Week 1 Review](https://twitter.com/opennlplab/status/1742187694078501038)
> --24.01.09-- second week review: [WeChat - Week 2 Overview](https://mp.weixin.qq.com/s/6D0qi-0aBier05OKuHfPEA) <<<>>> [Twitter - Week 2 Review](https://twitter.com/opennlplab/status/1744720007299523063)
> --24.01.15-- third week review: [WeChat - Week 3 Overview](https://mp.weixin.qq.com/s/EQg8evZ2cNtAk4HruwCXPA) <<<>>> [Twitter - Week 3 Review](https://twitter.com/opennlplab/status/1746920293069910190)
> --24.01.23-- fourth week review: [WeChat - Week 4 Overview](https://mp.weixin.qq.com/s/l7LrFGQKkPU38exUtSF4cw) <<<>>> [Twitter - Week 4 Review](https://twitter.com/opennlplab/status/1749821039360840001)
> --24.01.30-- fifth week review: [WeChat - Week 5 Overview](https://mp.weixin.qq.com/s/OgtQIb749IbX6y5C01bLFg) <<<>>> [Twitter - Week 5 Review](https://twitter.com/opennlplab/status/1752366090754425283)
# Released Weights

| param | token | Hugging Face | Model Scope | Wisemodel |
| :-----: | :---: | :----------: | :---------: | :-------: |
| **15B** | 50B | 🤗[step13000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step13000-50Btokens) | 🤖 | 🐯 |
| **15B** | 100B | 🤗[step26000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step26000-100Btokens) | 🤖 | 🐯 |
| **15B** | 150B | 🤗[step39000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step39000-150Btokens) | 🤖 | 🐯 |
| **15B** | 200B | 🤗[step52000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step52000-200Btokens) | 🤖 | 🐯 |
| **15B** | 250B | 🤗[step65000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step65000-250Btokens) | 🤖 | 🐯 |
| **15B** | 300B | 🤗[step78000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step78000-300Btokens) | 🤖 | 🐯 |
| **15B** | 350B | 🤗[step92000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step92000-350Btokens) | 🤖 | 🐯 |
| **15B** | 400B | 🤗[step105000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step105000-400Btokens) | 🤖 | 🐯 |
| **15B** | 450B | 🤗[step118000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step118000-450Btokens) | 🤖 | 🐯 |
| **15B** | 500B | 🤗[step131000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step131000-500Btokens) | 🤖 | 🐯 |
| **15B** | 550B | 🤗[step144000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step144000-550Btokens) | 🤖 | 🐯 |
| **15B** | 600B | 🤗[step157000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step157000-600Btokens) | 🤖 | 🐯 |
| **15B** | 650B | 🤗[step170000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step170000-650Btokens) | 🤖 | 🐯 |

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Each checkpoint lives on its own branch; select one with `revision`.
tokenizer = AutoTokenizer.from_pretrained("OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints", revision='step170000-650Btokens', trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints", torch_dtype=torch.bfloat16, revision='step170000-650Btokens', device_map="auto", trust_remote_code=True)
```
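
The branch names in the table follow the pattern `step<N>-<T>Btokens`. As a small convenience, the sketch below maps a token count to the matching branch name, using the step counts listed in the table; `CHECKPOINT_STEPS` and `revision_for` are hypothetical names introduced for this example and are not part of the repository.

```python
# Step counts copied from the Released Weights table, keyed by tokens seen (billions).
CHECKPOINT_STEPS = {
    50: 13000, 100: 26000, 150: 39000, 200: 52000, 250: 65000,
    300: 78000, 350: 92000, 400: 105000, 450: 118000, 500: 131000,
    550: 144000, 600: 157000, 650: 170000,
}


def revision_for(tokens_billion):
    """Return the branch name for a released checkpoint (hypothetical helper)."""
    if tokens_billion not in CHECKPOINT_STEPS:
        raise ValueError(f"no released checkpoint at {tokens_billion}B tokens")
    return f"step{CHECKPOINT_STEPS[tokens_billion]}-{tokens_billion}Btokens"


print(revision_for(650))
```

The returned string can be passed as the `revision` argument of `from_pretrained` to load that checkpoint. A lookup table is used rather than a formula because the step interval between checkpoints is not perfectly constant.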
# Benchmark Results

All models are evaluated with the official settings using the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) framework.

| Model | P | T | BoolQ | PIQA | HS | WG | ARC-e | ARC-c | OBQA | C-Eval | MMLU |
| ----------------------- | --- | ---- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ------ | ----- |
| **TransNormerLLM3-15B** | 15 | 0.05 | 62.08 | 72.52 | 55.55 | 57.14 | 62.12 | 31.14 | 32.40 | 26.18 | 27.50 |
| **TransNormerLLM3-15B** | 15 | 0.10 | 63.98 | 74.70 | 61.09 | 61.33 | 65.95 | 34.64 | 35.60 | 25.38 | 27.40 |
| **TransNormerLLM3-15B** | 15 | 0.15 | 60.34 | 75.08 | 63.99 | 62.04 | 64.56 | 34.90 | 35.20 | 22.64 | 26.60 |
| **TransNormerLLM3-15B** | 15 | 0.20 | 52.05 | 74.48 | 64.72 | 62.75 | 66.16 | 35.15 | 36.80 | 27.25 | 30.80 |
| **TransNormerLLM3-15B** | 15 | 0.25 | 66.70 | 76.50 | 66.51 | 64.80 | 66.84 | 36.18 | 39.40 | 30.87 | 36.10 |
| **TransNormerLLM3-15B** | 15 | 0.30 | 67.00 | 76.50 | 67.17 | 64.40 | 66.29 | 36.77 | 38.80 | 33.99 | 37.60 |
| **TransNormerLLM3-15B** | 15 | 0.35 | 65.78 | 75.46 | 67.88 | 66.54 | 67.34 | 38.57 | 39.60 | 36.02 | 39.20 |
| **TransNormerLLM3-15B** | 15 | 0.40 | 67.34 | 75.24 | 68.51 | 66.22 | 68.94 | 40.10 | 39.20 | 41.10 | 39.01 |
| **TransNormerLLM3-15B** | 15 | 0.45 | 69.02 | 76.28 | 69.11 | 63.77 | 65.82 | 36.01 | 39.40 | 37.17 | 42.80 |
| **TransNormerLLM3-15B** | 15 | 0.50 | 66.15 | 77.09 | 69.75 | 65.11 | 68.56 | 35.84 | 39.60 | 39.81 | 42.00 |
| **TransNormerLLM3-15B** | 15 | 0.55 | 70.24 | 74.05 | 69.96 | 65.75 | 65.61 | 36.69 | 38.60 | 40.08 | 44.00 |
| **TransNormerLLM3-15B** | 15 | 0.60 | 74.34 | 75.68 | 70.44 | 66.22 | 69.36 | 38.40 | 38.40 | 41.05 | 45.30 |
| **TransNormerLLM3-15B** | 15 | 0.65 | 73.15 | 76.55 | 71.60 | 66.46 | 69.65 | 39.68 | 40.80 | 41.20 | 44.90 |

> **P**: parameter size (billion). **T**: tokens (trillion). **BoolQ**: acc. **PIQA**: acc. **HS** (HellaSwag): acc_norm. **WG** (WinoGrande): acc. **ARC-e** (ARC-easy): acc. **ARC-c** (ARC-challenge): acc_norm. **OBQA** (OpenBookQA): acc_norm. **C-Eval**: 5-shot acc. **MMLU**: 5-shot acc.
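
As a rough way to read the table, the sketch below averages the nine benchmark scores for the first (0.05T) and last (0.65T) checkpoints. The unweighted mean is an assumption made here for illustration; it is not a metric reported by the authors, and the benchmarks use different accuracy definitions.

```python
# Scores copied from the benchmark table for the first and last checkpoints.
# Column order: BoolQ, PIQA, HS, WG, ARC-e, ARC-c, OBQA, C-Eval, MMLU.
SCORES = {
    0.05: [62.08, 72.52, 55.55, 57.14, 62.12, 31.14, 32.40, 26.18, 27.50],
    0.65: [73.15, 76.55, 71.60, 66.46, 69.65, 39.68, 40.80, 41.20, 44.90],
}


def mean(values):
    return sum(values) / len(values)


# Unweighted mean per checkpoint, keyed by tokens seen (trillions).
averages = {tokens: mean(scores) for tokens, scores in SCORES.items()}
gain = averages[0.65] - averages[0.05]

for tokens, avg in averages.items():
    print(f"{tokens:.2f}T tokens: mean score {avg:.2f}")
print(f"gain: {gain:.2f} points")
```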
```bash
# Configure the following settings when running evaluation
export do_eval=True
export use_triton=False
```
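
When evaluation is launched from a Python script or notebook rather than a shell, the same two settings can be placed in the process environment. The sketch below assumes the settings are read from environment variables named exactly as in the `export` lines above, and that they need to be set before the model code is imported or loaded; the source does not state when they are read.

```python
import os

# Equivalent of `export do_eval=True` and `export use_triton=False`.
# Assumption: the settings are read from the process environment, so they are
# set here before any model code is imported or loaded.
os.environ["do_eval"] = "True"
os.environ["use_triton"] = "False"

print({name: os.environ[name] for name in ("do_eval", "use_triton")})
```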
# Acknowledgments and Citation

## Acknowledgments

Our project builds on the following open-source projects:

- [tiktoken](https://github.com/openai/tiktoken) for the tokenizer.
- [metaseq](https://github.com/facebookresearch/metaseq) for training.
- [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) for evaluation.

## Citation

If you wish to cite our work, please use the following references:

```
@misc{qin2024transnormerllm,
  title={TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer},
  author={Zhen Qin and Dong Li and Weigao Sun and Weixuan Sun and Xuyang Shen and Xiaodong Han and Yunshen Wei and Baohong Lv and Xiao Luo and Yu Qiao and Yiran Zhong},
  year={2024},
  eprint={2307.14995},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
@misc{qin2024lightning,
  title={Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models},
  author={Zhen Qin and Weigao Sun and Dong Li and Xuyang Shen and Weixuan Sun and Yiran Zhong},
  year={2024},
  eprint={2401.04658},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
<p align="center">
<img src="./images/lightning3-leopard.jpg" width="50%" />
- OpenNLPLab @2024 -
</p>