# LongLoRA and LongAlpaca for Long-context LLMs

[Models](https://huggingface.co/Yukang)
[Code](https://github.com/dvlab-research/LongLoRA)
[Data](https://huggingface.co/datasets/Yukang/LongAlpaca-12k)
[Paper](https://arxiv.org/abs/2309.12307)

[Code License](https://github.com/dvlab-research/LongLoRA/blob/main/LICENSE)
[Data License](https://github.com/dvlab-research/LongLoRA/blob/main/DATA_LICENSE)
[Weight License](https://github.com/dvlab-research/LongLoRA/blob/main/WEIGHT_LICENSE)

For detailed usage and codes, please visit the [Github project](https://github.com/dvlab-research/LongLoRA).

## TABLE OF CONTENTS
1. [News](#news)
2. [Examples](#examples)
3. [Highlights](#highlights)
4. [How to contribute](#how-to-contribute)
5. [Requirements](#usage-requirements)
6. [Installation and quick guide](#installation-and-quick-guide)
7. [LongAlpaca Data](#longalpaca-data)
8. [Models](#models)
9. [Training](#training)
10. [Evaluation](#evaluation)
11. [Demo](#demo)
12. [Data Generation via Pdf2Text](#data-generation-via-pdf2text)
13. [Citation](#citation)
14. [Acknowledgement](#acknowledgement)
15. [License](#license)

## News
- [x] [2023.10.8] **We release the long instruction-following dataset**, [LongAlpaca-12k](https://huggingface.co/datasets/Yukang/LongAlpaca-12k), and **the corresponding models**, [LongAlpaca-7B](https://huggingface.co/Yukang/LongAlpaca-7B), [LongAlpaca-13B](https://huggingface.co/Yukang/LongAlpaca-13B), and [LongAlpaca-70B](https://huggingface.co/Yukang/LongAlpaca-70B).
- (*The previous sft models*, [Llama-2-13b-chat-longlora-32k-sft](https://huggingface.co/Yukang/Llama-2-13b-chat-longlora-32k-sft) and [Llama-2-70b-chat-longlora-32k-sft](https://huggingface.co/Yukang/Llama-2-70b-chat-longlora-32k-sft), *have been deprecated*.)
- [x] [2023.10.3] We add support for GPTNeoX models. Please refer to this [PR](https://github.com/dvlab-research/LongLoRA/pull/32) for usage. Thanks to @naubull2 for this contribution.
- [x] [2023.9.22] We release all our fine-tuned [models](https://huggingface.co/Yukang), including **70B-32k models**, [LLaMA2-LongLoRA-70B-32k](https://huggingface.co/Yukang/Llama-2-70b-longlora-32k), and [LLaMA2-LongLoRA-7B-100k](https://huggingface.co/Yukang/Llama-2-7b-longlora-100k-ft). Welcome to check them out!
- [x] [2023.9.22] We release the [paper](http://arxiv.org/abs/2309.12307) and this GitHub repo, including training and evaluation code.

**LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models [[Paper](http://arxiv.org/abs/2309.12307)]** <br />
[Yukang Chen](https://scholar.google.com/citations?user=6p0ygKUAAAAJ&hl=en),
[Shengju Qian](https://scholar.google.com/citations?user=QNnWmasAAAAJ),
[Haotian Tang](https://scholar.google.com/citations?user=WxL13BAAAAAJ&hl),
[Xin Lai](https://scholar.google.com/citations?user=tqNDPA4AAAAJ&hl=zh-CN),
[Zhijian Liu](https://scholar.google.com/citations?user=3coYSTUAAAAJ&hl=en),
[Song Han](https://scholar.google.com/citations?user=E0iCaa4AAAAJ&hl=zh-CN),
[Jiaya Jia](https://scholar.google.com/citations?user=XPAkzTEAAAAJ&hl=en)<br />

## Highlights
1. In the LongLoRA approach, the proposed shifted short attention is easy to implement, compatible with Flash-Attention, and not required during inference.
2. We release all our models, from 7B to 70B and with context lengths from 8k to 100k, including [LLaMA2-LongLoRA-7B-100k](https://huggingface.co/Yukang/Llama-2-7b-longlora-100k-ft), [LLaMA2-LongLoRA-13B-64k](https://huggingface.co/Yukang/Llama-2-13b-longlora-64k), and [LLaMA2-LongLoRA-70B-32k](https://huggingface.co/Yukang/Llama-2-70b-longlora-32k).
3. We built a long-context instruction-following dataset, [LongAlpaca-12k](#longalpaca-data), and released the corresponding [LongAlpaca-7B](https://huggingface.co/Yukang/LongAlpaca-7B), [LongAlpaca-13B](https://huggingface.co/Yukang/LongAlpaca-13B) and [LongAlpaca-70B](https://huggingface.co/Yukang/LongAlpaca-70B) models. To the best of our knowledge, this is the first open-sourced long-context 70B model.

## How to Contribute
- Make sure you have git installed.
- Create your own [fork](https://github.com/dvlab-research/LongLoRA/fork) of the project.
- Clone the repository to your local machine, using `git clone` and the URL of this project.
- Read both the `Requirements` and `Installation and Quick Guide` sections below.
- Commit and push your changes.
- Make a pull request when you have finished modifying the project.

## Usage Requirements
To download and use the [pre-trained weights](#pre-trained-weights) you will need:
1. A Hugging Face (HF) account with a valid email. Note that the email used for HF must also be used for the license agreement.
2. Accept the Meta [license and acceptable use policy](https://ai.meta.com/resources/models-and-libraries/llama-downloads/).

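Once access has been granted to your HF account, one way to fetch the weights is via `huggingface_hub`; the snippet below is a minimal sketch and not part of this repo (the target directory is just an example).
```
# Minimal sketch (not part of this repo): download gated Llama-2 weights
# after the Meta license has been accepted on your Hugging Face account.
from huggingface_hub import login, snapshot_download

login()  # paste an access token from the approved HF account when prompted

snapshot_download(
    repo_id="meta-llama/Llama-2-7b-hf",
    local_dir="./Llama-2-7b-hf",  # example target directory
)
```
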
## Installation and Quick Guide
To install and run the application:
1. [Fork this repo](https://github.com/dvlab-research/LongLoRA/fork) on GitHub.
2. Clone the repository to your local machine, using `git clone` and the URL of this project.
3. Run the following code:
```
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
```
4. Use either a [released model](#released-models) or [fine-tune](#fine-tuning) a model to fit your preferences.
5. Test your model by chatting with it.
6. Deploy your own demo.

## LongAlpaca Data

LongAlpaca-12k contains 9k long QA items that we collected and 3k short QA items sampled from the original [Alpaca data](https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json). The short data are included to avoid degradation on short instruction following. The data we collected cover various types and amounts, as summarized in the following table.

| Data           | Short QA | Long QA  | Total    | Download |
|:---------------|----------|----------|----------|----------|
| LongAlpaca-12k | 3k       | 9k       | 12k      | [Link](https://huggingface.co/datasets/Yukang/LongAlpaca-12k) |

Following the original Alpaca format, our Long QA data uses the following fields for fine-tuning:
- `instruction`: `str`, describes the task the model should perform, for example, answering a question after reading a book section or paper. We vary the contents and questions to make the instructions diverse.
- `output`: `str`, the answer to the instruction.

We did not use the `input` field of the Alpaca format, for simplicity.

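For illustration, a single Long QA record then looks roughly like the following. This is a hypothetical, heavily shortened example; real instructions embed an entire book section or paper.
```
# Hypothetical, shortened LongAlpaca-style record for illustration only.
record = {
    "instruction": (
        "Below is a paper. Memorize the content and answer my question after the paper.\n"
        "<tens of thousands of tokens of paper text> ...\n"
        "Now the paper ends. What are the main contributions of this work?"
    ),
    "output": "The paper proposes ... (the reference answer).",
}
```
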
## Models

### Models with supervised fine-tuning
| Model          | Size | Context | Train   | Link |
|:---------------|------|---------|---------|-----------------------------------------------------------------------------------------------------------------------|
| LongAlpaca-7B  | 7B   | 32768   | Full FT | [Model](https://huggingface.co/Yukang/LongAlpaca-7B) |
| LongAlpaca-13B | 13B  | 32768   | Full FT | [Model](https://huggingface.co/Yukang/LongAlpaca-13B) |
| LongAlpaca-70B | 70B  | 32768   | LoRA+   | [Model](https://huggingface.co/Yukang/LongAlpaca-70B) [(LoRA-weight)](https://huggingface.co/Yukang/LongAlpaca-70B-lora) |


### Models with context extension via full fine-tuning
| Model                       | Size | Context | Train   | Link |
|:----------------------------|------|---------|---------|-------------------------------------------------------------------|
| Llama-2-7b-longlora-8k-ft   | 7B   | 8192    | Full FT | [Model](https://huggingface.co/Yukang/Llama-2-7b-longlora-8k-ft) |
| Llama-2-7b-longlora-16k-ft  | 7B   | 16384   | Full FT | [Model](https://huggingface.co/Yukang/Llama-2-7b-longlora-16k-ft) |
| Llama-2-7b-longlora-32k-ft  | 7B   | 32768   | Full FT | [Model](https://huggingface.co/Yukang/Llama-2-7b-longlora-32k-ft) |
| Llama-2-7b-longlora-100k-ft | 7B   | 100000  | Full FT | [Model](https://huggingface.co/Yukang/Llama-2-7b-longlora-100k-ft) |
| Llama-2-13b-longlora-8k-ft  | 13B  | 8192    | Full FT | [Model](https://huggingface.co/Yukang/Llama-2-13b-longlora-8k-ft) |
| Llama-2-13b-longlora-16k-ft | 13B  | 16384   | Full FT | [Model](https://huggingface.co/Yukang/Llama-2-13b-longlora-16k-ft) |
| Llama-2-13b-longlora-32k-ft | 13B  | 32768   | Full FT | [Model](https://huggingface.co/Yukang/Llama-2-13b-longlora-32k-ft) |

### Models with context extension via improved LoRA fine-tuning
| Model                         | Size | Context | Train | Link |
|:------------------------------|------|---------|-------|---------------------------------------------------------------------|
| Llama-2-7b-longlora-8k        | 7B   | 8192    | LoRA+ | [LoRA-weight](https://huggingface.co/Yukang/Llama-2-7b-longlora-8k) |
| Llama-2-7b-longlora-16k       | 7B   | 16384   | LoRA+ | [LoRA-weight](https://huggingface.co/Yukang/Llama-2-7b-longlora-16k) |
| Llama-2-7b-longlora-32k       | 7B   | 32768   | LoRA+ | [LoRA-weight](https://huggingface.co/Yukang/Llama-2-7b-longlora-32k) |
| Llama-2-13b-longlora-8k       | 13B  | 8192    | LoRA+ | [LoRA-weight](https://huggingface.co/Yukang/Llama-2-13b-longlora-8k) |
| Llama-2-13b-longlora-16k      | 13B  | 16384   | LoRA+ | [LoRA-weight](https://huggingface.co/Yukang/Llama-2-13b-longlora-16k) |
| Llama-2-13b-longlora-32k      | 13B  | 32768   | LoRA+ | [LoRA-weight](https://huggingface.co/Yukang/Llama-2-13b-longlora-32k) |
| Llama-2-13b-longlora-64k      | 13B  | 65536   | LoRA+ | [LoRA-weight](https://huggingface.co/Yukang/Llama-2-13b-longlora-64k) |
| Llama-2-70b-longlora-32k      | 70B  | 32768   | LoRA+ | [LoRA-weight](https://huggingface.co/Yukang/Llama-2-70b-longlora-32k) |
| Llama-2-70b-chat-longlora-32k | 70B  | 32768   | LoRA+ | [LoRA-weight](https://huggingface.co/Yukang/Llama-2-70b-chat-longlora-32k) |

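The full-weight checkpoints above can be loaded directly with `transformers`; the snippet below is a minimal sketch (the model id and dtype are examples). The `LoRA-weight` entries must first be merged with their base model, see [Merge LoRA Weight](#merge-lora-weight) below.
```
# Minimal sketch: load one of the released full-weight models with transformers.
# The model id and dtype are examples; any full-weight entry above works.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Yukang/LongAlpaca-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision; long contexts need a lot of memory
    device_map="auto",
)
```
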
## Training
### Pre-trained weights
We use LLaMA2 models as the pre-trained weights and fine-tune them to long context window sizes. Download them based on your choice.

| Pre-trained weights |
|:-------------------------------------------------------------------------------------|
| [Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) |
| [Llama-2-13b-hf](https://huggingface.co/meta-llama/Llama-2-13b-hf) |
| [Llama-2-70b-hf](https://huggingface.co/meta-llama/Llama-2-70b-hf) |
| [Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) |
| [Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) |
| [Llama-2-70b-chat-hf](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf) |

This project also supports GPTNeoX models as the base model architecture. Some candidate pre-trained weights include [GPT-NeoX-20B](https://huggingface.co/EleutherAI/gpt-neox-20b), [Polyglot-ko-12.8B](https://huggingface.co/EleutherAI/polyglot-ko-12.8b) and other variants.

### Fine-tuning
```
torchrun --nproc_per_node=8 fine-tune.py  \
        --model_name_or_path path_to/Llama-2-7b-hf \
        --bf16 True \
        --output_dir path_to_saving_checkpoints \
        --cache_dir path_to_cache \
        --model_max_length 8192 \
        --use_flash_attn True \
        --low_rank_training False \
        --num_train_epochs 1 \
        --per_device_train_batch_size 1 \
        --per_device_eval_batch_size 2 \
        --gradient_accumulation_steps 8 \
        --evaluation_strategy "no" \
        --save_strategy "steps" \
        --save_steps 1000 \
        --save_total_limit 2 \
        --learning_rate 2e-5 \
        --weight_decay 0.0 \
        --warmup_steps 20 \
        --lr_scheduler_type "constant_with_warmup" \
        --logging_steps 1 \
        --deepspeed "ds_configs/stage2.json" \
        --tf32 True \
        --max_steps 1000
```

- Please remember to change `path_to/Llama-2-7b-hf`, `path_to_saving_checkpoints`, and `path_to_cache` to your own directories.
- Note that you can change `model_max_length` to other values.
- You could change `ds_configs/stage2.json` to `ds_configs/stage3.json` if you want.
- Please set `use_flash_attn` to `False` if you use V100 machines or have not installed flash attention.
- You can set `low_rank_training` to `False` if you want full fine-tuning. It costs more GPU memory and is slower, but the performance is a bit better.
- When training is finished, to get the full model weight:
```
cd path_to_saving_checkpoints && python zero_to_fp32.py . pytorch_model.bin
```

### Supervised Fine-tuning
```
torchrun --nproc_per_node=8 supervised-fine-tune.py  \
        --model_name_or_path path_to_Llama2_chat_models \
        --bf16 True \
        --output_dir path_to_saving_checkpoints \
        --model_max_length 32768 \
        --use_flash_attn True \
        --data_path LongAlpaca-12k.json \
        --low_rank_training True \
        --num_train_epochs 3 \
        --per_device_train_batch_size 1 \
        --per_device_eval_batch_size 2 \
        --gradient_accumulation_steps 1 \
        --evaluation_strategy "no" \
        --save_strategy "steps" \
        --save_steps 1000 \
        --save_total_limit 2 \
        --learning_rate 2e-5 \
        --weight_decay 0.0 \
        --warmup_steps 20 \
        --lr_scheduler_type "constant_with_warmup" \
        --logging_steps 1 \
        --deepspeed "ds_configs/stage2.json" \
        --tf32 True
```
- There is no need to run supervised fine-tuning on top of the context-extended models. It is fine to start directly from base models such as the Llama2-chat models, as the amount of long instruction-following data is sufficient for SFT.
- Our long instruction-following data can be found in [LongAlpaca-12k.json](https://huggingface.co/datasets/Yukang/LongAlpaca-12k).

### Get trainable weights in low-rank training
In low-rank training, we set the embedding and normalization layers as trainable. Please use the following line to extract the trainable weights `trainable_params.bin` from `pytorch_model.bin`:
```
python3 get_trainable_weights.py --checkpoint_path path_to_saving_checkpoints --trainable_params "embed,norm"
```

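Conceptually, this step just filters the checkpoint's state dict by the given substrings; the sketch below illustrates the idea (the actual script may differ in details, and the paths are examples).
```
# Rough sketch of the idea behind the extraction step: keep only the
# state-dict entries whose names contain the trainable substrings.
import torch

state_dict = torch.load("path_to_saving_checkpoints/pytorch_model.bin", map_location="cpu")
keep = ("embed", "norm")
trainable = {name: tensor for name, tensor in state_dict.items()
             if any(key in name for key in keep)}
torch.save(trainable, "path_to_saving_checkpoints/trainable_params.bin")
```
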
### Merge LoRA Weight
Merge the LoRA weights in `pytorch_model.bin` with the trainable parameters in `trainable_params.bin`, and save the resulting model into your desired path in the Hugging Face format:
```
python3 merge_lora_weights_and_save_hf_model.py \
        --base_model path_to/Llama-2-7b-hf \
        --peft_model path_to_saving_checkpoints \
        --context_size 8192 \
        --save_path path_to_saving_merged_model
```
For example,
```
python3 merge_lora_weights_and_save_hf_model.py \
        --base_model /dataset/pretrained-models/Llama-2-7b-hf \
        --peft_model /dataset/yukangchen/hf_models/lora-models/Llama-2-7b-longlora-8k \
        --context_size 8192 \
        --save_path /dataset/yukangchen/models/Llama-2-7b-longlora-8k-merged
```

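For reference, the merge is conceptually similar to the standard `peft` merge flow sketched below; the repo's script additionally restores the trainable embedding and normalization weights from `trainable_params.bin`, and the paths here are examples.
```
# Conceptual sketch only; use merge_lora_weights_and_save_hf_model.py in practice.
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("path_to/Llama-2-7b-hf")
model = PeftModel.from_pretrained(base, "path_to_saving_checkpoints")  # reads adapter_config.json / adapter_model.bin
merged = model.merge_and_unload()  # folds the LoRA deltas into the base weights
merged.save_pretrained("path_to_saving_merged_model")
```
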
## Evaluation
### Perplexity Validation
To evaluate a model that is trained in the low-rank setting, please set both `base_model` and `peft_model`. `base_model` is the pre-trained weight. `peft_model` is the path to the saved checkpoint, which should contain `trainable_params.bin`, `adapter_model.bin` and `adapter_config.json`. For example,
```
python3 eval.py --seq_len 8192 --context_size 8192 --batch_size 1 --base_model path_to/Llama-2-7b-hf --peft_model path_to_saving_checkpoints --data_path pg19/test.bin
```

To evaluate a model that is fully fine-tuned, you only need to set `base_model` to the path of the saved checkpoint, which should contain `pytorch_model.bin` and `config.json`. `peft_model` should be left unset.
```
python3 eval.py --seq_len 8192 --context_size 8192 --batch_size 1 --base_model path_to_saving_checkpoints --data_path pg19/test.bin
```

- Note that `--seq_len` sets the sequence length for evaluation, while `--context_size` sets the context length the model was fine-tuned with. `--seq_len` should not be larger than `--context_size`.

- We have already tokenized the validation and test splits of the PG19 and proof-pile datasets into `pg19/validation.bin`, `pg19/test.bin`, and `proof-pile/test_sampled_data.bin`, with the LLaMA tokenizer. `proof-pile/test_sampled_data.bin` contains 128 documents randomly sampled from the full proof-pile test split; each document has at least 32768 tokens. We also release the sampled ids in [proof-pile/test_sampled_ids.bin](https://drive.google.com/file/d/1cnzWODLRQYAd7HeugzLCIhaqzaLZv7J5/view?usp=share_link). You can download them from the links below.

| Dataset    | Split      | Link |
|:-----------|------------|--------------------------------------------------------------------------------------------------------------|
| PG19       | validation | [pg19/validation.bin](https://drive.google.com/file/d/1rbJvb0qRIf2mQoN2ON7S93TbTzMnlrN6/view?usp=share_link) |
| PG19       | test       | [pg19/test.bin](https://drive.google.com/file/d/1QANDMdctpacPAYgS04adDXqByGEq-Ret/view?usp=share_link) |
| Proof-pile | test       | [proof-pile/test_sampled_data.bin](https://drive.google.com/file/d/1bUI5lPDvrqzY_XXJJ2sSuvZx0Y9AZClE/view?usp=share_link) |

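As a quick sanity check on a downloaded model, note that the perplexity reported here is simply the exponential of the average token-level negative log-likelihood; a standalone sketch over a single window is shown below (paths and lengths are examples, this is not `eval.py`).
```
# Standalone sketch, not eval.py: perplexity of one 8192-token window.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path_to_saving_merged_model"  # example path
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, device_map="auto")

text = open("long_document.txt").read()  # example document
input_ids = tokenizer(text, return_tensors="pt").input_ids[:, :8192].to(model.device)
with torch.no_grad():
    loss = model(input_ids, labels=input_ids).loss  # mean negative log-likelihood
print("perplexity:", torch.exp(loss).item())
```
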
### Passkey Retrieval
We provide a way to test the passkey retrieval accuracy. For example,
```
python3 passkey_retrivial.py \
        --context_size 32768 \
        --base_model path_to/Llama-2-7b-longlora-32k \
        --max_tokens 32768 \
        --interval 1000
```
- Note that `context_size` is the context length during fine-tuning.
- `max_tokens` is the maximum length of the document in the passkey retrieval evaluation.
- `interval` is the step by which the document length increases. It is a rough number, because the document grows by whole sentences.

## Demo
### Local Inference
To chat with [Llama-2-13b-chat-longlora-32k-sft](https://huggingface.co/Yukang/Llama-2-13b-chat-longlora-32k-sft) or [Llama-2-70b-chat-longlora-32k-sft](https://huggingface.co/Yukang/Llama-2-70b-chat-longlora-32k-sft), you need to run `merge_lora_weights_and_save_hf_model.py` first, and then:
```
python3 inference.py  \
        --base_model path_to_model \
        --question $question \
        --context_size $context_length \
        --max_gen_len $max_gen_len \
        --flash_attn True \
        --material $material_content \
        --material_type $material_type \
        --material_title $material_title
```
To ask a question related to a book:
```
python3 inference.py  \
        --base_model /data/models/Llama-2-13b-chat-longlora-32k-sft \
        --question "Why doesn't Professor Snape seem to like Harry?" \
        --context_size 32768 \
        --max_gen_len 512 \
        --flash_attn True \
        --material "materials/Harry Potter and the Philosophers Stone_section2.txt" \
        --material_type "book" \
        --material_title "Harry Potter and the Philosophers Stone"
```
Note that you can omit `material_type` or `material_title`.

To ask a question related to a paper:
```
python3 inference.py  \
        --base_model /data/models/Llama-2-13b-chat-longlora-32k-sft \
        --question "What are the main contributions and novelties of this work?" \
        --context_size 32768 \
        --max_gen_len 512 \
        --flash_attn True \
        --material "materials/paper1.txt" \
        --material_type "paper"
```

### Online Demo
To deploy your own demo, run
```
python3 demo.py \
        --base_model path_to_model \
        --context_size $context_size \
        --max_gen_len $max_gen_len \
        --flash_attn True
```
For example,
```
python3 demo.py \
        --base_model /data/models/Llama-2-13b-chat-longlora-32k-sft \
        --context_size 32768 \
        --max_gen_len 512 \
        --flash_attn True
```
- Note that `flash_attn=True` will make the generation slower but save a lot of GPU memory.

## Data Generation via Pdf2text
During dataset collection, we convert papers and books from PDF to text. The conversion quality has a large influence on the final model quality, and we consider this step non-trivial. We release the tool for the pdf2txt conversion in the folder `pdf2txt`. It is built upon `pdf2image`, `easyocr`, `ditod` and `detectron2`. Please refer to the [README.md](pdf2txt/README.md) in `pdf2txt` for more details.

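To give a rough idea of the conversion, the sketch below renders PDF pages to images and runs OCR on them; this is an illustration only, and the `pdf2txt` folder implements a more careful pipeline with layout detection (`ditod`/`detectron2`) on top of the OCR.
```
# Rough illustration only; see pdf2txt/README.md for the actual pipeline.
import numpy as np
from pdf2image import convert_from_path
import easyocr

reader = easyocr.Reader(["en"])
pages = convert_from_path("paper.pdf", dpi=200)  # example input PDF
chunks = []
for page in pages:
    lines = reader.readtext(np.array(page), detail=0)  # detail=0 returns plain strings
    chunks.append("\n".join(lines))
print("\n\n".join(chunks))
```
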
## Citation
If you find this project useful in your research, please consider citing:

```
@article{longlora,
  title={LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models},
  author={Yukang Chen and Shengju Qian and Haotian Tang and Xin Lai and Zhijian Liu and Song Han and Jiaya Jia},
  journal={arXiv:2309.12307},
  year={2023}
}
```

```
@misc{long-alpaca,
  author = {Yukang Chen and Shaozuo Yu and Shengju Qian and Haotian Tang and Xin Lai and Zhijian Liu and Song Han and Jiaya Jia},
  title = {Long Alpaca: Long-context Instruction-following models},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/dvlab-research/LongLoRA}},
}
```
## Acknowledgement
- This work is built upon [LLaMA2](https://ai.meta.com/llama) as the pre-trained models.
- This work can also be built upon [GPTNeoX-HF](https://huggingface.co/docs/transformers/model_doc/gpt_neox), which is based on [EleutherAI/GPTNeoX](https://github.com/EleutherAI/gpt-neox), as the pre-trained model architecture.
- This work relies on [DeepSpeed](https://github.com/microsoft/DeepSpeed), [peft](https://github.com/huggingface/peft), and [Flash-Attention2](https://github.com/Dao-AILab/flash-attention) for acceleration.
- Some evaluation code is modified from [Landmark Attention](https://github.com/epfml/landmark-attention).
- We use [LongChat](https://github.com/DachengLi1/LongChat) for the retrieval evaluation.

## License
- LongLoRA is licensed under the Apache License 2.0. This means that it requires the preservation of copyright and license notices.
- Data and weights are under the CC-BY-NC 4.0 License. They are licensed for research use only, and only non-commercial use is allowed. Models trained using the dataset should not be used outside of research purposes.