Update README.md
README.md CHANGED

@@ -77,7 +77,7 @@ model-index:
 # Mistral-SUPRA
 This model was initialized from the weights of the [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1) transformer model and up-trained into a linear RNN.
 
-This is an accompanying model of our paper [Linearizing Large Language Models](), where we detail our process of converting a softmax transformer into a linear transformer, which at inference time can function as both a transformer and a recurrent model.
+This is an accompanying model of our paper [Linearizing Large Language Models](https://arxiv.org/abs/2405.06640), where we detail our process of converting a softmax transformer into a linear transformer, which at inference time can function as both a transformer and a recurrent model.
 Our linear attention code can be found at https://github.com/TRI-ML/linear_open_lm/
 
 We uptrain Mistral-7B on 100B tokens of RefinedWeb.

@@ -176,9 +176,8 @@ If you use this model, please cite our paper on Linearizing Large Language Models.
 @article{Mercat2024Linearizing,
   title={Linearizing Large Language Models},
   author={Jean Mercat and Igor Vasiljevic and Sedrick Keh and Kushal Arora and Achal Dave and Adrien Gaidon and Thomas Kollar},
-  journal={ArXiv},
   year={2024},
-
+  journal={arXiv preprint arXiv:2405.06640},
 }
 ```
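For context on the model this commit documents, a minimal usage sketch follows. It is not part of the commit: the diff only links the paper and the linear_open_lm repository. The repo id `TRI-ML/mistral-supra`, the `trust_remote_code` loading path through `transformers`, and the generation settings are all assumptions; the officially supported entry point may instead be the linear_open_lm codebase linked above.

```python
# Hypothetical sketch, not taken from this commit: load the Mistral-SUPRA
# checkpoint and generate text. Repo id and loading path are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TRI-ML/mistral-supra"  # assumed Hugging Face Hub id; check the model card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Generate a short completion. Whether attention runs in parallel (transformer-style)
# or recurrent (RNN-style) mode is handled by the model's own inference code.
inputs = tokenizer("Linear attention lets a transformer run as", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

In the paper's setup the same weights can be evaluated either with parallel linear attention or with the equivalent recurrent update; the sketch leaves that choice to the model implementation rather than exposing it as a parameter.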