Fix typo in `README.md`
#48 opened by alvarobartt (HF Staff)

README.md CHANGED
```diff
@@ -180,7 +180,7 @@ Falcon-7B is a causal decoder-only model trained on a causal language modeling t
 
 The architecture is broadly adapted from the GPT-3 paper ([Brown et al., 2020](https://arxiv.org/abs/2005.14165)), with the following differences:
 
-* **
+* **Positional embeddings:** rotary ([Su et al., 2021](https://arxiv.org/abs/2104.09864));
 * **Attention:** multiquery ([Shazeer et al., 2019](https://arxiv.org/abs/1911.02150)) and FlashAttention ([Dao et al., 2022](https://arxiv.org/abs/2205.14135));
 * **Decoder-block:** parallel attention/MLP with a single layer norm.
 
```
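For context, the rotary position embeddings mentioned in the corrected line can be sketched as follows. This is a minimal pure-Python illustration of the rotation in Su et al., 2021, not Falcon's actual implementation; the function name and the convention of pairing adjacent features are assumptions for the sketch:

```python
import math

def rotary_embed(vec, pos, base=10000.0):
    """Rotate consecutive feature pairs of `vec` by a position- and
    frequency-dependent angle (rotary position embeddings, Su et al., 2021).

    vec: flat list of floats with even length; pos: integer token position.
    """
    dim = len(vec)
    assert dim % 2 == 0, "rotary embeddings pair up features, so dim must be even"
    out = []
    for i in range(0, dim, 2):
        # Geometrically spaced frequency for this feature pair.
        theta = pos / (base ** (i / dim))
        c, s = math.cos(theta), math.sin(theta)
        x1, x2 = vec[i], vec[i + 1]
        # Standard 2-D rotation of the (x1, x2) pair by angle theta.
        out += [x1 * c - x2 * s, x1 * s + x2 * c]
    return out
```

Because each pair is only rotated, vector norms are preserved, and the dot product between a rotated query at position m and a rotated key at position n depends only on the offset n - m, which is what makes the scheme encode relative position inside attention.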