Update README.md

README.md (CHANGED)

@@ -32,16 +32,16 @@ language:

 SmolLM3 is a 3B parameter language model designed to push the boundaries of small models. It supports 6 languages, advanced reasoning and long context. SmolLM3 is a fully open model that offers strong performance at the 3B–4B scale.

-The model is a decoder-only transformer using GQA and NoPE; it was trained on 11.2T tokens with a staged curriculum of web, code, math and reasoning data.
+The model is a decoder-only transformer using GQA and NoPE; it was trained on 11.2T tokens with a staged curriculum of web, code, math and reasoning data. Post-training included mid-training on 100B reasoning tokens, followed by supervised fine-tuning and alignment via Anchored Preference Optimization.

 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/61c141342aac764ce1654e43/zy0dqTCCt5IHmuzwoqtJ9.png)

 ### Key features
-- **Long context:** Trained on 64k context and supports up to **128k tokens** using YaRN extrapolation
-- **Multilingual**: 6 natively supported languages (English, French, Spanish, German, Italian, and Portuguese)
 - Instruct model optimized for **hybrid reasoning**
 - **Fully open model**: open weights + full training details including public data mixture and training configs
+- **Long context:** Trained on 64k context and supports up to **128k tokens** using YaRN extrapolation
+- **Multilingual**: 6 natively supported languages (English, French, Spanish, German, Italian, and Portuguese)

 For more details refer to our blog post: TODO

@@ -184,7 +184,8 @@ SmolLM3 can produce text on a variety of topics, but the generated content may n
 - **Training framework:** [nanotron](https://github.com/huggingface/nanotron/tree/main)
 - **Data processing framework:** [datatrove](https://github.com/huggingface/datatrove)
 - **Evaluation framework:** [lighteval](https://github.com/huggingface/lighteval)
+- **Post-training framework:** [TRL](https://github.com/huggingface/trl)

 ### Open resources
 Here is an infographic with all the training details [TODO].
 - The datasets used for pretraining can be found in this [collection](https://huggingface.co/collections/HuggingFaceTB/smollm3-pretraining-datasets-685a7353fdc01aecde51b1d9) and those used in mid-training and post-training can be found here [TODO]
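
The Key features above mention long-context support via YaRN extrapolation beyond the 64k training context. Below is a minimal sketch of how that could be enabled on the user side with `transformers`; the checkpoint name and the `rope_scaling` values (a 2x factor over a 64k base) are illustrative assumptions, not settings taken from this README.

```python
# Minimal sketch: extending the context window with YaRN via transformers.
# The rope_scaling values below are illustrative assumptions, not official settings.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM3-3B"  # assumed checkpoint name

config = AutoConfig.from_pretrained(model_id)
# YaRN extrapolation: scale the 64k training context by 2x to reach ~128k tokens.
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 2.0,
    "original_max_position_embeddings": 65536,
}

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, config=config, device_map="auto")

inputs = tokenizer("A very long document ...", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```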
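The features also list an instruct model optimized for hybrid reasoning. Models of this kind typically expose a chat-template switch for the extended thinking mode; the sketch below assumes an `enable_thinking` flag (as used by other hybrid-reasoning models) and is not an official usage snippet.

```python
# Minimal sketch: toggling reasoning mode through the chat template.
# `enable_thinking` is an assumed template variable; extra keyword arguments to
# apply_chat_template are forwarded to the chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM3-3B"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "What is 17 * 24?"}]

for thinking in (True, False):
    prompt = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=False,
        enable_thinking=thinking,  # hypothetical flag for hybrid reasoning
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256)
    print(f"thinking={thinking}:")
    print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```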
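The second hunk adds TRL as the post-training framework, matching the mention of alignment via Anchored Preference Optimization. The sketch below is purely illustrative and assumes TRL's DPO trainer exposes an APO-style loss variant; the dataset, hyperparameters, and the `loss_type="apo_zero"` option name are placeholders and assumptions rather than the actual SmolLM3 recipe.

```python
# Rough sketch of preference alignment with TRL; names and values are placeholders.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "HuggingFaceTB/SmolLM3-3B"  # assumed checkpoint name
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Placeholder preference dataset with "prompt"/"chosen"/"rejected" style columns.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

config = DPOConfig(
    output_dir="smollm3-apo",
    loss_type="apo_zero",  # assumed TRL loss variant for Anchored Preference Optimization
    beta=0.1,
    per_device_train_batch_size=2,
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```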