loubnabnl (HF Staff) committed · verified
Commit 107d923 · 1 Parent(s): 7804fff

Update README.md

Files changed (1):
  1. README.md +5 -4
README.md CHANGED
@@ -32,16 +32,16 @@ language:

SmolLM3 is a 3B parameter language model designed to push the boundaries of small models. It supports 6 languages, advanced reasoning and long context. SmolLM3 is a fully open model that offers strong performance at the 3B–4B scale.

- The model is a decoder-only transformer using GQA and NoPE. It was trained on 11.2T tokens with a staged curriculum of web, code, math and reasoning data. The training framework is [nanotron](https://github.com/huggingface/nanotron/).
+ The model is a decoder-only transformer using GQA and NoPE. It was trained on 11.2T tokens with a staged curriculum of web, code, math and reasoning data. Post-training included mid-training on 100B reasoning tokens, followed by supervised fine-tuning and alignment via Anchored Preference Optimization (APO).


![image/png](https://cdn-uploads.huggingface.co/production/uploads/61c141342aac764ce1654e43/Zcm_016pWeyFr_uIkT7Ki.png)

### Key features
- - **Long context:** Trained on 64k context and supports up to **128k tokens** using YaRN extrapolation
- - **Multilingual:** 6 natively supported languages (English, French, Spanish, German, Italian, and Portuguese)
- Instruct model optimized for **hybrid reasoning**
- **Fully open model:** open weights + full training details including public data mixture and training configs
+ - **Long context:** Trained on 64k context and supports up to **128k tokens** using YaRN extrapolation
+ - **Multilingual:** 6 natively supported languages (English, French, Spanish, German, Italian, and Portuguese)

For more details refer to our blog post: TODO
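The long-context bullet above claims 128k-token support via YaRN extrapolation. A minimal sketch of what that looks like in practice, assuming the checkpoint is published as `HuggingFaceTB/SmolLM3-3B` and that the standard `transformers` YaRN rope-scaling override applies (neither is stated in this diff):

```python
# Minimal sketch: extend SmolLM3's 64k training context toward 128k with YaRN.
# The model id and the rope_scaling fields follow transformers conventions and
# are assumptions here, not details confirmed by this commit.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM3-3B"  # assumed Hub id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    rope_scaling={
        "rope_type": "yarn",
        "factor": 2.0,  # 2x the 64k training window -> ~128k tokens
        "original_max_position_embeddings": 65536,
    },
)

inputs = tokenizer("A very long document...", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```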
 
@@ -184,7 +184,8 @@ SmolLM3 can produce text on a variety of topics, but the generated content may n
- **Training framework:** [nanotron](https://github.com/huggingface/nanotron/tree/main)
- **Data processing framework:** [datatrove](https://github.com/huggingface/datatrove)
- **Evaluation framework:** [lighteval](https://github.com/huggingface/lighteval)
-
+ - **Post-training framework:** [TRL](https://github.com/huggingface/trl)
+

### Open resources
Here is an infographic with all the training details [TODO].
- The datasets used for pretraining can be found in this [collection](https://huggingface.co/collections/HuggingFaceTB/smollm3-pretraining-datasets-685a7353fdc01aecde51b1d9) and those used in mid-training and post-training can be found here [TODO]
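The second hunk adds TRL as the post-training framework, matching the new intro sentence about alignment via Anchored Preference Optimization. TRL exposes APO as loss variants of its DPO trainer; a hedged sketch follows, in which the starting checkpoint, dataset, and hyperparameters are illustrative assumptions rather than the released training recipe:

```python
# Hedged sketch of APO-style preference alignment with TRL, which implements
# Anchored Preference Optimization as DPO loss variants ("apo_zero", "apo_down").
# The checkpoint, dataset, and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "HuggingFaceTB/SmolLM3-3B"  # assumed starting checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Any preference dataset with prompt/chosen/rejected columns fits this API.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

config = DPOConfig(
    output_dir="smollm3-apo",
    loss_type="apo_zero",  # anchored variant of the DPO objective
    beta=0.1,              # illustrative value, not the released setting
    per_device_train_batch_size=2,
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```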
 
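Finally, on the "hybrid reasoning" key feature: many hybrid-reasoning chat templates expose a toggle for the reasoning trace. A speculative sketch, assuming SmolLM3's template follows that convention with an `enable_thinking` kwarg (not confirmed by this diff):

```python
# Speculative sketch of hybrid reasoning: the same prompt with the reasoning
# trace switched on and off. The `enable_thinking` template kwarg is an
# assumption based on a common convention, not confirmed by this commit.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM3-3B"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "How many prime numbers are below 30?"}]

for thinking in (True, False):
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        enable_thinking=thinking,  # assumed template kwarg
        return_tensors="pt",
    )
    outputs = model.generate(inputs, max_new_tokens=512)
    # Print only the newly generated tokens for each mode.
    print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```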