Adding mention of Tinker and TRL support
README.md
CHANGED
@@ -317,6 +317,11 @@ We test the model on an 1M version of the [RULER](https://arxiv.org/abs/2404.066
 * All models are evaluated with Dual Chunk Attention enabled.
 * Since the evaluation is time-consuming, we use 260 samples for each length (13 sub-tasks, 20 samples for each).
 
+## Fine Tuning
+
+Qwen 3 is compatible with [TRL](https://github.com/huggingface/trl) and [Tinker](https://thinkingmachines.ai/tinker/).
+
+
 ## Best Practices
 
 To achieve optimal performance, we recommend the following settings: