Qwen/Qwen3-235B-A22B-Instruct-2507

@@ -317,6 +317,11 @@ We test the model on an 1M version of the [RULER](https://arxiv.org/abs/2404.066
 * All models are evaluated with Dual Chunk Attention enabled.
 * Since the evaluation is time-consuming, we use 260 samples for each length (13 sub-tasks, 20 samples for each).
+## Fine-Tuning
+Qwen3 can be fine-tuned with [TRL](https://github.com/huggingface/trl) and [Tinker](https://thinkingmachines.ai/tinker/).
 ## Best Practices
 To achieve optimal performance, we recommend the following settings: