Personal report
Thank you for this distilled version!
It's less prone to overthinking than the base model, and it thinks in French when prompted in French.
The enhanced thinking process is a significant improvement.
I've noticed one performance decrease: letter counting (which is somewhat expected, as this isn't a strength of transformer models, though the original model didn't have this issue). The model also seems to take a new approach to tool calling: it frequently offers suggestions about what it can do; simply say "yes" and it handles the implementation correctly without issues.
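As an aside, the letter-counting weakness comes from subword tokenization: the model receives token pieces, not individual characters. A minimal sketch (the split of "strawberry" below is purely illustrative, not the output of any real tokenizer):

```python
# Toy illustration: subword tokenizers split words into multi-character
# pieces, so the model never directly "sees" individual letters.
def toy_tokenize(word: str) -> list[str]:
    # Hypothetical split table; real tokenizers (BPE, etc.) learn their own merges.
    pieces = {"strawberry": ["str", "aw", "berry"]}
    return pieces.get(word, [word])

word = "strawberry"
tokens = toy_tokenize(word)

# Character-level count (what a letter-counting question asks for):
char_count = word.count("r")

print(tokens)      # ['str', 'aw', 'berry'] -- what the model sees
print(char_count)  # 3 -- what the question is about
```

Because the "r"s are buried inside opaque token pieces, counting them requires the model to memorize or reason about token spellings rather than read characters directly.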
This model has tremendous potential for performance improvements!
Hello @Elsephire
We tried our best with this distill to imitate Claude's behavior and chain of thought. As you've experienced, this has led to decreased performance on a range of tasks compared to the base model.
This was the first Qwen3 MoE distill we've done, and there is definitely room for improvement. We have yet to run any official benchmarks/evals to confirm exactly where the model improved and where performance was lost. Expect a v2 of this model, or perhaps a Gemini-distilled version, in the near future; I'll be tweaking certain parameters to better fit Qwen3 30B A3B and avoid overfitting and catastrophic forgetting (the training params were likely too aggressive, causing the overall performance degradation).
All in all, thanks for showing your appreciation; it really does mean a lot. I'll work to get a new version of this model out soon to resolve these issues.