---
language:
- multilingual
license: apache-2.0
license_name: kwaipilot-license
license_link: LICENSE
library_name: transformers
base_model:
- Kwaipilot/KAT-V1-40B
---

| Stage | Core Idea | Key Techniques | Outcome |
|---|---|---|---|
| 1. Pre-training | Inject knowledge while separating “reasoning” from “direct answering”. | **Dual-regime data**<br>• Think-off queries labeled via a custom tagging system.<br>• Think-on queries generated by a multi-agent solver.<br>**Knowledge Distillation + Multi-Token Prediction** for fine-grained utility. | Base model attains strong factual and reasoning skills without full-scale pre-training costs. |
| 2. Post-training | Make reasoning optional and efficient. | **Cold-start AutoThink**: majority vote sets the initial thinking mode.<br>**Step-SRPO**: intermediate supervision rewards correct mode selection and answer accuracy under that mode. | Model triggers CoT only when beneficial, reducing token use and speeding inference. |
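
The cold-start step in the table says only that a majority vote sets the initial thinking mode. A minimal sketch of such a vote could look like the following. This is an illustration, not the KAT training code: the function name `initial_thinking_mode`, the label strings, and the tie-break rule are all assumptions, and the table does not say how the individual votes are produced.

```python
from collections import Counter

# Hypothetical label strings; the table only names the two modes informally.
THINK_ON = "think-on"
THINK_OFF = "think-off"


def initial_thinking_mode(votes, tie_break=THINK_ON):
    """Return the majority label among per-query mode votes.

    `votes` is a list of THINK_ON / THINK_OFF labels. Any other label is
    ignored. Ties, including an empty list, fall back to `tie_break`,
    which is an assumption made for this sketch.
    """
    counts = Counter(votes)
    on, off = counts[THINK_ON], counts[THINK_OFF]
    if on == off:
        return tie_break
    return THINK_ON if on > off else THINK_OFF


# Example: three hypothetical votes for a single query.
votes = [THINK_ON, THINK_OFF, THINK_OFF]
print(initial_thinking_mode(votes))
```

Defaulting ties to think-on is a conservative choice for the sketch: it spends more tokens rather than risk skipping reasoning on a borderline query. Pass `tie_break=THINK_OFF` to favor shorter outputs instead.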