Qwen3.5-4B-KIMI-Distill

A 4B parameter reasoning-enhanced model distilled from KIMI-K2.5, fine-tuned from Qwen3.5-4B on 554K high-quality reasoning traces.

Model Highlights

  • Reasoning Enhancement: Trained on chain-of-thought reasoning traces distilled from KIMI-K2.5
  • Multi-Domain Coverage: Coding (60%), Science (15%), Math (10%), Computer Science (5%), Logical Reasoning (5%), Creative Writing (5%)
  • 2B Reasoning Tokens: Extensive training on ~2B tokens of distilled reasoning data
  • Multimodal Capable: Inherits vision-language capabilities from Qwen3.5-4B

Model Description

| Property | Value |
|---|---|
| Base Model | Qwen3.5-4B |
| Model Type | Causal Language Model with Vision Encoder |
| Parameters | 4B |
| Languages | English, Chinese |
| License | Apache 2.0 |
| Developer | Kassadin88 |

Training Data

This model was fine-tuned on KIMI-K2.5-550000x, a distilled reasoning dataset containing 554,381 high-quality samples with approximately 2B tokens of chain-of-thought reasoning traces.

Dataset Composition

| Domain | Percentage | Description |
|---|---|---|
| Coding | 60% | Web development, Python, C++, Java, JavaScript, C, Ruby, Lua, Rust, C# |
| Science | 15% | Physics, Chemistry, Biology (includes 100K PhD-level science problems) |
| Mathematics | 10% | Algebra, Calculus, Probability, Number Theory |
| Computer Science | 5% | Algorithms, Data Structures, System Design |
| Logical Reasoning | 5% | Deductive and inductive reasoning problems |
| Creative Writing | 5% | Storytelling, narrative generation |
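For reference, the composition above can be expressed as sampling weights and turned into approximate per-domain sample counts (the counts are derived from the stated 554,381 total, not taken from the dataset itself):

```python
# Domain mix of the KIMI-K2.5-550000x dataset, taken from the table above.
domain_mix = {
    "coding": 0.60,
    "science": 0.15,
    "mathematics": 0.10,
    "computer_science": 0.05,
    "logical_reasoning": 0.05,
    "creative_writing": 0.05,
}

# Approximate sample counts per domain out of the 554,381 total.
total_samples = 554_381
per_domain = {d: round(w * total_samples) for d, w in domain_mix.items()}
print(per_domain["coding"])  # roughly 332,629 coding samples
```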

Data Source

  • Distilled from KIMI-K2.5 on high-complexity reasoning tasks
  • Generated using a modified Datagen pipeline
  • Each sample includes detailed chain-of-thought reasoning traces
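Individual samples presumably pair a prompt with a reasoning trace and a final answer; a hypothetical record sketched below illustrates the idea (the field names are assumptions for illustration, not the dataset's actual schema):

```python
# Hypothetical layout of one distilled reasoning sample.
# Field names are illustrative assumptions, not the dataset's real schema.
sample = {
    "domain": "mathematics",
    "prompt": "If 3x + 5 = 20, what is x?",
    "reasoning": "Subtract 5 from both sides: 3x = 15. Divide by 3: x = 5.",
    "answer": "x = 5",
}
print(sample["answer"])  # -> x = 5
```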

Benchmark Results

The model inherits strong foundational capabilities from Qwen3.5-4B. The base model's benchmark scores are shown below:

Language Benchmarks

| Category | Benchmark | Qwen3.5-4B |
|---|---|---|
| Knowledge & STEM | MMLU-Pro | 79.1 |
| Knowledge & STEM | MMLU-Redux | 88.8 |
| Knowledge & STEM | C-Eval | 85.1 |
| Instruction Following | IFEval | 89.8 |
| Reasoning & Coding | LiveCodeBench v6 | 55.8 |

Vision Language Benchmarks

| Category | Benchmark | Qwen3.5-4B |
|---|---|---|
| STEM & Puzzle | MMMU | 77.6 |
| STEM & Puzzle | MathVista (mini) | 85.1 |
| Document Understanding | OCRBench | 85.0 |

Note: For complete benchmark results across all categories, please refer to the Qwen3.5-4B model card.

Quick Start

Using Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "Kassadin88/Qwen3.5-4B-KIMI-Distill"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

messages = [
    {"role": "system", "content": "You are a helpful assistant with strong reasoning capabilities."},
    {"role": "user", "content": "Solve this step by step: A train travels 120 km in 2 hours. At the same speed, how long will it take to travel 300 km?"}
]

input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True
)

response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
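If the checkpoint emits its chain-of-thought inside `<think>...</think>` tags (a common convention for Qwen3-family reasoning models, though not confirmed for this distill), the trace can be separated from the final answer with simple string parsing:

```python
def split_reasoning(text: str):
    """Split a response into (reasoning, answer) if it uses <think> tags.

    Assumes a Qwen3-style convention of a single <think>...</think> block
    preceding the final answer; returns empty reasoning otherwise.
    """
    start, end = "<think>", "</think>"
    if start in text and end in text:
        reasoning = text.split(start, 1)[1].split(end, 1)[0].strip()
        answer = text.split(end, 1)[1].strip()
        return reasoning, answer
    return "", text.strip()

demo = "<think>120 km / 2 h = 60 km/h, so 300 / 60 = 5.</think>It takes 5 hours."
reasoning, answer = split_reasoning(demo)
print(answer)  # -> It takes 5 hours.
```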

Using vLLM (Recommended for Production)

from vllm import LLM, SamplingParams

llm = LLM(
    model="Kassadin88/Qwen3.5-4B-KIMI-Distill",
    trust_remote_code=True,
    dtype="bfloat16"
)

sampling_params = SamplingParams(
    max_tokens=2048
)

# Define the prompts to generate from before calling generate()
prompts = ["Explain, step by step, why the sum of two odd numbers is always even."]

outputs = llm.generate(prompts, sampling_params)
print(outputs[0].outputs[0].text)

Using SGLang

python -m sglang.launch_server \
    --model-path Kassadin88/Qwen3.5-4B-KIMI-Distill \
    --port 8000 \
    --tp-size 1

OpenAI-Compatible API

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY"
)

response = client.chat.completions.create(
    model="Kassadin88/Qwen3.5-4B-KIMI-Distill",
    messages=[
        {"role": "user", "content": "Write a Python function to find the longest palindromic substring."}
    ],
    max_tokens=1024
)
print(response.choices[0].message.content)

Usage Tips

For Mathematical Reasoning

messages = [
    {"role": "user", "content": "Solve: Find all prime numbers p such that p² + 2 is also prime."}
]
# Model will provide step-by-step reasoning with chain-of-thought
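For this particular prompt, the expected conclusion can be sanity-checked by brute force: for any prime p other than 3, p² ≡ 1 (mod 3), so p² + 2 is divisible by 3 and therefore composite, leaving p = 3 as the only solution.

```python
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

# Brute-force check over small primes: only p = 3 makes p^2 + 2 prime.
solutions = [p for p in range(2, 1000) if is_prime(p) and is_prime(p * p + 2)]
print(solutions)  # -> [3]
```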

For Code Generation

messages = [
    {"role": "user", "content": "Implement an LRU cache in Python with O(1) get and put operations."}
]
# Model will generate well-structured code with explanations
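As a point of comparison for the model's output, a minimal O(1) LRU cache can be built on `collections.OrderedDict` (one standard approach among several; the model may well choose a different one, such as a hash map plus doubly linked list):

```python
from collections import OrderedDict

class LRUCache:
    """LRU cache with O(1) get/put via an ordered hash map."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return -1
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put(1, "a")
cache.put(2, "b")
cache.get(1)         # touch key 1 so it becomes most recently used
cache.put(3, "c")    # capacity exceeded: evicts key 2
print(cache.get(2))  # -> -1
```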

For Scientific Reasoning

messages = [
    {"role": "user", "content": "Explain the mechanism of CRISPR-Cas9 gene editing and its applications."}
]
# Model will provide detailed scientific explanations

Limitations

  • The model is primarily trained on reasoning tasks and may not perform optimally on creative or open-ended conversational tasks
  • May occasionally generate incorrect reasoning steps or conclusions
  • Should not be used for medical, legal, or financial advice without verification
  • Limited to knowledge present in the training data

Citation

@misc{qwen3.5-4b-kimi-distill,
  author = {Kassadin88},
  title = {Qwen3.5-4B-KIMI-Distill: A Reasoning-Enhanced Model Distilled from KIMI-K2.5},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/Kassadin88/Qwen3.5-4B-KIMI-Distill}
}


Note: This model is intended for research and educational purposes. Please use responsibly.
