Qwen3.5-4B-KIMI-Distill

A 4B parameter reasoning-enhanced model distilled from KIMI-K2.5, fine-tuned from Qwen3.5-4B on 554K high-quality reasoning traces.

Model Highlights

  • Reasoning Enhancement: Trained on chain-of-thought reasoning traces distilled from KIMI-K2.5
  • Multi-Domain Coverage: Coding (60%), Science (15%), Math (10%), Computer Science (5%), Logical Reasoning (5%), Creative Writing (5%)
  • 2B Reasoning Tokens: Extensive training on ~2B tokens of distilled reasoning data
  • Multimodal Capable: Inherits vision-language capabilities from Qwen3.5-4B

Model Description

| Property | Value |
|---|---|
| Base Model | Qwen3.5-4B |
| Model Type | Causal Language Model with Vision Encoder |
| Parameters | 4B |
| Languages | English, Chinese |
| License | Apache 2.0 |
| Developer | Kassadin88 |

Training Data

This model was fine-tuned on KIMI-K2.5-550000x, a distilled reasoning dataset containing 554,381 high-quality samples with approximately 2B tokens of chain-of-thought reasoning traces.

Dataset Composition

| Domain | Percentage | Description |
|---|---|---|
| Coding | 60% | Web development, Python, C++, Java, JavaScript, C, Ruby, Lua, Rust, C# |
| Science | 15% | Physics, Chemistry, Biology (includes 100K PhD-level science problems) |
| Mathematics | 10% | Algebra, Calculus, Probability, Number Theory |
| Computer Science | 5% | Algorithms, Data Structures, System Design |
| Logical Reasoning | 5% | Deductive and inductive reasoning problems |
| Creative Writing | 5% | Storytelling, narrative generation |
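For reference, the composition above can be expressed as sampling weights and turned into approximate per-domain sample counts (the counts are derived from the stated 554,381 total, not taken from the dataset itself):

```python
# Domain mix of the KIMI-K2.5-550000x dataset, taken from the table above.
domain_mix = {
    "coding": 0.60,
    "science": 0.15,
    "mathematics": 0.10,
    "computer_science": 0.05,
    "logical_reasoning": 0.05,
    "creative_writing": 0.05,
}

# Approximate sample counts per domain out of the 554,381 total.
total_samples = 554_381
per_domain = {d: round(w * total_samples) for d, w in domain_mix.items()}
print(per_domain["coding"])  # roughly 332,629 coding samples
```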

Data Source

  • Distilled from KIMI-K2.5 on high-complexity reasoning tasks
  • Generated using a modified Datagen pipeline
  • Each sample includes detailed chain-of-thought reasoning traces
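Individual samples presumably pair a prompt with a reasoning trace and a final answer; a hypothetical record sketched below illustrates the idea (the field names are assumptions for illustration, not the dataset's actual schema):

```python
# Hypothetical layout of one distilled reasoning sample.
# Field names are illustrative assumptions, not the dataset's real schema.
sample = {
    "domain": "mathematics",
    "prompt": "If 3x + 5 = 20, what is x?",
    "reasoning": "Subtract 5 from both sides: 3x = 15. Divide by 3: x = 5.",
    "answer": "x = 5",
}
print(sample["answer"])  # -> x = 5
```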

Benchmark Results

The model inherits strong foundational capabilities from Qwen3.5-4B. The base model's benchmark scores are shown below:

Language Benchmarks

| Category | Benchmark | Qwen3.5-4B |
|---|---|---|
| Knowledge & STEM | MMLU-Pro | 79.1 |
| Knowledge & STEM | MMLU-Redux | 88.8 |
| Knowledge & STEM | C-Eval | 85.1 |
| Instruction Following | IFEval | 89.8 |
| Reasoning & Coding | LiveCodeBench v6 | 55.8 |

Vision Language Benchmarks

| Category | Benchmark | Qwen3.5-4B |
|---|---|---|
| STEM & Puzzle | MMMU | 77.6 |
| STEM & Puzzle | MathVista (mini) | 85.1 |
| Document Understanding | OCRBench | 85.0 |

Note: For complete benchmark results across all categories, please refer to the Qwen3.5-4B model card.

Quick Start

Using Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "Kassadin88/Qwen3.5-4B-KIMI-Distill"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

messages = [
    {"role": "system", "content": "You are a helpful assistant with strong reasoning capabilities."},
    {"role": "user", "content": "Solve this step by step: A train travels 120 km in 2 hours. At the same speed, how long will it take to travel 300 km?"}
]

input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True
)

response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
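If the checkpoint emits its chain-of-thought inside `<think>...</think>` tags (a common convention for Qwen3-family reasoning models, though not confirmed for this distill), the trace can be separated from the final answer with simple string parsing:

```python
def split_reasoning(text: str):
    """Split a response into (reasoning, answer) if it uses <think> tags.

    Assumes a Qwen3-style convention of a single <think>...</think> block
    preceding the final answer; returns empty reasoning otherwise.
    """
    start, end = "<think>", "</think>"
    if start in text and end in text:
        reasoning = text.split(start, 1)[1].split(end, 1)[0].strip()
        answer = text.split(end, 1)[1].strip()
        return reasoning, answer
    return "", text.strip()

demo = "<think>120 km / 2 h = 60 km/h, so 300 / 60 = 5.</think>It takes 5 hours."
reasoning, answer = split_reasoning(demo)
print(answer)  # -> It takes 5 hours.
```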

Using vLLM (Recommended for Production)

from vllm import LLM, SamplingParams

llm = LLM(
    model="Kassadin88/Qwen3.5-4B-KIMI-Distill",
    trust_remote_code=True,
    dtype="bfloat16"
)

sampling_params = SamplingParams(
    max_tokens=2048
)

# Define the prompts to generate from before calling generate()
prompts = ["Explain, step by step, why the sum of two odd numbers is always even."]

outputs = llm.generate(prompts, sampling_params)
print(outputs[0].outputs[0].text)

Using SGLang

python -m sglang.launch_server \
    --model-path Kassadin88/Qwen3.5-4B-KIMI-Distill \
    --port 8000 \
    --tp-size 1

OpenAI-Compatible API

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY"
)

response = client.chat.completions.create(
    model="Kassadin88/Qwen3.5-4B-KIMI-Distill",
    messages=[
        {"role": "user", "content": "Write a Python function to find the longest palindromic substring."}
    ],
    max_tokens=1024
)
print(response.choices[0].message.content)

Usage Tips

For Mathematical Reasoning

messages = [
    {"role": "user", "content": "Solve: Find all prime numbers p such that p² + 2 is also prime."}
]
# Model will provide step-by-step reasoning with chain-of-thought
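For this particular prompt, the expected conclusion can be sanity-checked by brute force: for any prime p other than 3, p² ≡ 1 (mod 3), so p² + 2 is divisible by 3 and therefore composite, leaving p = 3 as the only solution.

```python
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

# Brute-force check over small primes: only p = 3 makes p^2 + 2 prime.
solutions = [p for p in range(2, 1000) if is_prime(p) and is_prime(p * p + 2)]
print(solutions)  # -> [3]
```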

For Code Generation

messages = [
    {"role": "user", "content": "Implement an LRU cache in Python with O(1) get and put operations."}
]
# Model will generate well-structured code with explanations
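As a point of comparison for the model's output, a minimal O(1) LRU cache can be built on `collections.OrderedDict` (one standard approach among several; the model may well choose a different one, such as a hash map plus doubly linked list):

```python
from collections import OrderedDict

class LRUCache:
    """LRU cache with O(1) get/put via an ordered hash map."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return -1
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put(1, "a")
cache.put(2, "b")
cache.get(1)         # touch key 1 so it becomes most recently used
cache.put(3, "c")    # capacity exceeded: evicts key 2
print(cache.get(2))  # -> -1
```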

For Scientific Reasoning

messages = [
    {"role": "user", "content": "Explain the mechanism of CRISPR-Cas9 gene editing and its applications."}
]
# Model will provide detailed scientific explanations

Limitations

  • The model is primarily trained on reasoning tasks and may not perform optimally on creative or open-ended conversational tasks
  • May occasionally generate incorrect reasoning steps or conclusions
  • Should not be used for medical, legal, or financial advice without verification
  • Limited to knowledge present in the training data

Citation

@misc{qwen3.5-4b-kimi-distill,
  author = {Kassadin88},
  title = {Qwen3.5-4B-KIMI-Distill: A Reasoning-Enhanced Model Distilled from KIMI-K2.5},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/Kassadin88/Qwen3.5-4B-KIMI-Distill}
}


Note: This model is intended for research and educational purposes. Please use responsibly.
