Kimi-K2-Instruct-0905 MLX 8-bit
MLX 8-bit quantized version of moonshotai/Kimi-K2-Instruct-0905, an instruction-following language model built on the DeepSeek V3 architecture.
Model Details
- Architecture: DeepSeek V3 (Kimi K2)
- Parameters: ~1T total, ~32B activated per token (Mixture of Experts)
  - 384 routed experts
  - 8 experts per token
  - 1 shared expert
- Hidden Size: 7168
- Layers: 61
- Context Length: 262,144 tokens
- Quantization: MLX 8-bit (8.501 bits per weight)
- Size: 1.0 TB
- Original Model: moonshotai/Kimi-K2-Instruct-0905
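As a rough consistency check on the figures above, the number of weights implied by the file size and the bits-per-weight figure can be computed directly. This is a back-of-the-envelope sketch: it assumes "1.0 TB" means 10^12 bytes and that nearly all of the file is quantized weights, and `implied_param_count` is an illustrative name rather than part of any library.

```python
def implied_param_count(size_bytes: float, bits_per_weight: float) -> float:
    """Estimate how many weights fit in size_bytes at the given average bit width."""
    return size_bytes * 8 / bits_per_weight


# Size and bits per weight are taken from the list above;
# reading "TB" as decimal terabytes is an assumption.
params = implied_param_count(1.0e12, 8.501)
print(f"~{params / 1e12:.2f} trillion weights")
```

Under those assumptions the estimate comes out a little under one trillion weights. The rounded "1.0 TB" figure limits how precise this can be.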
Features
- Long context support (262K tokens)
- Advanced Mixture of Experts (MoE) architecture with 384 experts
- Optimized for Apple Silicon with MLX framework
- 8-bit quantization (8.501 bits per weight) intended to stay close to the original model's quality
- Instruction-following and multi-turn conversation capabilities
- Native Metal acceleration on M1/M2/M3/M4 Macs
Installation
pip install mlx-lm
Usage
Python API
from mlx_lm import load, generate
# Load the model
model, tokenizer = load("richardyoung/Kimi-K2-Instruct-0905-MLX-8bit")
# Generate text
prompt = "Explain quantum computing in simple terms."
response = generate(model, tokenizer, prompt=prompt, max_tokens=500)
print(response)
Command Line
mlx_lm.generate \
--model richardyoung/Kimi-K2-Instruct-0905-MLX-8bit \
--prompt "Write a Python function to calculate Fibonacci numbers." \
--max-tokens 500
Chat Format
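The model is documented here with a ChatML-style layout (if the chat template shipped with the tokenizer differs, the tokenizer's template takes precedence):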
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{user message}<|im_end|>
<|im_start|>assistant
{assistant response}<|im_end|>
Multi-turn Conversation Example
from mlx_lm import load, generate
model, tokenizer = load("richardyoung/Kimi-K2-Instruct-0905-MLX-8bit")
conversation = """<|im_start|>system
You are a helpful coding assistant.<|im_end|>
<|im_start|>user
Write a Python function to reverse a string.<|im_end|>
<|im_start|>assistant
"""
response = generate(model, tokenizer, prompt=conversation, max_tokens=300)
print(response)
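Hand-written prompt strings are easy to get subtly wrong, so a common alternative is to let the tokenizer render the conversation. The sketch below assumes the tokenizer returned by `load` exposes the Hugging Face `apply_chat_template` method and that the repository ships a chat template. `render_prompt` and `add_turn` are illustrative helper names, not part of mlx-lm.

```python
def render_prompt(tokenizer, messages):
    """Render a list of {"role", "content"} dicts into a single prompt string."""
    return tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )


def add_turn(messages, assistant_reply, next_user_message):
    """Return a new history extended with the model's reply and the next user turn."""
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_user_message},
    ]


messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to reverse a string."},
]

# With a loaded model (left as comments because it needs the full download):
#   from mlx_lm import load, generate
#   model, tokenizer = load("richardyoung/Kimi-K2-Instruct-0905-MLX-8bit")
#   prompt = render_prompt(tokenizer, messages)
#   reply = generate(model, tokenizer, prompt=prompt, max_tokens=300)
#   messages = add_turn(messages, reply, "Now add type hints.")
```

Keeping the history as a list of role/content dicts and re-rendering it on every turn is what makes the exchange multi-turn: each call sees the full conversation so far.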
System Requirements
Minimum:
- 1.1 TB free disk space
- 64 GB RAM (unified memory)
- Apple Silicon Mac (M1 or later)
- macOS 12.0 or later
Recommended:
- 128 GB+ unified memory
- M2 Ultra, M3 Max, or M4 Max/Ultra
- Fast SSD storage
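Before starting a download of this size, it can help to check the machine against the minimum requirements. The following is a minimal sketch using only the Python standard library. The `preflight` name and the wording of the messages are illustrative, and the check covers only platform and free disk space, not unified memory.

```python
import platform
import shutil

# "1.1 TB free disk space" from the minimum requirements, read as decimal terabytes.
REQUIRED_DISK_BYTES = int(1.1e12)


def preflight(path=".", required_bytes=REQUIRED_DISK_BYTES):
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    # A Python interpreter running under Rosetta reports x86_64 and would be flagged here.
    if platform.system() != "Darwin" or platform.machine() != "arm64":
        problems.append("not running natively on an Apple Silicon Mac")
    free = shutil.disk_usage(path).free
    if free < required_bytes:
        problems.append(
            f"only {free / 1e12:.2f} TB free at {path!r}, "
            f"need {required_bytes / 1e12:.2f} TB"
        )
    return problems


for problem in preflight():
    print("warning:", problem)
```

Pass the directory where the model cache lives as `path`, since free space is reported per volume.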
Performance Notes
- Memory Usage: ~1 TB model size + ~20-40 GB runtime overhead
- Inference Speed: Depends on hardware (faster on M2 Ultra/M3 Max)
- Quantization: 8-bit quantization maintains near-original model quality
- MoE Efficiency: Only 8 experts activated per token (not all 384)
Model Variants
If you need different quantization levels or formats:
- MLX 6-bit (coming soon): richardyoung/Kimi-K2-Instruct-0905-MLX-6bit
- MLX 4-bit (coming soon): richardyoung/Kimi-K2-Instruct-0905-MLX-4bit
- Original Model: moonshotai/Kimi-K2-Instruct-0905
Limitations
- Requires Apple Silicon (not compatible with x86/CUDA)
- Very large model size (1 TB) requires significant storage
- High memory requirements (64+ GB unified memory)
- Inference speed depends heavily on available RAM and SSD speed
- Chinese-English bilingual model, optimized for both languages
Technical Details
Quantization Method
This model was quantized using MLX's built-in quantization:
mlx_lm.convert \
--hf-path moonshotai/Kimi-K2-Instruct-0905 \
--mlx-path Kimi-K2-Instruct-0905-MLX-8bit \
-q --q-bits 8 --trust-remote-code
Result: 8.501 bits per weight (slightly above 8 because each quantization group also stores a scale and a bias alongside its 8-bit values)
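The reported bit width can be approximated with simple arithmetic. The sketch below assumes group-wise affine quantization with a group size of 64 and one 16-bit scale plus one 16-bit bias per group. These values are assumptions about the converter's defaults, not something stated in the command above, and `effective_bits` is an illustrative name.

```python
def effective_bits(bits: int, group_size: int,
                   scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Average storage per weight when each group carries its own scale and bias."""
    return bits + (scale_bits + bias_bits) / group_size


# Assumed defaults: 8-bit values, groups of 64 weights.
print(effective_bits(8, 64))  # 8.5
```

This accounts for 8.5 of the reported 8.501 bits. The small remainder is not explained by the arithmetic; one plausible cause is a handful of tensors stored at higher precision.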
Architecture Highlights
- RoPE Scaling: YaRN with 64x factor for extended context
- KV Compression: Low-rank key-value compression (Multi-head Latent Attention, KV rank 512)
- Query Compression: Low-rank query projection (rank 1536)
- MoE Routing: Top-8 expert selection with sigmoid scoring
- Base Weights: The original checkpoint is distributed in FP8 (e4m3)
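To make the routing item above concrete, here is a simplified sketch of top-8 selection with sigmoid scoring for a single token, in plain Python. It is not the model's actual router. Normalizing the selected scores so they sum to 1 is an assumption, and refinements that real implementations may use (bias correction, expert grouping, a scaling factor, the shared expert) are left out. `route_token` is an illustrative name.

```python
import math
import random


def route_token(logits, top_k=8):
    """Return (expert_index, weight) pairs for the top_k experts of one token."""
    # Sigmoid scores each expert independently, unlike softmax.
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:top_k]
    total = sum(scores[i] for i in top)
    # Assumption: weights of the selected experts are renormalized to sum to 1.
    return [(i, scores[i] / total) for i in top]


random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(384)]  # one logit per routed expert
routing = route_token(logits)
print(f"{len(routing)} of {len(logits)} routed experts active "
      f"({len(routing) / len(logits):.1%})")
```

Only the selected experts run for a given token, which is why per-token compute is much smaller than the total parameter count suggests.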
Citation
If you use this model, please cite the original Kimi K2 work:
@misc{kimi-k2-2025,
title={Kimi K2: Open Agentic Intelligence},
author={Moonshot AI},
year={2025},
url={https://huggingface.co/moonshotai/Kimi-K2-Instruct-0905}
}
License
Same as base model: Modified MIT License
Links
- Original Model: moonshotai/Kimi-K2-Instruct-0905
- MLX Framework: GitHub
- MLX LM: GitHub
Quantized by: richardyoung
Format: MLX 8-bit
Created: 2025-10-25