---
license: apache-2.0
base_model: moonshotai/Kimi-K2-Instruct-0905
tags:
- mlx
- quantized
- kimi
- deepseek-v3
- moe
- instruction-following
- 8-bit
- apple-silicon
model_type: kimi_k2
pipeline_tag: text-generation
language:
- en
- zh
library_name: mlx
---
# 🌙 Kimi K2 Instruct - MLX 8-bit
### 1T-Parameter MoE Model, Optimized for Apple Silicon
**[Original Model](https://huggingface.co/moonshotai/Kimi-K2-Instruct-0905)** | **[MLX Framework](https://github.com/ml-explore/mlx)** | **[More Quantizations](#-other-quantization-options)**
---
## 📖 What is This?
This is an **8-bit quantized version** of Kimi K2 Instruct (0905), converted to run on **Apple Silicon** Macs (M1 through M4 families) using the MLX framework. Think of it as taking a roughly 1-trillion-parameter model and packing it into ~1 TB at 8.501 bits per weight, the highest-precision option in this series of conversions.
### ✨ Why You'll Love It
- 🚀 **Massive Context Window** - Handle up to 262,144 tokens (~200,000 words!)
- 🧠 **~1T Parameters** - A Mixture-of-Experts model that routes each token to 8 of 384 experts
- ⚡ **Apple Silicon Native** - Runs on M-series chips with Metal acceleration through MLX
- 🎯 **8-bit Precision** - The highest-precision quantization in this series, closest to the original weights
- 🌏 **Bilingual** - Fluent in both English and Chinese
- 💬 **Instruction-Tuned** - Ready for conversations, coding, analysis, and more
## 🎯 Quick Start
### Installation
```bash
pip install mlx-lm
```
### Your First Generation (3 lines of code!)
```python
from mlx_lm import load, generate
model, tokenizer = load("richardyoung/Kimi-K2-Instruct-0905-MLX-8bit")
print(generate(model, tokenizer, prompt="Explain quantum entanglement simply:", max_tokens=200))
```
That's it! 🎉
## 💻 System Requirements
| Component | Minimum | Recommended |
|-----------|---------|-------------|
| **Mac** | M1 or newer | M2 Ultra / M3 Max / M4 Max+ |
| **Memory** | 64 GB unified | 128 GB+ unified |
| **Storage** | 1.1 TB free | Fast SSD (2+ TB) |
| **macOS** | 13.5+ (required by MLX) | Latest version |
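> ⚠️ **Note:** This is a HUGE model! The weights alone are about 1 TB, far more than the unified memory listed above, so confirm your memory and storage headroom before downloading.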
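Before starting a download of this size, it can help to check the machine against the table above. The following is a minimal sketch using only the Python standard library; the function names and the default thresholds (1,100 GB of disk, 64 GB of memory, taken from the "Minimum" column) are illustrative assumptions, and `path` should point at the volume where your Hugging Face cache actually lives.

```python
import os
import shutil

GB = 1000**3  # decimal gigabytes, matching how drive sizes are advertised


def free_disk_gb(path="."):
    """Free space on the volume that holds `path`, in decimal GB."""
    return shutil.disk_usage(path).free / GB


def total_memory_gb():
    """Physical memory in decimal GB (POSIX sysconf, available on macOS and Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / GB


def preflight(path=".", need_disk_gb=1100, need_mem_gb=64):
    """Compare this machine against the minimums from the requirements table."""
    disk, mem = free_disk_gb(path), total_memory_gb()
    return {
        "disk_gb": round(disk),
        "disk_ok": disk >= need_disk_gb,
        "memory_gb": round(mem),
        "memory_ok": mem >= need_mem_gb,
    }


if __name__ == "__main__":
    print(preflight())
```

Passing the minimum thresholds only means the download will fit; it says nothing about how fast generation will be.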
## 📚 Usage Examples
### Command Line Interface
```bash
mlx_lm.generate \
  --model richardyoung/Kimi-K2-Instruct-0905-MLX-8bit \
  --prompt "Write a Python script to analyze CSV files." \
  --max-tokens 500
```
### Chat Conversation
```python
from mlx_lm import load, generate

model, tokenizer = load("richardyoung/Kimi-K2-Instruct-0905-MLX-8bit")

messages = [
    {"role": "system", "content": "You are a helpful AI assistant specialized in coding and problem-solving."},
    {"role": "user", "content": "Can you help me optimize this Python code?"},
]

# Let the tokenizer's own chat template insert the model's special tokens
# instead of hand-writing them in the prompt string.
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)

response = generate(model, tokenizer, prompt=prompt, max_tokens=500)
print(response)
```
### Advanced: Streaming Output
```python
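from mlx_lm import load, stream_generate

model, tokenizer = load("richardyoung/Kimi-K2-Instruct-0905-MLX-8bit")

# stream_generate yields chunks as they are produced; recent mlx-lm
# releases expose the decoded text of each chunk as `.text`.
for chunk in stream_generate(
    model,
    tokenizer,
    prompt="Tell me about the future of AI:",
    max_tokens=500,
):
    print(chunk.text, end="", flush=True)
print()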
```
## 🏗️ Architecture Highlights
### Model Specifications
| Feature | Value |
|---------|-------|
| **Total Parameters** | ~1 Trillion (about 32B active per token) |
| **Architecture** | DeepSeek V3 (MoE) |
| **Experts** | 384 routed + 1 shared |
| **Active Experts** | 8 per token |
| **Hidden Size** | 7168 |
| **Layers** | 61 |
| **Heads** | 64 |
| **Context Length** | 262,144 tokens |
| **Quantization** | 8.501 bits per weight |
### Advanced Features
- **🎯 YaRN Rope Scaling** - 64x factor for extended context
- **🗜️ KV Compression** - Low-rank latent projection of keys and values (rank 512)
- **⚡ Query Compression** - Low-rank query projection (rank 1536)
- **🧮 MoE Routing** - Top-8 expert selection with sigmoid scoring
- **🔧 FP8 Source Weights** - The original checkpoint is stored in FP8 (e4m3) precision
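To make the routing bullet concrete, here is a toy sketch of top-8 selection with sigmoid scoring in plain Python. It is an illustration under simplifying assumptions, not the model's actual router: the `route` function and the synthetic logits are hypothetical, and real implementations typically add details such as selection biases and a scaling factor that are left out here.

```python
import math


def route(logits, top_k=8):
    """Pick the top_k experts by sigmoid score and normalize their weights to sum to 1."""
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    chosen = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:top_k]
    total = sum(scores[i] for i in chosen)
    return {i: scores[i] / total for i in chosen}


# Synthetic router logits for 384 routed experts (deterministic, illustration only)
logits = [math.sin(i) for i in range(384)]
weights = route(logits)

# Expert index -> mixing weight for the 8 selected experts
for expert, weight in sorted(weights.items()):
    print(expert, round(weight, 4))
```

Because only 8 of the 384 routed experts run for each token, the compute per token is a small fraction of what the total parameter count suggests, even though all expert weights must still be stored.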
## 🎨 Other Quantization Options
Choose the right balance for your needs:
| Quantization | Size | Quality | Speed | Best For |
|--------------|------|---------|-------|----------|
| **8-bit** (you are here) | ~1 TB | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | Production, best quality |
| [6-bit](https://huggingface.co/richardyoung/Kimi-K2-Instruct-0905-MLX-6bit) | ~800 GB | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Sweet spot for most users |
| [4-bit](https://huggingface.co/richardyoung/Kimi-K2-Instruct-0905-MLX-4bit) | ~570 GB | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Faster inference |
| [2-bit](https://huggingface.co/richardyoung/Kimi-K2-Instruct-0905-MLX-2bit) | ~320 GB | ⭐⭐ | ⭐⭐⭐⭐⭐ | Experimental |
| [Original](https://huggingface.co/moonshotai/Kimi-K2-Instruct-0905) | ~1 TB (FP8) | ⭐⭐⭐⭐⭐ | ⭐⭐ | Research only |
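The quantized sizes in the table can be sanity-checked with simple arithmetic. The sketch below assumes roughly 1 trillion total parameters and about 0.5 bits per weight of quantization overhead (consistent with the 8.501 bits per weight reported for this conversion); both numbers are assumptions for illustration, and `estimated_size_gb` is a hypothetical helper.

```python
def estimated_size_gb(n_params, nominal_bits, overhead_bits=0.5):
    """Approximate on-disk size in decimal GB for a given quantization width."""
    return n_params * (nominal_bits + overhead_bits) / 8 / 1e9


N_PARAMS = 1.0e12  # assumption: about 1 trillion total parameters

for bits in (8, 6, 4, 2):
    print(f"{bits}-bit: ~{estimated_size_gb(N_PARAMS, bits):.0f} GB")
```

Under these assumptions the estimates land within a few percent of the quantized sizes listed above, which is a quick way to judge whether a given variant will fit on your drive.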
## 🔧 How It Was Made
This model was quantized using MLX's built-in quantization:
```bash
mlx_lm.convert \
  --hf-path moonshotai/Kimi-K2-Instruct-0905 \
  --mlx-path Kimi-K2-Instruct-0905-MLX-8bit \
  -q --q-bits 8 \
  --trust-remote-code
```
**Result:** 8.501 bits per weight (the 8-bit weights plus the per-group scales and biases stored alongside them)
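The figure above is slightly more than 8 because group-wise quantization stores extra values next to the weights. A minimal sketch of that arithmetic, assuming the `mlx_lm.convert` default group size of 64 and one 16-bit scale plus one 16-bit bias per group (`bits_per_weight` is a hypothetical helper, not part of MLX):

```python
def bits_per_weight(bits, group_size=64, scale_bits=16, bias_bits=16):
    """Effective bits per weight when each group stores one scale and one bias."""
    return bits + (scale_bits + bias_bits) / group_size


print(bits_per_weight(8))  # 8 + 32/64 → 8.5
```

The remaining 0.001 bits in the reported 8.501 most likely comes from the few tensors that are kept at higher precision instead of being quantized.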
## ⚡ Performance Tips
1. **Close other applications** - Free up as much RAM as possible
2. **Use an external SSD** - If your internal drive is full
3. **Monitor memory** - Watch Activity Monitor during inference
4. **Limit generation length** - If you hit out-of-memory errors, reduce `max_tokens` or shorten the prompt
5. **Keep your Mac cool** - Good airflow helps maintain peak performance
## ⚠️ Known Limitations
- 🍎 **Apple Silicon Only** - Won't work on Intel Macs or NVIDIA GPUs
- 💾 **Huge Storage Needs** - Make sure you have 1.1 TB+ free
- 🐏 **RAM Intensive** - Needs 64+ GB unified memory minimum
- 🐌 **Slower on M1** - Best performance on M2 Ultra or newer
- 🌐 **Bilingual Focus** - Optimized for English and Chinese
## 📄 License
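Apache 2.0, as declared in this repository's metadata. The upstream Kimi K2 repository publishes its own license terms (a modified MIT license at the time of writing), so check them before commercial use.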
## 🙏 Acknowledgments
- **Original Model:** [Moonshot AI](https://www.moonshot.cn/) for creating Kimi K2
- **Framework:** Apple's [MLX team](https://github.com/ml-explore/mlx) for the amazing framework
- **Inspiration:** DeepSeek V3 architecture
## 📚 Citation
If you use this model in your research or product, please cite:
```bibtex
@misc{kimi-k2-2025,
  title={Kimi K2: Open Agentic Intelligence},
  author={Moonshot AI},
  year={2025},
  url={https://huggingface.co/moonshotai/Kimi-K2-Instruct-0905}
}
```
## 🔗 Useful Links
- 📦 **Original Model:** [moonshotai/Kimi-K2-Instruct-0905](https://huggingface.co/moonshotai/Kimi-K2-Instruct-0905)
- 🛠️ **MLX Framework:** [GitHub](https://github.com/ml-explore/mlx)
- 📖 **MLX LM Docs:** [GitHub](https://github.com/ml-explore/mlx-examples/tree/main/llms)
- 💬 **Discussions:** [Ask questions here!](https://huggingface.co/richardyoung/Kimi-K2-Instruct-0905-MLX-8bit/discussions)
---
**Quantized with ❤️ by [richardyoung](https://deepneuro.ai/richard)**
*If you find this useful, please ⭐ star the repo and share with others!*
**Created:** October 2025 | **Format:** MLX 8-bit