---
language:
- en
- zh
license: apache-2.0
tags:
- gguf
- quantized
- moe
- mixture-of-experts
- cpu-offload
- text-generation
- deepseek
base_model: deepseek-ai/deepseek-moe-16b-base
quantized_by: MikeKuykendall
pipeline_tag: text-generation
---

# DeepSeek-MoE-16B Q2_K with CPU Offloading

Q2_K quantization of DeepSeek-MoE-16B with support for keeping the MoE expert weights on CPU. This is the smallest file in the series and gives the largest VRAM savings.

## Performance

| Configuration | VRAM | Saved | Reduction |
|---------------|---------|---------|-----------|
| **All GPU** | 7.28 GB | - | - |
| **CPU Offload** | 1.60 GB | 5.68 GB | **78.0%** |

The 78.0% figure is the 5.68 GB moved off the GPU divided by the 7.28 GB all-GPU footprint.

**File size**: 6.3 GB (down from the 31 GB F16 original)

## Usage

```bash
# Download the GGUF into ./models, then serve it with the expert weights kept on CPU.
# --local-dir is needed so the file lands where --model-dirs expects it,
# rather than in the Hugging Face cache.
huggingface-cli download MikeKuykendall/deepseek-moe-16b-q2-k-cpu-offload-gguf --local-dir ./models
shimmy serve --model-dirs ./models --cpu-moe
```

**Links**: [Q4_K_M](../deepseek-moe-16b-q4-k-m-cpu-offload-gguf) | [Q8_0](../deepseek-moe-16b-q8-0-cpu-offload-gguf)

**License**: Apache 2.0
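
## Partial offload

If some VRAM headroom is available, the offload can be applied to only part of the model. A minimal sketch, assuming shimmy mirrors llama.cpp's `--n-cpu-moe N` flag; that flag name is an assumption not confirmed by this card, so verify with `shimmy serve --help`:

```bash
# Assumed flag (mirroring llama.cpp): keep only the expert weights of the
# first 12 layers on CPU; the remaining layers stay on GPU for more speed
# at the cost of some of the VRAM savings shown above.
shimmy serve --model-dirs ./models --n-cpu-moe 12
```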
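
## Example request

Once the server is running, it behaves like an OpenAI-compatible endpoint. A minimal sketch; the port (11435) and model identifier below are assumptions based on shimmy's defaults rather than values stated on this card, so check the server's startup log for the actual ones:

```bash
# Assumed port and model name; substitute whatever `shimmy serve` prints at startup.
curl -s http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-moe-16b-q2-k",
    "messages": [{"role": "user", "content": "Say hello in English and Chinese."}]
  }'
```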