---
language:
- en
- zh
license: apache-2.0
tags:
- gguf
- quantized
- moe
- mixture-of-experts
- cpu-offload
- text-generation
- deepseek
base_model: deepseek-ai/deepseek-moe-16b-base
quantized_by: MikeKuykendall
pipeline_tag: text-generation
---

# DeepSeek-MoE-16B Q2_K with CPU Offloading

Q2_K quantization of DeepSeek-MoE-16B with support for keeping the MoE expert weights on CPU. This is the smallest file in the series and gives the largest VRAM savings.

## Performance

| Configuration | VRAM | Saved | Reduction |
|---------------|---------|---------|-----------|
| **All GPU** | 7.28 GB | - | - |
| **CPU Offload** | 1.60 GB | 5.68 GB | **78.0%** |

The 78.0% figure is the 5.68 GB moved off the GPU divided by the 7.28 GB all-GPU footprint.

**File size**: 6.3 GB (down from the 31 GB F16 original)

## Usage

```bash
# Download the GGUF into ./models, then serve it with the expert weights kept on CPU.
# --local-dir is needed so the file lands where --model-dirs expects it,
# rather than in the Hugging Face cache.
huggingface-cli download MikeKuykendall/deepseek-moe-16b-q2-k-cpu-offload-gguf --local-dir ./models
shimmy serve --model-dirs ./models --cpu-moe
```

**Links**: [Q4_K_M](../deepseek-moe-16b-q4-k-m-cpu-offload-gguf) | [Q8_0](../deepseek-moe-16b-q8-0-cpu-offload-gguf)

**License**: Apache 2.0
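
## Partial offload

If some VRAM headroom is available, the offload can be applied to only part of the model. A minimal sketch, assuming shimmy mirrors llama.cpp's `--n-cpu-moe N` flag; that flag name is an assumption not confirmed by this card, so verify with `shimmy serve --help`:

```bash
# Assumed flag (mirroring llama.cpp): keep only the expert weights of the
# first 12 layers on CPU; the remaining layers stay on GPU for more speed
# at the cost of some of the VRAM savings shown above.
shimmy serve --model-dirs ./models --n-cpu-moe 12
```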
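
## Example request

Once the server is running, it behaves like an OpenAI-compatible endpoint. A minimal sketch; the port (11435) and model identifier below are assumptions based on shimmy's defaults rather than values stated on this card, so check the server's startup log for the actual ones:

```bash
# Assumed port and model name; substitute whatever `shimmy serve` prints at startup.
curl -s http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-moe-16b-q2-k",
    "messages": [{"role": "user", "content": "Say hello in English and Chinese."}]
  }'
```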