DeepSeek-MoE-16B Q2_K with CPU Offloading

Q2_K quantization of DeepSeek-MoE-16B with CPU offloading support. This is the smallest file size and lowest VRAM footprint of the available quantizations.

Performance

| Configuration | VRAM    | Saved   | Reduction |
|---------------|---------|---------|-----------|
| All GPU       | 7.28 GB | -       | -         |
| CPU Offload   | 1.60 GB | 5.68 GB | 78.0%     |

File Size: 6.3 GB (from 31 GB F16)
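
The reduction figure follows directly from the two VRAM numbers in the table. A minimal sketch of the arithmetic, assuming the saving is measured against the all-GPU baseline:

```python
# VRAM figures from the table above, in GB.
all_gpu = 7.28
cpu_offload = 1.60

saved = all_gpu - cpu_offload          # 5.68 GB moved off the GPU
reduction = saved / all_gpu * 100      # ~78.0% less VRAM required

print(f"Saved: {saved:.2f} GB, reduction: {reduction:.1f}%")
```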

Usage

Download the GGUF and point shimmy at the directory containing it:

```bash
huggingface-cli download MikeKuykendall/deepseek-moe-16b-q2-k-cpu-offload-gguf --local-dir ./models
shimmy serve --model-dirs ./models --cpu-moe
```
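
Once the server is up, it can be queried like any local inference endpoint. A minimal sketch, assuming shimmy exposes an OpenAI-compatible chat completions route on localhost; the port, path, and registered model name below are assumptions, so check `shimmy serve --help` and `shimmy list` for the actual values:

```python
import requests

# Hypothetical endpoint: assumes an OpenAI-compatible API on localhost:11435.
# Verify the real bind address and route with `shimmy serve --help`.
BASE_URL = "http://localhost:11435/v1/chat/completions"

payload = {
    # Model name as shimmy registers it from the GGUF file; adjust to match
    # whatever `shimmy list` reports for the downloaded file.
    "model": "deepseek-moe-16b-q2-k-cpu-offload",
    "messages": [
        {"role": "user", "content": "Explain MoE expert offloading in one paragraph."}
    ],
    "max_tokens": 256,
}

resp = requests.post(BASE_URL, json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

The `--cpu-moe` flag keeps the MoE expert weights in system RAM while the rest of the model runs on the GPU, which is what produces the VRAM figures in the table above.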

Other quantizations: Q4_K_M | Q8_0

License: Apache 2.0
