DeepSeek-MoE-16B Q2_K with CPU Offloading

Q2_K quantization of DeepSeek-MoE-16B with CPU offloading support. This is the smallest file size and lowest VRAM footprint of the available quantizations.

Performance

| Configuration | VRAM    | Saved   | Reduction |
|---------------|---------|---------|-----------|
| All GPU       | 7.28 GB | -       | -         |
| CPU Offload   | 1.60 GB | 5.68 GB | 78.0%     |

File Size: 6.3 GB (from 31 GB F16)
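
The reduction figure follows directly from the two VRAM numbers in the table. A minimal sketch of the arithmetic, assuming the saving is measured against the all-GPU baseline:

```python
# VRAM figures from the table above, in GB.
all_gpu = 7.28
cpu_offload = 1.60

saved = all_gpu - cpu_offload          # 5.68 GB moved off the GPU
reduction = saved / all_gpu * 100      # ~78.0% less VRAM required

print(f"Saved: {saved:.2f} GB, reduction: {reduction:.1f}%")
```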

Usage

Download the GGUF and point shimmy at the directory containing it:

```bash
huggingface-cli download MikeKuykendall/deepseek-moe-16b-q2-k-cpu-offload-gguf --local-dir ./models
shimmy serve --model-dirs ./models --cpu-moe
```
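
Once the server is up, it can be queried like any local inference endpoint. A minimal sketch, assuming shimmy exposes an OpenAI-compatible chat completions route on localhost; the port, path, and registered model name below are assumptions, so check `shimmy serve --help` and `shimmy list` for the actual values:

```python
import requests

# Hypothetical endpoint: assumes an OpenAI-compatible API on localhost:11435.
# Verify the real bind address and route with `shimmy serve --help`.
BASE_URL = "http://localhost:11435/v1/chat/completions"

payload = {
    # Model name as shimmy registers it from the GGUF file; adjust to match
    # whatever `shimmy list` reports for the downloaded file.
    "model": "deepseek-moe-16b-q2-k-cpu-offload",
    "messages": [
        {"role": "user", "content": "Explain MoE expert offloading in one paragraph."}
    ],
    "max_tokens": 256,
}

resp = requests.post(BASE_URL, json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

The `--cpu-moe` flag keeps the MoE expert weights in system RAM while the rest of the model runs on the GPU, which is what produces the VRAM figures in the table above.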

Other quantizations: Q4_K_M | Q8_0

License: Apache 2.0
