# DeepSeek-MoE-16B Q2_K with CPU Offloading

Q2_K quantization of DeepSeek-MoE-16B with CPU offloading support. This is the smallest variant in size, giving the maximum VRAM savings.
## Performance
| Configuration | VRAM Used | VRAM Saved | Reduction |
|---|---|---|---|
| All layers on GPU | 7.28 GB | - | - |
| CPU offload (`--cpu-moe`) | 1.60 GB | 5.68 GB | 78.0% |
**File size:** 6.3 GB (down from 31 GB in F16)
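The savings come from keeping the routed MoE expert tensors in system RAM while attention and shared weights stay on the GPU. As a rough sketch of the same idea in plain llama.cpp, assuming shimmy's `--cpu-moe` maps onto llama.cpp's tensor-override mechanism (an assumption about shimmy's internals; the filename below is a placeholder and the regex may need adjusting for this model's exact tensor names):

```bash
# Approximate llama.cpp equivalent: pin expert FFN tensors to CPU,
# keep everything else on the GPU (filename and pattern are illustrative)
llama-server -m deepseek-moe-16b.Q2_K.gguf \
  -ngl 99 \
  --override-tensor "ffn_.*_exps=CPU"
```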
## Usage
```bash
# Download the GGUF into ./models so the serve command below can find it
huggingface-cli download MikeKuykendall/deepseek-moe-16b-q2-k-cpu-offload-gguf \
  --local-dir ./models

# Serve with MoE expert tensors offloaded to CPU
shimmy serve --model-dirs ./models --cpu-moe
```
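Once the server is running, it can be exercised over shimmy's OpenAI-compatible API. A minimal sketch, assuming the default bind address of `127.0.0.1:11435` and a model name derived from the GGUF filename (both assumptions; run `shimmy list` to see the exact name, and adjust host/port if you bound elsewhere):

```bash
# Model name and port below are assumptions; `shimmy list` shows the real name
curl -s http://127.0.0.1:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "deepseek-moe-16b-q2-k",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 64
      }'
```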
**License:** Apache 2.0
## Base Model

[deepseek-ai/deepseek-moe-16b-base](https://huggingface.co/deepseek-ai/deepseek-moe-16b-base)