anikifoss/DeepSeek-R1-0528-DQ4_K_R4

anikifoss committed on May 31

Commit bcd921f · verified · 1 Parent(s): af3a922

Update README.md

Files changed (1)

README.md +26 -4

@@ -6,14 +6,16 @@ base_model: deepseek-ai/DeepSeek-R1-0528
 # Model Card
-Dynamic quantization of DeepSeek-R1-0528 for **ik_llama** fork, optimized to run on 32GB VRAM and 512GB RAM systems while providing the best balance between quality and performance for coding.
 THIS MODEL ONLY RUNS ON THE **IK_LLAMA** FORK!!!
 See [this detailed guide](https://github.com/ikawrakow/ik_llama.cpp/discussions/258) on how to setup an run **ik_llama**.
 ## Run
-Use the following command line to run the model (tweak the command to further customize it to your needs):
 ```
 ./build/bin/llama-server \
     --alias anikifoss/DeepSeek-R1-0528-DQ4_K_R4 \
@@ -21,7 +23,7 @@ Use the following command line to run the model (tweak the command to further cu
     --temp 0.5 --top-k 0 --top-p 1.0 --min-p 0.1 --repeat-penalty 1.0 \
     --ctx-size 75000 \
     -ctk f16 \
-    -mla 2 -fa \
     -amb 1024 \
     -b 2048 -ub 2048 \
     -fmoe \
@@ -33,7 +35,27 @@ Use the following command line to run the model (tweak the command to further cu
     --port 8090
 ```
-Customization:
 - Replace `/mnt/data/Models/anikifoss/DeepSeek-R1-0528-DQ4_K_R4` with the location of the model (where you downloaded it)
 - Adjust `--threads` to the number of physical cores on your system
 - Tweak these to your preference `--temp 0.5 --top-k 0 --top-p 1.0 --min-p 0.1 --repeat-penalty 1.0`

 # Model Card
+Dynamic quantization of DeepSeek-R1-0528 for **ik_llama** fork, optimized to run with 24GB to 32GB VRAM and 512GB RAM systems while providing the best balance between quality and performance for coding.
 THIS MODEL ONLY RUNS ON THE **IK_LLAMA** FORK!!!
 See [this detailed guide](https://github.com/ikawrakow/ik_llama.cpp/discussions/258) on how to setup an run **ik_llama**.
 ## Run
+Use the following command lines to run the model (tweak the command to further customize it to your needs).
+### 32GB VRAM
 ```
 ./build/bin/llama-server \
     --alias anikifoss/DeepSeek-R1-0528-DQ4_K_R4 \
@@ -21,7 +23,7 @@ Use the following command line to run the model (tweak the command to further cu
     --temp 0.5 --top-k 0 --top-p 1.0 --min-p 0.1 --repeat-penalty 1.0 \
     --ctx-size 75000 \
     -ctk f16 \
+    -mla 3 -fa \
     -amb 1024 \
     -b 2048 -ub 2048 \
     -fmoe \
@@ -33,7 +35,27 @@ Use the following command line to run the model (tweak the command to further cu
     --port 8090
 ```
+### 24GB VRAM
+```
+./build/bin/llama-server \
+    --alias anikifoss/DeepSeek-R1-0528-DQ4_K_R4 \
+    --model /mnt/data/Models/anikifoss/DeepSeek-R1-0528-DQ4_K_R4/DeepSeek-R1-0528-DQ4_K_R4-00001-of-00010.gguf \
+    --temp 0.5 --top-k 0 --top-p 1.0 --min-p 0.1 --repeat-penalty 1.0 \
+    --ctx-size 41000 \
+    -ctk q8_0 \
+    -mla 2 -fa \
+    -amb 512 \
+    -b 1024 -ub 1024 \
+    -fmoe \
+    --n-gpu-layers 99 \
+    --override-tensor exps=CPU,attn_kv_b=CPU \
+    --parallel 1 \
+    --threads 32 \
+    --host 127.0.0.1 \
+    --port 8090
+```
+### Customization
 - Replace `/mnt/data/Models/anikifoss/DeepSeek-R1-0528-DQ4_K_R4` with the location of the model (where you downloaded it)
 - Adjust `--threads` to the number of physical cores on your system
 - Tweak these to your preference `--temp 0.5 --top-k 0 --top-p 1.0 --min-p 0.1 --repeat-penalty 1.0`
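
The `--threads` item above asks for the number of physical cores. The following is a minimal sketch for counting them on Linux, not something taken from the README. It assumes `lscpu` from util-linux is installed, and the helper name `count_physical_cores` is hypothetical (it is not part of **ik_llama**).

```shell
# Hypothetical helper (an assumption, not from the README): reads the output of
# `lscpu -p=Core,Socket` on stdin and prints the number of unique
# (core, socket) pairs, which corresponds to the number of physical cores.
count_physical_cores() {
    # Drop the '#' header lines, de-duplicate the pairs that hyperthreads
    # share, count what remains, and strip any padding added by wc.
    grep -v '^#' | sort -u | wc -l | tr -d ' '
}
```

On a Linux machine, `lscpu -p=Core,Socket | count_physical_cores` should print a number that can be passed to `--threads`. Logical CPUs that share a core report the same pair, so they are counted once.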