anikifoss committed · Commit bcd921f · verified · Parent(s): af3a922

Update README.md

Files changed (1):
  1. README.md +26 -4

README.md CHANGED
@@ -6,14 +6,16 @@ base_model: deepseek-ai/DeepSeek-R1-0528

# Model Card

- Dynamic quantization of DeepSeek-R1-0528 for **ik_llama** fork, optimized to run on 32GB VRAM and 512GB RAM systems while providing the best balance between quality and performance for coding.

THIS MODEL ONLY RUNS ON THE **IK_LLAMA** FORK!!!

See [this detailed guide](https://github.com/ikawrakow/ik_llama.cpp/discussions/258) on how to set up and run **ik_llama**.

## Run
- Use the following command line to run the model (tweak the command to further customize it to your needs):
```
./build/bin/llama-server \
--alias anikifoss/DeepSeek-R1-0528-DQ4_K_R4 \
@@ -21,7 +23,7 @@ Use the following command line to run the model (tweak the command to further cu
--temp 0.5 --top-k 0 --top-p 1.0 --min-p 0.1 --repeat-penalty 1.0 \
--ctx-size 75000 \
-ctk f16 \
- -mla 2 -fa \
-amb 1024 \
-b 2048 -ub 2048 \
-fmoe \
@@ -33,7 +35,27 @@ Use the following command line to run the model (tweak the command to further cu
--port 8090
```

- Customization:
- Replace `/mnt/data/Models/anikifoss/DeepSeek-R1-0528-DQ4_K_R4` with the location of the model (where you downloaded it)
- Adjust `--threads` to the number of physical cores on your system
- Tweak these to your preference: `--temp 0.5 --top-k 0 --top-p 1.0 --min-p 0.1 --repeat-penalty 1.0`
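The `--threads` bullet above can be checked directly. A minimal sketch for counting physical cores (rather than SMT threads) on Linux, assuming `lscpu` from util-linux is available:

```shell
# Count unique (core, socket) pairs, i.e. physical cores, ignoring SMT siblings.
lscpu -p=Core,Socket | grep -v '^#' | sort -u | wc -l
```

Note that `nproc` reports logical CPUs, which double-counts cores when hyper-threading is enabled, so it can overshoot the value recommended for `--threads`.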
 

# Model Card

+ Dynamic quantization of DeepSeek-R1-0528 for the **ik_llama** fork, optimized to run on systems with 24GB to 32GB of VRAM and 512GB of RAM while providing the best balance between quality and performance for coding.

THIS MODEL ONLY RUNS ON THE **IK_LLAMA** FORK!!!

See [this detailed guide](https://github.com/ikawrakow/ik_llama.cpp/discussions/258) on how to set up and run **ik_llama**.

## Run
+ Use the following command lines to run the model (tweak them to further customize the setup to your needs).
+
+ ### 32GB VRAM
```
./build/bin/llama-server \
--alias anikifoss/DeepSeek-R1-0528-DQ4_K_R4 \
@@ -21,7 +23,7 @@
--temp 0.5 --top-k 0 --top-p 1.0 --min-p 0.1 --repeat-penalty 1.0 \
--ctx-size 75000 \
-ctk f16 \
+ -mla 3 -fa \
-amb 1024 \
-b 2048 -ub 2048 \
-fmoe \
@@ -33,7 +35,27 @@
--port 8090
```
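The 24GB variant below shrinks `--ctx-size` (41000 vs 75000) and quantizes the cache (`-ctk q8_0` vs `f16`). As a rough illustration of why both knobs matter, here is a naive KV-cache size estimate; the layer count and per-token cache width are placeholder values, and the formula ignores the MLA cache compression that `-mla` enables, so treat it as a back-of-the-envelope sketch, not the fork's real memory accounting:

```python
def kv_cache_bytes(ctx, n_layers, width, bytes_per_elem):
    """Naive K+V cache size: two caches, ctx tokens, width elements per layer."""
    return 2 * ctx * n_layers * width * bytes_per_elem

BYTES_F16 = 2.0       # -ctk f16: 2 bytes per element
BYTES_Q8_0 = 34 / 32  # -ctk q8_0: blocks of 32 int8 values plus a 2-byte scale

LAYERS, WIDTH = 61, 512  # placeholder model dimensions, for illustration only

GIB = 2**30
print(f"f16  @ 75000 ctx: {kv_cache_bytes(75000, LAYERS, WIDTH, BYTES_F16) / GIB:.1f} GiB")
print(f"q8_0 @ 41000 ctx: {kv_cache_bytes(41000, LAYERS, WIDTH, BYTES_Q8_0) / GIB:.1f} GiB")
```

Halving the batch sizes (`-b`/`-ub`) and the attention buffer (`-amb`) trims the remaining VRAM overhead in the same spirit.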

+ ### 24GB VRAM
+ ```
+ ./build/bin/llama-server \
+ --alias anikifoss/DeepSeek-R1-0528-DQ4_K_R4 \
+ --model /mnt/data/Models/anikifoss/DeepSeek-R1-0528-DQ4_K_R4/DeepSeek-R1-0528-DQ4_K_R4-00001-of-00010.gguf \
+ --temp 0.5 --top-k 0 --top-p 1.0 --min-p 0.1 --repeat-penalty 1.0 \
+ --ctx-size 41000 \
+ -ctk q8_0 \
+ -mla 2 -fa \
+ -amb 512 \
+ -b 1024 -ub 1024 \
+ -fmoe \
+ --n-gpu-layers 99 \
+ --override-tensor exps=CPU,attn_kv_b=CPU \
+ --parallel 1 \
+ --threads 32 \
+ --host 127.0.0.1 \
+ --port 8090
+ ```
+
+ ### Customization
- Replace `/mnt/data/Models/anikifoss/DeepSeek-R1-0528-DQ4_K_R4` with the location of the model (where you downloaded it)
- Adjust `--threads` to the number of physical cores on your system
- Tweak these to your preference: `--temp 0.5 --top-k 0 --top-p 1.0 --min-p 0.1 --repeat-penalty 1.0`
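Once one of the commands above has the server listening on port 8090, it can be driven over HTTP. A minimal stdlib-only client sketch, assuming the OpenAI-compatible `/v1/chat/completions` endpoint that upstream `llama-server` exposes is also present in the **ik_llama** fork:

```python
import json
from urllib import request

def build_chat_request(prompt, base_url="http://127.0.0.1:8090"):
    """Build a chat-completion request for the locally running server."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.5,  # matches the sampling settings above
    }
    return request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("Write hello world in C.")
print(req.full_url)

# Sending requires the server to actually be running:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```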