Update README.md

README.md (changed)

@@ -79,7 +79,7 @@ You can try the following to squeeze out more context on your system:
 - Reducing buffers can free up a bit more VRAM at a very minor cost to performance (`-amb 512` and `-b 1024 -ub 1024`)
 - Switching to an IQ quant will save some memory at the cost of performance (*very very roughly* 10% memory savings at the cost of 10% in inference performance)

-## Optimizing
+## Optimizing for Coding

 Smaller quants, like `UD-Q2_K_XL`, are much faster when generating tokens but often produce code that fails to run or contains bugs. Based on empirical observations, coding quality seems to be strongly affected by model quantization, so we use larger quantization where it matters to reduce perplexity while remaining within the target system constraints of 24-32 GB VRAM and 512 GB RAM.

 ### Quantization Approach
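For context, the buffer-related flags mentioned in the diff above are passed on the server command line. The following is a minimal sketch, assuming an ik_llama.cpp-style `llama-server` build; the model path, context size, and GPU layer count are placeholders, not values taken from this repo:

```sh
# Hypothetical example: shrink compute buffers to free a little VRAM.
# -amb caps the attention compute buffer, -b/-ub reduce the batch and
# micro-batch sizes. Model path, -c, and -ngl below are placeholders.
./llama-server \
  -m /models/model-UD-Q2_K_XL.gguf \
  -c 32768 \
  -ngl 99 \
  -amb 512 \
  -b 1024 -ub 1024
```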