anikifoss committed · verified · Commit 8dac81a · 1 Parent(s): c2aa04f

Update README.md

Files changed (1): README.md +1 -1
README.md CHANGED
@@ -87,7 +87,7 @@ You can try the following to squeeze out more context on your system:
 Generally, imatrix is not recommended for Q4 and larger quants. The problem with imatrix is that it will guide what the model remembers, while anything not covered by the text sample used to generate the imatrix is more likely to be forgotten. For example, an imatrix derived from a Wikipedia sample is likely to negatively affect tasks like coding. In other words, while imatrix can improve specific benchmarks that are similar to the imatrix input sample, it will also skew model performance toward tasks similar to the imatrix sample at the expense of other tasks.
 
 ## Benchmarks
-Smaller quants, like `UD-Q2_K_XL`, are much faster when generating tokens, but often produce code that fails to run or contains bugs. Based on empirical observations, coding seems to be strongly affected by model quantization. So we use larger quantization where it matter to reduce perplexity while remaining within the target system constraints of 24GB-32GB VRAM and 512GB RAM.
+Smaller quants, like `UD-Q2_K_XL`, are much faster when generating tokens, but often produce code that fails to run or contains bugs. Based on empirical observations, coding seems to be strongly affected by model quantization. So we use larger quantization where it matters to reduce perplexity while remaining within the target system constraints of 24GB-32GB VRAM and 512GB RAM.
 
 **System:** Threadripper Pro 7975WX, 768GB DDR5@5600MHz, RTX 5090 32GB
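
For reference, the imatrix workflow the changed section discusses looks roughly like the following. This is a minimal sketch using llama.cpp's `llama-imatrix` and `llama-quantize` tools; the model file, calibration text, and output names are hypothetical placeholders, not files from this repo.

```bash
# Sketch: quantizing with and without an imatrix using llama.cpp tools.
# All file names below are placeholders.

# Without imatrix (the approach recommended above for Q4 and larger quants):
./llama-quantize model-F16.gguf model-Q4_K_M.gguf Q4_K_M

# With imatrix: first collect importance statistics from a calibration text,
# then pass them to the quantizer. The choice of calibration.txt is what
# skews the quant toward tasks resembling that sample, as described above.
./llama-imatrix -m model-F16.gguf -f calibration.txt -o imatrix.dat
./llama-quantize --imatrix imatrix.dat model-F16.gguf model-Q4_K_M.gguf Q4_K_M
```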