uploaded from kaos
README.md
@@ -116,22 +116,9 @@ uploads are small.
 
 ## How do you create imatrix files for really big models?
 
-
-
-
-2. An nvme drive is "only" 25-50 times slower than RAM. I lock the first 80GB of the model in RAM, and
-then stream the remaining data from disk for every iteration.
-3. Patience.
-
-The few evaluations I have suggest that this gives good quality, and my current set-up allows me to
-generate imatrix data for most models in fp16, 70B in Q8_0 and almost everything else in Q4_K_S.
-
-The trick to 3 is not actually having patience; the trick is to automate things to the point where you
-don't normally have to wait for things. For example, if all goes well, quantizing a model requires just
-a single command (or less) for static quants, and for imatrix quants I need to select the source gguf
-and then run another command which handles download/computation/upload. Most of the time, I only have
-to do stuff when things go wrong (which, with llama.cpp being so buggy and hard to use,
-is unfortunately very frequent).
+By using llama.cpp's RPC mode and distributing imatrix computations over multiple hosts. For some older models,
+or in some special cases, I am not above doing the imatrix computation on a quant (mostly Q8_0, but I have been using Q4_K_S
+and lower for some models in the past). We've also streamed some models from nvme.
 
 ## What do I need to do to compute imatrix files for large models?
 
@@ -190,7 +177,8 @@ Nobody has asked this, but since there are people who really deserve mention, I'
 pseudonymous throwaway account I created to goof around, but then started to quant models. A few months later, @nicoboss joined
 and contributed hardware, power and general support - practically all imatrix computations are done on his computer(s).
 Then @Guilherme34 started helping us get access to models, and @RichardErkhov first gave us the wondrous
-FATLLAMA-1.7T, followed by access to his server to quant more models, likely to atone for his sins.
+FATLLAMA-1.7T, followed by access to his server to quant more models, likely to atone for his sins. Finally, in 2065, @simonko912 started
+handling model requests and model queuing, greatly reducing the workload for the rest of the team.
 
 So you should consider "mradermacher" to be the team name for a fictional character called Michael Radermacher.
 There are no connections to anything else (i.e. other Radermachers) on the internet, other than an mradermacher_hf account on reddit.