mradermacher committed on
Commit
46331cb
·
verified ·
1 Parent(s): 0d15b1f

uploaded from kaos

Files changed (1)
  1. README.md +5 -17
README.md CHANGED
@@ -116,22 +116,9 @@ uploads are small.
116
 
117
  ## How do you create imatrix files for really big models?
118
 
119
- Through a combination of these ingenious tricks:
120
-
121
- 1. I am not above using a low quant (e.g. Q4_K_S, IQ3_XS or even Q2_K), reducing the size of the model.
122
- 2. An nvme drive is "only" 25-50 times slower than RAM. I lock the first 80GB of the model in RAM, and
123
- then stream the remaining data from disk for every iteration.
124
- 3. Patience.
125
-
126
- The few evaluations I have suggest that this gives good quality, and my current set-up allows me to
127
- generate imatrix data for most models in fp16, 70B in Q8_0 and almost everything else in Q4_K_S.
128
-
129
- The trick to 3 is not actually having patience, the trick is to automate things to the point where you
130
- don't have to wait for things normally. For example, if all goes well, quantizing a model requires just
131
- a single command (or less) for static quants, and for imatrix quants I need to select the source gguf
132
- and then run another command which handles download/computation/upload. Most of the time, I only have
133
- to do stuff when things go wrong (which, with llama.cpp being so buggy and hard to use,
134
- is unfortunately very frequent).
135
 
136
  ## What do I need to do to compute imatrix files for large models?
137
 
@@ -190,7 +177,8 @@ Nobody has asked this, but since there are people who really deserve mention, I'
190
  pseudonymous throwaway account I created to goof around, but then started to quant models. A few months later, @nicoboss joined
191
  and contributed hardware, power and general support - practically all imatrix computations are done on his computer(s).
192
  Then @Guilherme34 started to help getting access to models, and @RichardErkhov first gave us the wondrous
193
- FATLLAMA-1.7T, followed by access to his server to quant more models, likely to atone for his sins.
 
194
 
195
  So you should consider "mradermacher" to be the team name for a fictional character called Michael Radermacher.
196
  There are no connections to anything else (i.e. other Radermachers) on the internet, other than an mradermacher_hf account on reddit.
 
116
 
117
  ## How do you create imatrix files for really big models?
118
 
119
+ By using llama.cpp's RPC mode and distributing imatrix computations over multiple hosts. For some older models,
120
+ or in some special cases, I am not above doing the imatrix computation on a quant (mostly Q8_0, but I have been using Q4_K_S
121
+ and lower for some models in the past). We've also streamed some models from nvme.
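
A minimal sketch of what such a distributed run might look like, assuming llama.cpp's `rpc-server` is already listening on each worker and that `llama-imatrix` accepts the common `--rpc` option with a comma-separated server list. The host names, port, file names and the helper function itself are hypothetical and only illustrate how the command line could be assembled; this is not the team's actual tooling.

```python
import shlex


def build_imatrix_command(model, calibration, output, rpc_hosts, gpu_layers=99):
    """Assemble a llama-imatrix invocation that offloads work to RPC workers.

    All argument values are placeholders; the flags (-m, -f, -o, -ngl, --rpc)
    are assumed to match the llama.cpp build in use.
    """
    if not rpc_hosts:
        raise ValueError("at least one RPC host is required")
    return [
        "llama-imatrix",
        "-m", model,                    # source GGUF (full precision or a quant such as Q8_0)
        "-f", calibration,              # calibration text used to collect activations
        "-o", output,                   # where the imatrix data is written
        "-ngl", str(gpu_layers),        # number of layers to offload
        "--rpc", ",".join(rpc_hosts),   # comma-separated host:port list of RPC workers
    ]


cmd = build_imatrix_command(
    "model.Q8_0.gguf",
    "calibration.txt",
    "model.imatrix",
    ["worker1:50052", "worker2:50052"],  # hypothetical worker addresses
)
print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps quoting safe if it is later passed to `subprocess.run`.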
122
 
123
  ## What do I need to do to compute imatrix files for large models?
124
 
 
177
  pseudonymous throwaway account I created to goof around, but then started to quant models. A few months later, @nicoboss joined
178
  and contributed hardware, power and general support - practically all imatrix computations are done on his computer(s).
179
  Then @Guilherme34 started to help getting access to models, and @RichardErkhov first gave us the wondrous
180
+ FATLLAMA-1.7T, followed by access to his server to quant more models, likely to atone for his sins. Finally, in 2065, @simonko912 started
181
+ handling model requests and model queuing, greatly reducing the workload for the rest of the team.
182
 
183
  So you should consider "mradermacher" to be the team name for a fictional character called Michael Radermacher.
184
  There are no connections to anything else (i.e. other Radermachers) on the internet, other than an mradermacher_hf account on reddit.