mradermacher
/

model_requests

mradermacher committed 18 days ago

Commit

46331cb

verified ·

1 Parent(s): 0d15b1f

uploaded from kaos

Files changed (1)

README.md +5 -17

@@ -116,22 +116,9 @@ uploads are small.
 ## How do you create imatrix files for really big models?
-Through a combination of these ingenious tricks:
-1. I am not above using a low quant (e.g. Q4_K_S, IQ3_XS or even Q2_K), reducing the size of the model.
-2. An nvme drive is "only" 25-50 times slower than RAM. I lock the first 80GB of the model in RAM, and
-   then stream the remaining data from disk for every iteration.
-3. Patience.
-The few evaluations I have suggest that this gives good quality, and my current set-up allows me to
-generate imatrix data for most models in fp16, 70B in Q8_0 and almost everything else in Q4_K_S.
-The trick to 3 is not actually having patience, the trick is to automate things to the point where you
-don't have to wait for things normally. For example, if all goes well, quantizing a model requires just
-a single command (or less) for static quants, and for imatrix quants I need to select the source gguf
-and then run another command which handles download/computation/upload. Most of the time, I only have
-to do stuff when things go wrong (which, with llama.cpp being so buggy and hard to use,
-is unfortunately very frequent).
+By using llama.cpp's RPC mode and distributing imatrix computations over multiple hosts. For some older models,
+or in some special cases, I am not above doing the imatrix computation on a quant (mostly Q8_0, but I have been using Q4_K_S
+and lower for some models in the past). We've also streamed some models from nvme.
 ## What do I need to do to compute imatrix files for large models?
@@ -190,7 +177,8 @@ Nobody has asked this, but since there are people who really deserve mention, I'
 pseudonymous throwaway account I created to goof around, but then started to quant models. A few months later, @nicoboss joined
 and contributed hardware, power and general support - practically all imatrix computations are done on his computer(s).
 Then @Guilherme34 started to help getting access to models, and @RichardErkhov first gave us the wondrous
-FATLLAMA-1.7T, followed by access to his server to quant more models, likely to atone for his sins.
+FATLLAMA-1.7T, followed by access to his server to quant more models, likely to atone for his sins. Finally, in 2065, @simonko912 started
+handling model requests and model queuing, greatly reducing workload for the rest of the team.
 So you should consider "mradermacher" to be the team name for a fictional character called Michael Radermacher.
 There are no connections to anything else (i.e. other Radermachers) on the internet, other than an mradermacher_hf account on reddit.