Add imatrix info

README.md CHANGED

@@ -34,13 +34,13 @@ GGUF quantized models of [mattshumer/ref_70_e3](https://huggingface.co/mattshumer/ref_70_e3)
 | Q8_0_L | ??.?GB | true | false |
 | Q8_0 | ??.?GB | true | false |
 | Q6_K_L | ??.?GB | true | false |
-| Q6_K | ??.?GB | true | false |
+| Q6_K | 57.9GB | true | false |
 | Q5_K_L | 52.6GB | true | false |
 | Q5_K_M | ??.?GB | true | false |
 | Q5_K_S | 48.7GB | false | false |
 | Q4_K_L | 45.3GB | false | false |
 | Q4_K_M | ??.?GB | false | false |
-| Q4_K_S | ??.?GB | false | false |
+| Q4_K_S | 40.3GB | false | false |
 | IQ4_NL | ??.?GB | false | true |
 | IQ4_XS | ??.?GB | false | true |
 | Q3_K_XL | 37.2GB | false | false |
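Assuming these quants are published as individual GGUF files in this repo, here is a minimal sketch of pulling one of them down with huggingface_hub; the repo id and filename below are illustrative assumptions, not taken from the source:

```python
# Minimal sketch, assuming hypothetical repo/file names.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="your-namespace/ref_70_e3-GGUF",  # hypothetical repo id
    filename="ref_70_e3-Q4_K_S.gguf",         # hypothetical filename (40.3GB per the table)
)
print(path)  # local cache path of the downloaded quant
```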
@@ -63,6 +63,10 @@ GGUF quantized models of [mattshumer/ref_70_e3](https://huggingface.co/mattshumer/ref_70_e3)
 
 The `_L` or `_XL` suffix means that the token embeddings and output weights are at fp16 precision.
 
+The imatrix dataset is bartowski's, which you can find here: [calibration_datav3.txt](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8)
+
+The importance matrix is computed on the static Q6_K quant over 125 chunks.
+
 ## Benchmarks
 [benchmark image]
 
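On the `_L`/`_XL` suffixes described in the hunk above: such quants are typically produced by overriding the token-embedding and output tensor types while quantizing the remaining tensors at a base type. A minimal sketch using llama.cpp's llama-quantize via subprocess, where the filenames and the choice of Q5_K_M as the base type are assumptions:

```python
# Sketch: produce a Q5_K_L-style quant (Q5_K_M base, fp16 embeddings/output).
# Filenames are hypothetical; llama-quantize must be on PATH.
import subprocess

subprocess.run(
    [
        "llama-quantize",
        "--token-embedding-type", "f16",  # keep token embeddings at fp16
        "--output-tensor-type", "f16",    # keep output weights at fp16
        "ref_70_e3-f16.gguf",             # hypothetical full-precision input
        "ref_70_e3-Q5_K_L.gguf",          # hypothetical output file
        "Q5_K_M",                         # base quant type for remaining tensors
    ],
    check=True,
)
```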
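The imatrix note maps to a llama-imatrix run along these lines: computed against the static Q6_K quant, on bartowski's calibration_datav3.txt, capped at 125 chunks. Filenames are assumptions:

```python
# Sketch: compute an importance matrix on the static Q6_K quant,
# limited to 125 chunks of the calibration text. Filenames hypothetical.
import subprocess

subprocess.run(
    [
        "llama-imatrix",
        "-m", "ref_70_e3-Q6_K.gguf",     # static Q6_K, per the note above
        "-f", "calibration_datav3.txt",  # bartowski's dataset (gist linked above)
        "-o", "imatrix.dat",             # output importance matrix
        "--chunks", "125",               # stop after 125 chunks
    ],
    check=True,
)
```

The resulting imatrix.dat would then typically be fed to llama-quantize via its `--imatrix` flag when building imatrix-based quants such as IQ4_NL and IQ4_XS.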