Experimental global target bits‑per‑weight quantization of Octen/Octen-Embedding-8B
- Using a non-standard (forked) llama.cpp branch for quantization.
- Using a CLI tool to build KLD evaluation and imatrix calibration datasets for GGUF models, sourced from eaddario/imatrix-calibration.
- Using dataset sources: text_en, text_ru.
- Using dataset chunks: 750.
- Small set of patches added.
- Tensor quantization uses F16 instead of BF16, which suits Nvidia Pascal GPUs such as the P100.
Many thanks to Ed Addario for an impressive job.
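
As a rough illustration of what a global bits-per-weight (BPW) target means for file size, the sketch below converts between an average BPW and an approximate tensor payload size. The weight count of 8 billion is an assumption taken from the "8B" in the model name, not a measured value, and both function names are hypothetical helpers, not part of any quantization tool.

```python
def estimated_size_gib(n_weights: int, bpw: float) -> float:
    """Approximate tensor payload in GiB for an average of `bpw` bits per weight."""
    return n_weights * bpw / 8 / 2**30


def achieved_bpw(payload_bytes: int, n_weights: int) -> float:
    """Average bits per weight implied by a payload size in bytes."""
    return payload_bytes * 8 / n_weights


# Assumption: nominal 8B weights; the real count differs somewhat.
N_WEIGHTS = 8_000_000_000

for target in (3.5, 4.5, 8.0):
    print(f"{target:5.2f} bpw -> ~{estimated_size_gib(N_WEIGHTS, target):.2f} GiB")
```

Real GGUF files also carry metadata and the tokenizer, so actual sizes will be somewhat larger than this estimate.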
Quantization comparison
| BPW/TGS | PPL correlation | PPL mean ratio | ΔPPL | Mean KLD | Maximum KLD | 99.9% KLD | Mean Δp | RMS Δp |
|---|---|---|---|---|---|---|---|---|
| 3.50 | 31.88% | 163.312009 ± 2.603029 | 1316751250.467953 ± 22175721.097179 | 0.865687 ± 0.002557 | 40.508530 | 13.199468 | -0.000 ± 0.001 % | 0.240 ± 0.035 % |
| 4.00 | 27.97% | 894.496097 ± 17.658576 | 7248459977.767529 ± 148545165.662543 | 0.462401 ± 0.002284 | 56.080093 | 14.465672 | -0.001 ± 0.000 % | 0.147 ± 0.022 % |
| 4.50 | 29.87% | 536.818391 ± 10.018571 | 4346810434.048196 ± 84695221.429227 | 0.247896 ± 0.001484 | 39.661839 | 8.264471 | 0.000 ± 0.000 % | 0.130 ± 0.018 % |
| 5.00 | 30.26% | 448.634201 ± 8.243512 | 3631418870.079112 ± 69768738.026683 | 0.189924 ± 0.001342 | 36.415863 | 7.533142 | -0.000 ± 0.000 % | 0.097 ± 0.016 % |
| 5.50 | 29.94% | 475.154499 ± 8.855907 | 3846563982.921083 ± 74878878.690613 | 0.175310 ± 0.001361 | 33.947906 | 7.652236 | -0.000 ± 0.000 % | 0.132 ± 0.037 % |
| 6.00 | 30.18% | 535.916733 ± 10.050226 | 4339495766.813949 ± 85010916.130295 | 0.093320 ± 0.000936 | 31.576149 | 5.749511 | -0.000 ± 0.000 % | 0.085 ± 0.019 % |
| 6.50 | 30.33% | 513.696057 ± 9.587249 | 4159231201.265534 ± 81127272.345352 | 0.076551 ± 0.000824 | 33.049152 | 5.351891 | -0.000 ± 0.000 % | 0.049 ± 0.009 % |
| 7.00 | 30.48% | 487.499691 ± 9.042077 | 3946713977.342775 ± 76548947.044238 | 0.069732 ± 0.000789 | 31.959265 | 5.224213 | -0.000 ± 0.000 % | 0.060 ± 0.014 % |
| 7.50 | 30.45% | 485.864997 ± 9.013780 | 3933452574.640460 ± 76305069.637214 | 0.066390 ± 0.000758 | 27.289934 | 5.135751 | -0.000 ± 0.000 % | 0.049 ± 0.009 % |
| 8.00 | 30.56% | 480.323684 ± 8.884633 | 3888498839.452749 ± 75233350.710666 | 0.064214 ± 0.000758 | 22.896826 | 4.984374 | 0.000 ± 0.000 % | 0.045 ± 0.006 % |
| 8.50 | 30.59% | 468.816726 ± 8.658897 | 3795148991.398126 ± 73329353.181311 | 0.063394 ± 0.000767 | 27.460506 | 4.974599 | -0.000 ± 0.000 % | 0.039 ± 0.005 % |
| 9.00 | 30.59% | 472.128288 ± 8.725107 | 3822013941.457990 ± 73888001.523125 | 0.061325 ± 0.000754 | 26.071749 | 4.991961 | 0.000 ± 0.000 % | 0.032 ± 0.004 % |
| 9.50 | 30.57% | 477.493384 ± 8.834961 | 3865538118.737411 ± 74813478.721664 | 0.061779 ± 0.000778 | 28.092499 | 5.049781 | -0.000 ± 0.000 % | 0.038 ± 0.006 % |
| 10.00 | 30.58% | 473.251580 ± 8.749327 | 3831126611.272995 ± 74092083.168597 | 0.060787 ± 0.000750 | 27.365194 | 5.046810 | -0.000 ± 0.000 % | 0.032 ± 0.004 % |
| 10.50 | 30.58% | 473.369865 ± 8.754410 | 3832086195.704186 ± 74134265.313673 | 0.061487 ± 0.000778 | 29.115179 | 5.049273 | -0.000 ± 0.000 % | 0.031 ± 0.005 % |
| 11.00 | 30.58% | 469.947653 ± 8.686996 | 3804323606.512961 ± 73563714.202142 | 0.060947 ± 0.000761 | 26.897139 | 4.949221 | -0.000 ± 0.000 % | 0.032 ± 0.004 % |
| 11.50 | 30.59% | 469.702016 ± 8.680818 | 3802330885.149252 ± 73513517.264363 | 0.060967 ± 0.000756 | 24.905037 | 4.991287 | -0.000 ± 0.000 % | 0.042 ± 0.006 % |
| 12.00 | 30.59% | 469.007636 ± 8.666011 | 3796697743.108781 ± 73388654.821674 | 0.060841 ± 0.000757 | 29.013231 | 4.902389 | -0.000 ± 0.000 % | 0.034 ± 0.004 % |
| 12.50 | 30.60% | 468.247009 ± 8.650971 | 3790527181.157271 ± 73262486.731016 | 0.061428 ± 0.000774 | 26.518728 | 5.096298 | -0.000 ± 0.000 % | 0.039 ± 0.007 % |
| 13.00 | 30.59% | 468.485073 ± 8.656076 | 3792458472.236744 ± 73305035.806184 | 0.060770 ± 0.000756 | 27.815191 | 4.977703 | -0.000 ± 0.000 % | 0.040 ± 0.006 % |
| 13.50 | 30.60% | 468.608802 ± 8.658301 | 3793462215.247329 ± 73324801.786474 | 0.060845 ± 0.000748 | 25.343117 | 5.012136 | -0.000 ± 0.000 % | 0.034 ± 0.006 % |
| 14.00 | 30.59% | 470.353813 ± 8.694041 | 3807618563.064641 ± 73625192.396193 | 0.060969 ± 0.000763 | 27.384163 | 5.017433 | 0.000 ± 0.000 % | 0.033 ± 0.004 % |
| 14.50 | 30.59% | 469.238406 ± 8.669486 | 3798569859.379515 ± 73417558.393644 | 0.060245 ± 0.000763 | 25.959768 | 4.983978 | 0.000 ± 0.000 % | 0.030 ± 0.004 % |
| 15.00 | 30.59% | 470.262969 ± 8.688094 | 3806881593.724875 ± 73576296.537943 | 0.060078 ± 0.000773 | 29.312548 | 5.103179 | 0.000 ± 0.000 % | 0.029 ± 0.004 % |
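
For readers unfamiliar with the KLD and Δp columns above, the following is a minimal sketch of how such statistics can be derived from per-token logits of a reference model and a quantized model. It assumes that KLD is the Kullback-Leibler divergence from the reference distribution to the quantized one and that Δp is the change in probability assigned to the correct next token. It is not the code that produced the table, and all function names here are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def kl_divergence(base_logits, quant_logits):
    """KL(P_base || P_quant) for one token position, in nats."""
    lp = log_softmax(base_logits)
    lq = log_softmax(quant_logits)
    return sum(math.exp(a) * (a - b) for a, b in zip(lp, lq))


def summarize(base, quant, targets):
    """Aggregate KLD and correct-token probability shift over all positions."""
    klds = [kl_divergence(b, q) for b, q in zip(base, quant)]
    # Delta-p: quantized minus reference probability of the correct token.
    dps = [
        math.exp(log_softmax(q)[t]) - math.exp(log_softmax(b)[t])
        for b, q, t in zip(base, quant, targets)
    ]
    n = len(klds)
    return {
        "mean_kld": sum(klds) / n,
        "max_kld": max(klds),
        "mean_dp": sum(dps) / n,
        "rms_dp": math.sqrt(sum(d * d for d in dps) / n),
    }


# Toy data: two positions, a vocabulary of three tokens.
base = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
quant = [[1.8, 0.6, -0.9], [0.0, 0.4, 2.7]]
print(summarize(base, quant, targets=[0, 2]))
```

Lower mean KLD and smaller RMS Δp indicate that the quantized model's output distribution stays closer to the reference.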