Experimental global target bits‑per‑weight quantization of Octen/Octen-Embedding-8B

  • Using a non-standard (forked) llama.cpp branch for quantization.
  • Using a CLI tool to build KLD evaluation and imatrix calibration datasets for GGUF models, sourced from eaddario/imatrix-calibration.
  • Using dataset sources: text_en, text_ru.
  • Using dataset chunks: 750.
  • Small set of patches added.
  • Tensor quantization uses F16 instead of BF16 source weights, which is friendlier to NVIDIA Pascal GPUs such as the P100 (no native BF16 support).
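Since the card sweeps a global bits-per-weight budget, each BPW target maps directly to an approximate on-disk size. A minimal sketch of that arithmetic (the 8B parameter count is taken from this card; GGUF metadata and per-tensor overhead are ignored, so real files will be somewhat larger):

```python
# Rough file-size estimate for a global bits-per-weight (BPW) target.
# Assumption: 8e9 parameters (the "8B params" reported on this card);
# this ignores GGUF metadata overhead, so treat the result as a lower bound.

def estimated_size_gib(n_params: float, bpw: float) -> float:
    """Bytes = params * bpw / 8; GiB = bytes / 2**30."""
    return n_params * bpw / 8 / 2**30

for bpw in (3.5, 4.5, 8.0, 16.0):
    print(f"{bpw:5.1f} BPW -> {estimated_size_gib(8e9, bpw):6.2f} GiB")
```

Doubling the BPW target doubles the estimate, so the table's 3.50–15.00 BPW range spans roughly a 4x spread in file size.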

Many thanks to Ed Addario for his impressive work.

Quantization comparison

| BPW/TGS | PPL correlation | PPL mean | ΔPPL ratio | Mean KLD | Maximum KLD | 99.9% KLD | Mean Δp | RMS Δp |
|---|---|---|---|---|---|---|---|---|
| 3.50 | 31.88% | 163.312009 ± 2.603029 | 1316751250.467953 ± 22175721.097179 | 0.865687 ± 0.002557 | 40.508530 | 13.199468 | -0.000 ± 0.001 % | 0.240 ± 0.035 % |
| 4.00 | 27.97% | 894.496097 ± 17.658576 | 7248459977.767529 ± 148545165.662543 | 0.462401 ± 0.002284 | 56.080093 | 14.465672 | -0.001 ± 0.000 % | 0.147 ± 0.022 % |
| 4.50 | 29.87% | 536.818391 ± 10.018571 | 4346810434.048196 ± 84695221.429227 | 0.247896 ± 0.001484 | 39.661839 | 8.264471 | 0.000 ± 0.000 % | 0.130 ± 0.018 % |
| 5.00 | 30.26% | 448.634201 ± 8.243512 | 3631418870.079112 ± 69768738.026683 | 0.189924 ± 0.001342 | 36.415863 | 7.533142 | -0.000 ± 0.000 % | 0.097 ± 0.016 % |
| 5.50 | 29.94% | 475.154499 ± 8.855907 | 3846563982.921083 ± 74878878.690613 | 0.175310 ± 0.001361 | 33.947906 | 7.652236 | -0.000 ± 0.000 % | 0.132 ± 0.037 % |
| 6.00 | 30.18% | 535.916733 ± 10.050226 | 4339495766.813949 ± 85010916.130295 | 0.093320 ± 0.000936 | 31.576149 | 5.749511 | -0.000 ± 0.000 % | 0.085 ± 0.019 % |
| 6.50 | 30.33% | 513.696057 ± 9.587249 | 4159231201.265534 ± 81127272.345352 | 0.076551 ± 0.000824 | 33.049152 | 5.351891 | -0.000 ± 0.000 % | 0.049 ± 0.009 % |
| 7.00 | 30.48% | 487.499691 ± 9.042077 | 3946713977.342775 ± 76548947.044238 | 0.069732 ± 0.000789 | 31.959265 | 5.224213 | -0.000 ± 0.000 % | 0.060 ± 0.014 % |
| 7.50 | 30.45% | 485.864997 ± 9.013780 | 3933452574.640460 ± 76305069.637214 | 0.066390 ± 0.000758 | 27.289934 | 5.135751 | -0.000 ± 0.000 % | 0.049 ± 0.009 % |
| 8.00 | 30.56% | 480.323684 ± 8.884633 | 3888498839.452749 ± 75233350.710666 | 0.064214 ± 0.000758 | 22.896826 | 4.984374 | 0.000 ± 0.000 % | 0.045 ± 0.006 % |
| 8.50 | 30.59% | 468.816726 ± 8.658897 | 3795148991.398126 ± 73329353.181311 | 0.063394 ± 0.000767 | 27.460506 | 4.974599 | -0.000 ± 0.000 % | 0.039 ± 0.005 % |
| 9.00 | 30.59% | 472.128288 ± 8.725107 | 3822013941.457990 ± 73888001.523125 | 0.061325 ± 0.000754 | 26.071749 | 4.991961 | 0.000 ± 0.000 % | 0.032 ± 0.004 % |
| 9.50 | 30.57% | 477.493384 ± 8.834961 | 3865538118.737411 ± 74813478.721664 | 0.061779 ± 0.000778 | 28.092499 | 5.049781 | -0.000 ± 0.000 % | 0.038 ± 0.006 % |
| 10.00 | 30.58% | 473.251580 ± 8.749327 | 3831126611.272995 ± 74092083.168597 | 0.060787 ± 0.000750 | 27.365194 | 5.046810 | -0.000 ± 0.000 % | 0.032 ± 0.004 % |
| 10.50 | 30.58% | 473.369865 ± 8.754410 | 3832086195.704186 ± 74134265.313673 | 0.061487 ± 0.000778 | 29.115179 | 5.049273 | -0.000 ± 0.000 % | 0.031 ± 0.005 % |
| 11.00 | 30.58% | 469.947653 ± 8.686996 | 3804323606.512961 ± 73563714.202142 | 0.060947 ± 0.000761 | 26.897139 | 4.949221 | -0.000 ± 0.000 % | 0.032 ± 0.004 % |
| 11.50 | 30.59% | 469.702016 ± 8.680818 | 3802330885.149252 ± 73513517.264363 | 0.060967 ± 0.000756 | 24.905037 | 4.991287 | -0.000 ± 0.000 % | 0.042 ± 0.006 % |
| 12.00 | 30.59% | 469.007636 ± 8.666011 | 3796697743.108781 ± 73388654.821674 | 0.060841 ± 0.000757 | 29.013231 | 4.902389 | -0.000 ± 0.000 % | 0.034 ± 0.004 % |
| 12.50 | 30.60% | 468.247009 ± 8.650971 | 3790527181.157271 ± 73262486.731016 | 0.061428 ± 0.000774 | 26.518728 | 5.096298 | -0.000 ± 0.000 % | 0.039 ± 0.007 % |
| 13.00 | 30.59% | 468.485073 ± 8.656076 | 3792458472.236744 ± 73305035.806184 | 0.060770 ± 0.000756 | 27.815191 | 4.977703 | -0.000 ± 0.000 % | 0.040 ± 0.006 % |
| 13.50 | 30.60% | 468.608802 ± 8.658301 | 3793462215.247329 ± 73324801.786474 | 0.060845 ± 0.000748 | 25.343117 | 5.012136 | -0.000 ± 0.000 % | 0.034 ± 0.006 % |
| 14.00 | 30.59% | 470.353813 ± 8.694041 | 3807618563.064641 ± 73625192.396193 | 0.060969 ± 0.000763 | 27.384163 | 5.017433 | 0.000 ± 0.000 % | 0.033 ± 0.004 % |
| 14.50 | 30.59% | 469.238406 ± 8.669486 | 3798569859.379515 ± 73417558.393644 | 0.060245 ± 0.000763 | 25.959768 | 4.983978 | 0.000 ± 0.000 % | 0.030 ± 0.004 % |
| 15.00 | 30.59% | 470.262969 ± 8.688094 | 3806881593.724875 ± 73576296.537943 | 0.060078 ± 0.000773 | 29.312548 | 5.103179 | 0.000 ± 0.000 % | 0.029 ± 0.004 % |
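The KLD and Δp columns compare the quantized model's per-token output distribution against the full-precision baseline, the kind of statistics llama.cpp's `llama-perplexity --kl-divergence` run reports. A minimal sketch of what the per-token quantities measure, using made-up logits rather than real model output:

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p, q):
    """KL(p || q): extra nats needed to encode tokens from p using q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical logits for one token position: baseline vs quantized model.
p = softmax([2.0, 1.0, 0.1])   # full-precision (e.g. F16) distribution
q = softmax([1.9, 1.1, 0.1])   # quantized distribution

print(f"KLD (p vs p) = {kl_divergence(p, p):.6f}")  # identical -> 0
print(f"KLD (p vs q) = {kl_divergence(p, q):.6f}")
# Δp: change in probability assigned to the reference token (index 0 here);
# the table averages this signed difference (Mean Δp) and its RMS (RMS Δp).
print(f"Delta p      = {q[0] - p[0]:+.6f}")
```

In the table, Mean KLD near zero means the quantized model's token distributions are nearly indistinguishable from the baseline, while Maximum and 99.9% KLD expose the worst-case token positions.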
Downloads last month: 12,336

Format: GGUF · Model size: 8B params · Architecture: qwen3
Model tree for ENOSYS/Octen-Embedding-8B-750-v1-GGUF: quantized variants (6).