Thank you @Green-Sky ! I'm planning to have a go at the Gemma 4s over the weekend and I'll take your dataset for a spin
Ed Addario PRO
eaddario
AI & ML interests
Finding ways to optimize LLMs' inference performance in resource-constrained environments (e.g. commodity hardware, desktops, laptops, mobiles, edge devices, etc.)
Recent Activity
- new activity 2 days ago in eaddario/Qwen3.5-9B-GGUF: "Is there a mmproj file?"
- new activity 2 days ago in eaddario/Qwen3.5-9B-GGUF: "Amazing quants"
- new activity 3 days ago in eaddario/imatrix-calibration: "Great collection, I'm using it for my little project."
Is there a mmproj file? (2) · #1 opened 3 days ago by dicksondickson
Amazing quants (2) · #2 opened 2 days ago by IrisColt
Great collection, I'm using it for my little project. (10) · #5 opened about 1 month ago by cmh
replied to their post 3 days ago
replied to their post 5 days ago
On this occasion, no difference in size is expected.
I'm benchmarking quality rather than size. To facilitate apples-to-apples comparisons, the IQ1_M, IQ2_M, Q3_K, Q4_K, Q5_K, Q6_K and Q8_0 models were quantized at the same bits-per-weight (bpw) as the naive models, and Q4_K-B and Q4_K-U were matched to the sizes produced by Bartowski and Unsloth respectively.
The file sizes are the same, but the quality is better.
You're welcome to use the enhanced versions of llama-imatrix and llama-quantize if you require a particular size. If that's not practical, let me know which ones you need and I'll be happy to upload them.
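For reference, the bpw matching described above comes down to simple arithmetic: effective bits-per-weight is the file size in bits divided by the parameter count. A minimal sketch (the 2.3 GiB file size and 4B parameter count below are made-up example figures, not sizes from these repos):

```python
def bits_per_weight(file_size_bytes: float, n_params: float) -> float:
    """Effective bpw of a quantized model file:
    total bits in the file divided by the number of parameters."""
    return file_size_bytes * 8 / n_params

# Hypothetical example: a 4B-parameter model in a 2.3 GiB GGUF file
bpw = bits_per_weight(2.3 * 1024**3, 4e9)  # roughly 4.94 bpw
```

Matching two quantizations "at the same bpw" then just means choosing type mixes whose files land on the same value of this ratio.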
posted an update 6 days ago (149)
eaddario/imatrix-calibration datasets updated to include Southeast Asian languages (Burmese, Filipino, Indonesian, Thai & Vietnamese).
Request to add SEA language dataset. (4) · #4 opened about 1 month ago by MagicalAlchemist
posted an update 7 days ago (160)
Experimental global target bits‑per‑weight quantization of Qwen/Qwen3.5-4B and Qwen/Qwen3.5-9B
Unlike standard llama.cpp quantizations that rely on fixed type heuristics (e.g., Q4_K_M), the Target BPW approach optimizes per-tensor precision where it matters most, producing high-quality models that meet a precise global file-size target.
Key Advantages:
- VRAM Maximization: Can generate high-quality models sized to fit hardware constraints exactly (e.g., fitting the model into 24 GB of VRAM).
- Data-Driven Precision: The quantization mix is determined by actual weight-error sensitivity rather than hardcoded rules, often yielding better PPL/KLD vs. size trade-offs.
Full benchmarks (PPL, KLD, ARC, MMLU, etc.) and methodology are in the models' cards:
eaddario/Qwen3.5-4B-GGUF
eaddario/Qwen3.5-9B-GGUF
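The budgeted mixed-precision idea behind a global bpw target can be sketched as a greedy allocator: start every tensor at the cheapest quant type, then spend the remaining bit budget on whichever tensor upgrade buys the most error reduction per extra bit. This is a toy illustration only, not the actual llama-quantize code; the tensor tuples, candidate (bpw, error) pairs, and the gain metric are all assumptions:

```python
# Toy sketch of a global target-BPW allocator (NOT the llama.cpp implementation).
import heapq

def allocate(tensors, target_bpw):
    """tensors: list of (name, n_weights, candidates), where candidates is a
    list of (bpw, error) pairs sorted by ascending bpw.
    Returns {name: index of the chosen candidate}."""
    choice = {name: 0 for name, _, _ in tensors}          # start at cheapest type
    budget = target_bpw * sum(n for _, n, _ in tensors)   # total bit budget
    used = sum(cands[0][0] * n for _, n, cands in tensors)

    # Max-heap of possible upgrades, keyed by error reduction per extra bit.
    heap = []
    def push(name, n, cands, i):
        if i + 1 < len(cands):
            gain = (cands[i][1] - cands[i + 1][1]) / ((cands[i + 1][0] - cands[i][0]) * n)
            heapq.heappush(heap, (-gain, name, n, cands))
    for name, n, cands in tensors:
        push(name, n, cands, 0)

    while heap:
        _, name, n, cands = heapq.heappop(heap)
        i = choice[name]
        cost = (cands[i + 1][0] - cands[i][0]) * n
        if used + cost > budget:
            continue                      # upgrade no longer fits the budget
        used += cost
        choice[name] = i + 1
        push(name, n, cands, i + 1)       # consider the next precision step
    return choice
```

With a 3.0 bpw budget over two equal-size tensors, the allocator upgrades the error-sensitive tensor to 4 bpw and leaves the insensitive one at 2 bpw, which is the per-tensor behavior the post describes.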
HuggingFaceFW/finepdfs 3T tokens of non-GPT slop (❤️ 1 · 2) · #2 opened 7 months ago by Joseph717171
Parquet file is toxic? (1) · #3 opened 3 months ago by sebastienbo
posted an update 3 months ago (3116)
Experimental global target bits‑per‑weight quantization of mistralai/Ministral-3-14B-Instruct-2512 and mistralai/Ministral-3-14B-Reasoning-2512
Unlike standard llama.cpp quantizations that rely on fixed type heuristics (e.g., Q4_K_M), the Target BPW approach optimizes per-tensor precision where it matters most, producing high-quality models that meet a precise global file-size target.
Key Advantages:
- VRAM Maximization: Can generate high-quality models sized to fit hardware constraints exactly (e.g., fitting the model into 24 GB of VRAM).
- Data-Driven Precision: The quantization mix is determined by actual weight-error sensitivity rather than hardcoded rules, often yielding better PPL/KLD vs. size trade-offs.
Full benchmarks (PPL, KLD, ARC, MMLU, etc.) and methodology are in the models' cards:
eaddario/Ministral-3-14B-Instruct-2512-GGUF
eaddario/Ministral-3-14B-Reasoning-2512-GGUF