Replicate perplexity test
Hi, could you tell me where to find your perplexity test file and what command you run for your tests? I'd like to check a quant I made against these results.
Yes, the basic information is in this old, somewhat out-dated GitHub discussion on ik_llama.cpp about quant cooking, in the perplexity/KLD section. It also shows you how to download and check the test corpus text.
https://github.com/ikawrakow/ik_llama.cpp/discussions/434
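If it saves you a click, here is a rough sketch of how the corpus usually gets pulled down. The URL is the one llama.cpp's get-wikitext-2 script uses as far as I remember, so treat it as an assumption and double-check against the discussion above:

```bash
# rough sketch, not gospel: fetch wikitext-2-raw and unpack wiki.test.raw
# (URL assumed from llama.cpp's get-wikitext-2 script -- verify against the linked discussion)
wget https://huggingface.co/datasets/ggml-org/ci/resolve/main/wikitext-2-raw-v1.zip
unzip wikitext-2-raw-v1.zip
# the perplexity command below expects wiki.test.raw in the working directory
cp wikitext-2-raw/wiki.test.raw .
ls -l wiki.test.raw
```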
Here is an example of the llama-perplexity command I used to measure perplexity specifically for these Qwen3-235B-A22B-Thinking-2507 files:
```bash
# you can adjust the numa stuff or likely remove it
# you can change batch sizes
# you can offload with -ngl 99 -ot exps=CPU or whatever
# you can remove the seed as it is not used
# you can change the threads
# do *not* change the context size or the exact test corpus
# you can use -ger or whatever if newer models support that
# I report my values now with f16 kv cache, so don't use -ctk q8_0 -ctv q8_0 unless you know you want that
numactl -N 1 -m 1 \
./build/bin/llama-perplexity \
    -m "$model" \
    -f wiki.test.raw \
    --seed 1337 \
    -fa -fmoe \
    --ctx-size 512 \
    -ub 4096 -b 4096 \
    --numa numactl \
    --threads 128 \
    --threads-batch 192 \
    --no-mmap
```
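The number to compare is the final estimate the tool prints at the very end of the run; I usually tee the whole thing to a log so it's easy to find afterwards. Rough sketch below, and note the exact "Final estimate: PPL = ..." wording is from memory, so adjust the grep if your build prints it differently:

```bash
# append this to the command above to keep the full log around:
#   2>&1 | tee ppl-run.log
# then pull the summary line back out (wording assumed, adjust to taste):
grep "Final estimate" ppl-run.log
# prints something like: Final estimate: PPL = 4.1234 +/- 0.01234
```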
Also, take comparisons between quants run on different rigs with a grain of salt. I try to use perplexity/KLD as a way to measure relative quality between a set of quants all made exactly the same way on the same rig. But yeah, keep me posted on how you make out and what recipes you're finding work well for your setup.
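For what it's worth, when I'm comparing a batch of quants I just loop the exact same command over each file on the same rig so the quant is the only variable. A rough sketch, where the model glob and log names are just placeholders for illustration:

```bash
# rough sketch: identical command, same rig, one log per quant
# (the glob below is just a placeholder for your own quant files)
for model in ./Qwen3-235B-A22B-Thinking-2507-*.gguf; do
    log="ppl-$(basename "$model" .gguf).log"
    numactl -N 1 -m 1 \
    ./build/bin/llama-perplexity \
        -m "$model" \
        -f wiki.test.raw \
        -fa -fmoe \
        --ctx-size 512 \
        -ub 4096 -b 4096 \
        --numa numactl \
        --threads 128 \
        --threads-batch 192 \
        --no-mmap 2>&1 | tee "$log"
done

# then line the final estimates up side by side
grep -H "Final estimate" ppl-*.log
```

Same caveats as above: drop or tweak the numactl/threads bits to match your own box.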