Replicate perplexity test
Hi, could you tell me where to find your perplexity test file and what command you run for your tests? I'd like to check a quant I made against these results.
Yes, the basic information is in this old, somewhat out-dated GitHub discussion on ik_llama.cpp about quant cooking, in the perplexity/KLD section. It also shows you how to download and check the test corpus text.
https://github.com/ikawrakow/ik_llama.cpp/discussions/434
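If it saves you a click, here is a rough sketch of how the corpus usually gets pulled down. The URL is the one llama.cpp's get-wikitext-2 script uses as far as I remember, so treat it as an assumption and double-check against the discussion above:

```bash
# rough sketch, not gospel: fetch wikitext-2-raw and unpack wiki.test.raw
# (URL assumed from llama.cpp's get-wikitext-2 script -- verify against the linked discussion)
wget https://huggingface.co/datasets/ggml-org/ci/resolve/main/wikitext-2-raw-v1.zip
unzip wikitext-2-raw-v1.zip
# the perplexity command below expects wiki.test.raw in the working directory
cp wikitext-2-raw/wiki.test.raw .
ls -l wiki.test.raw
```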
Here is an example of the llama-perplexity command I used to measure perplexity specifically for these Qwen3-235B-A22B-Thinking-2507 files:
```bash
# you can adjust the numa stuff or likely remove it
# you can change batch sizes
# you can offload with -ngl 99 -ot exps=CPU or whatever
# you can remove the seed as it is not used
# you can change the threads
# do *not* change the context size or the exact test corpus
# you can use -ger or whatever if newer models support that
# I report my values now with f16 kv cache, so don't use -ctk q8_0 -ctv q8_0 unless you know you want that
numactl -N 1 -m 1 \
./build/bin/llama-perplexity \
    -m "$model" \
    -f wiki.test.raw \
    --seed 1337 \
    -fa -fmoe \
    --ctx-size 512 \
    -ub 4096 -b 4096 \
    --numa numactl \
    --threads 128 \
    --threads-batch 192 \
    --no-mmap
```
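The number to compare is the final estimate the tool prints at the very end of the run; I usually tee the whole thing to a log so it's easy to find afterwards. Rough sketch below, and note the exact "Final estimate: PPL = ..." wording is from memory, so adjust the grep if your build prints it differently:

```bash
# append this to the command above to keep the full log around:
#   2>&1 | tee ppl-run.log
# then pull the summary line back out (wording assumed, adjust to taste):
grep "Final estimate" ppl-run.log
# prints something like: Final estimate: PPL = 4.1234 +/- 0.01234
```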
Also, take comparisons between quants run on different rigs with a grain of salt. I try to use perplexity/KLD as a way to measure relative quality between a set of quants all made exactly the same way on the same rig. But yeah, keep me posted on how you make out and what recipes you're finding work well for your setup.
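For what it's worth, when I'm comparing a batch of quants I just loop the exact same command over each file on the same rig so the quant is the only variable. A rough sketch, where the model glob and log names are just placeholders for illustration:

```bash
# rough sketch: identical command, same rig, one log per quant
# (the glob below is just a placeholder for your own quant files)
for model in ./Qwen3-235B-A22B-Thinking-2507-*.gguf; do
    log="ppl-$(basename "$model" .gguf).log"
    numactl -N 1 -m 1 \
    ./build/bin/llama-perplexity \
        -m "$model" \
        -f wiki.test.raw \
        -fa -fmoe \
        --ctx-size 512 \
        -ub 4096 -b 4096 \
        --numa numactl \
        --threads 128 \
        --threads-batch 192 \
        --no-mmap 2>&1 | tee "$log"
done

# then line the final estimates up side by side
grep -H "Final estimate" ppl-*.log
```

Same caveats as above: drop or tweak the numactl/threads bits to match your own box.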