Abnormally poor spelling?

#4
by nawoalanor - opened

I tried the Q4_K_M quantization from here: mradermacher/XORTRON.CriminalComputing.LARGE.2026.3-i1-GGUF

I used this startup command: llama-server.exe -m "A:\AI\Llama\Models\etc\XORTRON.CriminalComputing.LARGE.2026.3.i1-Q4_K_M.gguf" -ctk q8_0 -ctv q8_0 -ngl 99 -ub 2048 --parallel 1 --alias llama --no-mmap --ctx-size 131072 --port 5001 --flash-attn on --host 0.0.0.0

Based on the UGI benchmark result I was expecting very good (or at least interesting) results but, without exaggerating, I got the worst output quality of any model I've ever tried. Worse than a 4B or 2B running on my phone. The most obvious problem was terrible spelling: roughly 1 in 15 words had an obvious spelling error. For example, the word "amber" became "ambor".

Any idea what I might've done wrong to get such bad results? It's hard to believe it's a problem with the model itself. My parameters were very "normal", the same as I'd use with any other model.

Could it be caused by the -ctk q8_0 and -ctv q8_0 flags (--cache-type-k and --cache-type-v)? That's only a guess.
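
One way to test that guess would be to relaunch with exactly the same arguments minus the two cache-type flags, then compare spelling on the same prompt. Below is a minimal sketch that derives that comparison command from the one above. It assumes llama-server falls back to its default, unquantized KV cache when the flags are omitted; the helper name `without_kv_cache_quant` is made up for illustration.

```python
# Original launch arguments, copied from the startup command in this post.
ORIGINAL_ARGS = [
    "llama-server.exe",
    "-m", r"A:\AI\Llama\Models\etc\XORTRON.CriminalComputing.LARGE.2026.3.i1-Q4_K_M.gguf",
    "-ctk", "q8_0",
    "-ctv", "q8_0",
    "-ngl", "99",
    "-ub", "2048",
    "--parallel", "1",
    "--alias", "llama",
    "--no-mmap",
    "--ctx-size", "131072",
    "--port", "5001",
    "--flash-attn", "on",
    "--host", "0.0.0.0",
]

# Short and long spellings of the KV-cache type options.
KV_CACHE_FLAGS = {"-ctk", "--cache-type-k", "-ctv", "--cache-type-v"}


def without_kv_cache_quant(args):
    """Return a copy of args with the KV-cache type flags and their values removed."""
    result = []
    skip_next = False
    for arg in args:
        if skip_next:
            # This is the value that belonged to a removed flag (e.g. "q8_0").
            skip_next = False
            continue
        if arg in KV_CACHE_FLAGS:
            skip_next = True
            continue
        result.append(arg)
    return result


if __name__ == "__main__":
    # No argument here contains spaces, so a plain join is enough for display.
    print(" ".join(without_kv_cache_quant(ORIGINAL_ARGS)))
```

If the spelling errors disappear with this command, the KV cache quantization is the likely culprit; if they persist, the cause is somewhere else (the quant file, the chat template, sampler settings, and so on).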
