me too, if I could fit it on my GPU. but I highly doubt it's the quantization. 3.5 35B and all sorts of merges of it works perfectly fine at pretty much any quant
i see, in that case if you already tried sampling (repeat penalty 1.05-1.1) and quant seems fine then only other possible fix that worked sometimes for me is a direct system prompt to 'not overthink' and 'allow knock on errors after first solution'
also i think it's a given that you should be running 0.2 temp for 'precise' responses