When to transition to int_X quantisation?

#5
by spakment - opened

Hey, thanks so much for this, it's a brilliant way to let us create LoRAs on our measly hardware!

I'm running a Blackwell card with 32 GB of VRAM, and I'm naively thinking that int4 would let Qwen Image fit into 32 GB (see the rough estimate below), so I'm wondering whether I should use that or the int3_ARA. At what point do you think it's better to transition to the non-ARA int quantisation (e.g. when the ARA no longer gives a measurable win)?
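As a rough back-of-envelope check of the memory question (this assumes Qwen Image's transformer is around 20B parameters, and it ignores the text encoder, VAE, activations, and any training state, all of which add memory on top):

```python
# Back-of-envelope weight memory at different bit widths (illustrative only).
params = 20e9  # assumed parameter count for Qwen Image's transformer
for name, bits in [("bf16", 16), ("int6", 6), ("int4", 4), ("int3", 3)]:
    gb = params * bits / 8 / 1e9  # bits -> bytes -> GB
    print(f"{name}: ~{gb:.1f} GB of weights")
```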

thanks again!

It will depend on the model. One way to test is to generate samples at different quantization levels. Some models, like Wan, seem to handle int4 better than others. For Qwen Image, images are pretty broken at int4. I have used int6 with success on Qwen Image, but for anything below that, I believe the 3-bit ARA may produce better results, though this is just a guess; I have not measured the actual error margin layer by layer.
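For anyone who wants to check that error margin themselves, here is a minimal sketch of one way to do it. This is plain per-tensor round-to-nearest quantization for illustration, not the toolkit's actual quantizer and not the ARA method; the helper names and the toy model are made up for the example:

```python
# Measure per-layer relative error from symmetric round-to-nearest
# quantization at a given bit width, to compare e.g. int3 vs int4 vs int6.
import torch
import torch.nn as nn

def quantize_rtn(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric per-tensor round-to-nearest quantization (illustrative only)."""
    qmax = 2 ** (bits - 1) - 1                 # e.g. 7 for int4
    scale = w.abs().max().clamp(min=1e-12) / qmax
    return (w / scale).round().clamp(-qmax, qmax) * scale

@torch.no_grad()
def layer_errors(model: nn.Module, bits: int) -> dict:
    """Relative Frobenius-norm error per linear layer at the given bit width."""
    errors = {}
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            w = module.weight.float()
            wq = quantize_rtn(w, bits)
            errors[name] = ((w - wq).norm() / w.norm()).item()
    return errors

# Toy model for demonstration; on a real checkpoint you would load it
# first and compare the error distributions across bit widths.
toy = nn.Sequential(nn.Linear(64, 64), nn.Linear(64, 64))
for bits in (3, 4, 6):
    errs = layer_errors(toy, bits)
    print(f"int{bits}: mean relative error = {sum(errs.values()) / len(errs):.4f}")
```

Layers with unusually high relative error are the ones most likely to break image quality, which is roughly the kind of signal an accuracy-recovery approach like ARA is meant to compensate for.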
