Writeup

#3
by erichartford - opened

Very interested in the writeup you plan to do.

Thank you for the write-up so far. Could you also update the graphs to show KL-Divergence for the 4.00 and 5.00 bpw models?

I am experimenting with the 2.00bpw model, and will try the 3.00bpw model next. I have a 6x3090 system, and the speed is absolutely mind blowing for such a big model. I run with tensor parallel 6, since GLM4.6 has 96 heads, which divide evenly across 6 GPUs.
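
For anyone checking the same thing on a different GPU count, here is a rough sketch of the divisibility arithmetic. It only uses the 96-head figure quoted above; the function name `valid_tp_degrees` is made up for illustration, and a real backend may impose further constraints (for example on key/value heads) that this does not model.

```python
def valid_tp_degrees(num_heads: int, max_gpus: int) -> list[int]:
    """Return every GPU count up to max_gpus that splits the heads evenly."""
    return [tp for tp in range(1, max_gpus + 1) if num_heads % tp == 0]


NUM_HEADS = 96  # head count quoted in this thread
NUM_GPUS = 6    # the 6x3090 system described above

degrees = valid_tp_degrees(NUM_HEADS, NUM_GPUS)
heads_per_gpu = NUM_HEADS // NUM_GPUS

print(degrees)        # [1, 2, 3, 4, 6]
print(heads_per_gpu)  # 16
```

So with 6 GPUs each card would hold 16 heads, while a 5-GPU split would not divide evenly.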

Owner

Unfortunately, I don't have sufficient GPUs to run these models locally.
