Writeup #3, opened by erichartford
Very interested in the writeup you plan to do.
Thank you for the writeup so far. Could you also update the KL-divergence graphs to include the 4.00 and 5.00 bpw models?
I am experimenting with the 2.00 bpw model and will try the 3.00 bpw model next. I have a 6x3090 system, and the speed is absolutely mind-blowing for such a large model. I run with a tensor-parallel size of 6, since GLM4.6's 96 attention heads divide evenly across 6 GPUs.
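A minimal sketch of the divisibility check behind that choice, assuming each tensor-parallel rank receives an equal share of the attention heads. The function name `heads_per_gpu` is hypothetical and not part of any inference engine's API; the head count of 96 is taken from the post above.

```python
def heads_per_gpu(num_heads: int, tp_size: int) -> int:
    """Return attention heads per GPU under tensor parallelism (hypothetical helper).

    Raises ValueError if the heads cannot be split evenly across tp_size GPUs.
    """
    if num_heads % tp_size != 0:
        raise ValueError(
            f"{num_heads} heads cannot be split evenly across {tp_size} GPUs"
        )
    return num_heads // tp_size


NUM_HEADS = 96  # attention head count for GLM4.6 as stated in the thread

# Tensor-parallel sizes from 1 to 8 GPUs that divide the heads evenly.
valid_tp_sizes = [tp for tp in range(1, 9) if NUM_HEADS % tp == 0]
print(valid_tp_sizes)  # [1, 2, 3, 4, 6, 8]
print(heads_per_gpu(NUM_HEADS, 6))  # 16
```

With 6 GPUs each rank would hold 16 heads, while a size of 5 or 7 would leave the heads unevenly split.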
Unfortunately, I don't have sufficient GPUs to run these models locally.