hypothetical
posted an update 1 day ago

I'd love to see this treatment applied to some of the larger models. I've been using G4 26B models and used to run 70B models; squashed down like this, they wouldn't need to be quantized at all. It could even make the 100B+ models workable.

(Note: I have 8 GB of VRAM, so memory is definitely the bottleneck.)
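
For a rough sense of scale, here is a back-of-the-envelope sketch of how much memory the weights alone would take at the sizes mentioned above. It assumes 16-bit weights as the unquantized baseline and 4-bit as a typical quantized setting (both are my assumptions, not figures from the original work), and it ignores KV cache, activations, and runtime overhead. The helper name `weight_gb` is made up for illustration.

```python
# Weight-only memory estimate. Assumptions: 1 GB = 1e9 bytes,
# 16-bit unquantized baseline, 4-bit quantized comparison.
VRAM_GB = 8  # the card mentioned in the note above


def weight_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate size of the weights in GB for a given precision."""
    return params_billions * bits_per_param / 8


for size in (26, 70, 100):
    fp16 = weight_gb(size, 16)
    int4 = weight_gb(size, 4)
    shrink = fp16 / VRAM_GB  # compression needed to fit without quantizing
    print(
        f"{size}B: 16-bit ~{fp16:.0f} GB, 4-bit ~{int4:.0f} GB, "
        f"needs ~{shrink:.1f}x shrink to fit {VRAM_GB} GB unquantized"
    )
```

Even at 4 bits, none of these sizes fit in 8 GB on weights alone, which is why a size reduction that doesn't rely on quantization would matter so much here.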
