I think the dataset is gonna have to be more diverse.

#1
by Lockout - opened

[two attached screenshots of model output]

I get that a coding dataset was used, but this is pretty catastrophic forgetting. The 218B model forgets how to write English. Perplexity (PPL) tests that have been run on the models come back really high, in the 20s, and after all of this downloading, the quants don't seem to run inference much faster than the full model.
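For anyone who wants to sanity-check the PPL numbers themselves, here's a rough sliding-window perplexity script using the stock transformers recipe (the model id and the eval text file are placeholders, and this may not match whatever harness produced the numbers above):

```python
# Minimal sliding-window perplexity check (standard Hugging Face recipe).
# "your-org/your-pruned-model" and "eval_sample.txt" are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-pruned-model"  # substitute the actual repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

text = open("eval_sample.txt").read()  # any held-out English prose
input_ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)

max_len = 2048   # context window per chunk
stride = 1024    # overlap so tokens are scored with some prior context
nlls, n_tokens, prev_end = [], 0, 0

for begin in range(0, input_ids.size(1), stride):
    end = min(begin + max_len, input_ids.size(1))
    trg_len = end - prev_end          # only score tokens not yet scored
    ids = input_ids[:, begin:end]
    labels = ids.clone()
    labels[:, :-trg_len] = -100       # mask context-only tokens from the loss
    with torch.no_grad():
        loss = model(ids, labels=labels).loss
    nlls.append(loss * trg_len)       # mean NLL -> approximate summed NLL
    n_tokens += trg_len
    prev_end = end
    if end == input_ids.size(1):
        break

ppl = torch.exp(torch.stack(nlls).sum() / n_tokens)
print(f"perplexity: {ppl.item():.2f}")  # healthy base models usually land well under 20 on clean English prose
```

If this comes back in the 20s on ordinary English text, that would line up with the forgetting people are seeing in the screenshots.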

The method looks promising, but as with other prunes, it may have REAPed too much to be viable.
