I think the dataset is gonna have to be more diverse.

#1
by Lockout - opened

[two attached screenshots of model output]

I get that a coding dataset was used, but this is pretty catastrophic forgetting. The 218B model forgets how to write English. Perplexity (PPL) tests that have been run on the models come back really high, in the 20s, and after all of this downloading, the quants don't seem to run inference much faster than the full model.
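For anyone who wants to sanity-check the PPL numbers themselves, here's a rough sliding-window perplexity script using the stock transformers recipe (the model id and the eval text file are placeholders, and this may not match whatever harness produced the numbers above):

```python
# Minimal sliding-window perplexity check (standard Hugging Face recipe).
# "your-org/your-pruned-model" and "eval_sample.txt" are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-pruned-model"  # substitute the actual repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

text = open("eval_sample.txt").read()  # any held-out English prose
input_ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)

max_len = 2048   # context window per chunk
stride = 1024    # overlap so tokens are scored with some prior context
nlls, n_tokens, prev_end = [], 0, 0

for begin in range(0, input_ids.size(1), stride):
    end = min(begin + max_len, input_ids.size(1))
    trg_len = end - prev_end          # only score tokens not yet scored
    ids = input_ids[:, begin:end]
    labels = ids.clone()
    labels[:, :-trg_len] = -100       # mask context-only tokens from the loss
    with torch.no_grad():
        loss = model(ids, labels=labels).loss
    nlls.append(loss * trg_len)       # mean NLL -> approximate summed NLL
    n_tokens += trg_len
    prev_end = end
    if end == input_ids.size(1):
        break

ppl = torch.exp(torch.stack(nlls).sum() / n_tokens)
print(f"perplexity: {ppl.item():.2f}")  # healthy base models usually land well under 20 on clean English prose
```

If this comes back in the 20s on ordinary English text, that would line up with the forgetting people are seeing in the screenshots.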

The method looks promising, but as with other prunes, it may have REAPed too much to be viable.
