Cannot summarize 8000 tokens

#22
by kalle07 - opened

The model cannot summarize an 8,000-token input with a 1,200-token output limit.
After roughly 500 output tokens it starts repeating itself.

Need more info about your machine, setup, and config.

I just noticed that the llama.cpp version is crucial for the GGUF Q8 quant... the model works reasonably well, but sometimes there is no "thinking" process at all.
So I have tried different pre-built wheels... When the thinking part does work, it sometimes overthinks endlessly. I can't really instruct how many tokens/words/characters the summary should be; it always ends up at roughly 10% of my 8,000-token input, seemingly by coincidence, and if the output runs past ~1,000 tokens it starts repeating itself.
In most cases another short summary follows the main summary, but that is an issue with most models.
I tried with and without a system prompt, tried "create an overview...", "create a summary...", and with or without e.g. "max 1000 words"...
Nothing is consistent.
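
Roughly the kind of call I mean, as a minimal sketch (assuming llama-cpp-python; the model path, prompt wording, and parameter values are placeholders, not exact settings):

```python
# Minimal sketch, assuming llama-cpp-python; model path, prompt text,
# and sampling values are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="model-Q8_0.gguf",  # placeholder path to the Q8 GGUF
    n_ctx=16384,                   # context must cover ~8,000 input + ~1,200 output tokens
    n_gpu_layers=-1,               # offload all layers to GPU if it fits
)

with open("document.txt", encoding="utf-8") as f:
    document = f.read()            # the ~8,000-token source text

result = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a careful summarizer."},
        {"role": "user", "content": f"Summarize the following text in at most 1000 words:\n\n{document}"},
    ],
    max_tokens=1200,               # hard cap on the summary length
    temperature=0.7,
    repeat_penalty=1.1,            # mild penalty against looping past ~1,000 tokens
)
print(result["choices"][0]["message"]["content"])
```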


kalle07 changed discussion status to closed
kalle07 changed discussion status to open
