Memory spikes on long context length

#24
by imomayiz - opened

When testing Qwen3-0.6B on context lengths above 8k tokens, I get very high memory spikes, even with batch size 1.
For example, a single 16k-context sample does not fit in 40 GB of VRAM.
Has anyone else run into this problem?
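For scale, here is a rough back-of-envelope sketch of two tensors that grow with context length and could plausibly account for spikes like this. It rests on assumptions that are not stated in the post: that attention runs in "eager" mode (so the full seq_len x seq_len score matrix is materialized per layer), that logits are computed for every position and upcast to fp32, and that the model has 16 attention heads and a vocabulary of 151936 tokens (please check these against the model's config.json). The function names are hypothetical helpers, not part of any library.

```python
GIB = 1024 ** 3

def eager_attention_scores_bytes(seq_len, num_heads, batch_size=1, bytes_per_elem=2):
    """Size of one layer's attention score matrix: [batch, heads, seq, seq]."""
    return batch_size * num_heads * seq_len * seq_len * bytes_per_elem

def logits_bytes(seq_len, vocab_size, batch_size=1, bytes_per_elem=4):
    """Size of the full logits tensor: [batch, seq, vocab]."""
    return batch_size * seq_len * vocab_size * bytes_per_elem

seq_len = 16 * 1024  # 16k-token context, batch size 1

# Assumed values, not confirmed by the post: 16 heads, bf16 scores (2 bytes).
attn_bytes = eager_attention_scores_bytes(seq_len, num_heads=16)

# Assumed values, not confirmed by the post: vocab of 151936, fp32 logits.
logit_bytes = logits_bytes(seq_len, vocab_size=151936)

print(f"attention scores, one layer: {attn_bytes / GIB:.2f} GiB")
print(f"full-sequence logits:        {logit_bytes / GIB:.2f} GiB")
```

Under these assumptions, a single layer's score matrix is about 8 GiB and the logits are about 9.3 GiB, before counting softmax copies, gradients, or saved activations. If that is what is happening, the quadratic term goes away with a fused attention kernel (SDPA or FlashAttention), which never materializes the full score matrix.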
