When testing Qwen3-0.6B on context lengths above 8k tokens, I get very high memory spikes, even with bsz=1. For example, a single 16k-token sample does not fit in 40 GB of VRAM. Has anyone else had this problem?
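For reference, here is a rough back-of-the-envelope estimate of where memory could go in a single forward pass at these lengths. It is a sketch under stated assumptions, not a measurement. The config values are my reading of Qwen3-0.6B's `config.json` and should be double-checked. The estimate also assumes a non-fused ("eager") attention path that materializes the full score matrix, bf16 activations, and float32 logits for every position. Fused kernels such as FlashAttention or PyTorch SDPA avoid materializing that matrix. The function names are hypothetical helpers.

```python
# Rough memory estimate for one long-context forward pass (bsz=1).
# ASSUMPTION: config values below are taken to match Qwen3-0.6B's config.json.
NUM_LAYERS = 28
NUM_Q_HEADS = 16
NUM_KV_HEADS = 8
HEAD_DIM = 128
VOCAB_SIZE = 151_936

GiB = 1024 ** 3


def eager_attention_scores_bytes(seq_len, batch=1, bytes_per_elem=2):
    """Size of one layer's (batch, heads, seq, seq) attention score matrix.

    ASSUMPTION: eager (non-fused) attention in bf16; a float32 softmax
    would double this. Grows quadratically with seq_len.
    """
    return batch * NUM_Q_HEADS * seq_len * seq_len * bytes_per_elem


def logits_bytes(seq_len, batch=1, bytes_per_elem=4):
    """Logits for every position, ASSUMING they are upcast to float32."""
    return batch * seq_len * VOCAB_SIZE * bytes_per_elem


def kv_cache_bytes(seq_len, batch=1, bytes_per_elem=2):
    """Keys plus values for all layers; with GQA only NUM_KV_HEADS are stored."""
    return 2 * NUM_LAYERS * batch * NUM_KV_HEADS * HEAD_DIM * seq_len * bytes_per_elem


if __name__ == "__main__":
    for n in (8_192, 16_384):
        print(
            f"seq_len={n}: "
            f"attn scores per layer={eager_attention_scores_bytes(n) / GiB:.1f} GiB, "
            f"logits={logits_bytes(n) / GiB:.1f} GiB, "
            f"KV cache={kv_cache_bytes(n) / GiB:.2f} GiB"
        )
```

Under these assumptions, going from 8k to 16k quadruples the per-layer score matrix while the KV cache only doubles. This is consistent with a spike that appears abruptly past a certain length rather than growing smoothly.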