Issues with multiple tool calls in parallel

by GabrielaCats - opened 16 days ago

•

I'm running this drafter in combination with NVFP4 version of Gemma4 from Nvidia (using vLLM, nightly).

If I use opencode as a client and if it tries to read multiple files in parallel it will fail to generate proper tool calls for each file. Logically (at least to me) it feels this would be the issue with vLLM and parser used there, but when tried to do the same without speculative model (drafter) it worked every time.

Reading files sequentially works for both combinations, no problem there. It still feels like an issue is on vllm side, but why would it work without drafter model (a lot slower that feels like seq?)...

EDIT: I found out there are same or similar issues opened at vLLM, like this one for example: https://github.com/vllm-project/vllm/issues/41967 or this one https://github.com/vllm-project/vllm/pull/42006

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment