Error RedHatAI/gemma-3n-E2B-it-quantized.w4a16 is not a multimodal model

#1
by mtobing - opened

I ran this model with vLLM, and when I tried to extract text from an image in Open WebUI, I got this error:

ERROR 10-18 19:36:15 [serving_chat.py:208]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/chat_utils.py", line 553, in mm_processor
ERROR 10-18 19:36:15 [serving_chat.py:208]     return self.mm_registry.create_processor(self.model_config)
ERROR 10-18 19:36:15 [serving_chat.py:208]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 10-18 19:36:15 [serving_chat.py:208]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/registry.py", line 242, in create_processor
ERROR 10-18 19:36:15 [serving_chat.py:208]     raise ValueError(f"{model_config.model} is not a multimodal model")
ERROR 10-18 19:36:15 [serving_chat.py:208] ValueError: RedHatAI/gemma-3n-E2B-it-quantized.w4a16 is not a multimodal model
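
For reference, one way to inspect what vLLM will register: vLLM maps the `architectures` field in the checkpoint's `config.json` to a model class, and only classes registered as multimodal get a multimodal processor, so a text-only entry there would produce exactly this ValueError. A minimal diagnostic sketch (the repo id is taken from the error above; the architecture names in the comment are illustrative):

```python
import json
from huggingface_hub import hf_hub_download

# vLLM resolves the "architectures" entry in config.json to a model class;
# only classes registered as multimodal get an mm processor. A text-only
# entry here (e.g. a *ForCausalLM name rather than a
# *ForConditionalGeneration one) would explain the ValueError above.
path = hf_hub_download(
    repo_id="RedHatAI/gemma-3n-E2B-it-quantized.w4a16",
    filename="config.json",
)
with open(path) as f:
    config = json.load(f)
print(config.get("architectures"))
```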

The model card lists "Input: Audio-Vision-Text" for this model.
Is this model multimodal, or did I miss something? I ran vLLM with this command:

python3 -m vllm.entrypoints.openai.api_server \
      --model RedHatAI/gemma-3n-E2B-it-quantized.w4a16 \
      --trust-remote-code \
      --limit-mm-per-prompt.image 1 \
      --limit-mm-per-prompt.video 1 \
      --limit-mm-per-prompt.audio 1 \
      --gpu-memory-utilization 0.85 \
      --max-model-len 4K \
      --max-num-seqs 5 \
      --max-num-batched-tokens 4K \
      --host 0.0.0.0 \
      --port 8000 \
      --dtype bfloat16
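
For completeness, a minimal client-side reproduction of the failing request, assuming the server started above is reachable on localhost:8000 (the image URL is a placeholder). Any request carrying an image part goes through mm_registry.create_processor and hits the error shown in the log:

```python
from openai import OpenAI

# Point the standard OpenAI client at the local vLLM server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Including an image_url content part triggers vLLM's multimodal
# processor path, which raises the "is not a multimodal model" error.
resp = client.chat.completions.create(
    model="RedHatAI/gemma-3n-E2B-it-quantized.w4a16",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract the text from this image."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/sample.png"}},  # placeholder
        ],
    }],
)
print(resp.choices[0].message.content)
```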

mtobing changed discussion title from "Is this multimodal ?" to "Error RedHatAI/gemma-3n-E2B-it-quantized.w4a16 is not a multimodal model"
