Error RedHatAI/gemma-3n-E2B-it-quantized.w4a16 is not a multimodal model

#1
by mtobing - opened

I ran this model with vLLM, and when I tried to extract text from an image in Open WebUI, I got this error:

ERROR 10-18 19:36:15 [serving_chat.py:208]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/chat_utils.py", line 553, in mm_processor
ERROR 10-18 19:36:15 [serving_chat.py:208]     return self.mm_registry.create_processor(self.model_config)
ERROR 10-18 19:36:15 [serving_chat.py:208]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 10-18 19:36:15 [serving_chat.py:208]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/registry.py", line 242, in create_processor
ERROR 10-18 19:36:15 [serving_chat.py:208]     raise ValueError(f"{model_config.model} is not a multimodal model")
ERROR 10-18 19:36:15 [serving_chat.py:208] ValueError: RedHatAI/gemma-3n-E2B-it-quantized.w4a16 is not a multimodal model
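
For reference, one way to inspect what vLLM will register: vLLM maps the `architectures` field in the checkpoint's `config.json` to a model class, and only classes registered as multimodal get a multimodal processor, so a text-only entry there would produce exactly this ValueError. A minimal diagnostic sketch (the repo id is taken from the error above; the architecture names in the comment are illustrative):

```python
import json
from huggingface_hub import hf_hub_download

# vLLM resolves the "architectures" entry in config.json to a model class;
# only classes registered as multimodal get an mm processor. A text-only
# entry here (e.g. a *ForCausalLM name rather than a
# *ForConditionalGeneration one) would explain the ValueError above.
path = hf_hub_download(
    repo_id="RedHatAI/gemma-3n-E2B-it-quantized.w4a16",
    filename="config.json",
)
with open(path) as f:
    config = json.load(f)
print(config.get("architectures"))
```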

The model card lists "Input: Audio-Vision-Text" for this model.
Is this model multimodal, or did I miss something? I ran vLLM with this command:

python3 -m vllm.entrypoints.openai.api_server \
      --model RedHatAI/gemma-3n-E2B-it-quantized.w4a16 \
      --trust-remote-code \
      --limit-mm-per-prompt.image 1 \
      --limit-mm-per-prompt.video 1 \
      --limit-mm-per-prompt.audio 1 \
      --gpu-memory-utilization 0.85 \
      --max-model-len 4K \
      --max-num-seqs 5 \
      --max-num-batched-tokens 4K \
      --host 0.0.0.0 \
      --port 8000 \
      --dtype bfloat16
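
For completeness, a minimal client-side reproduction of the failing request, assuming the server started above is reachable on localhost:8000 (the image URL is a placeholder). Any request carrying an image part goes through mm_registry.create_processor and hits the error shown in the log:

```python
from openai import OpenAI

# Point the standard OpenAI client at the local vLLM server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Including an image_url content part triggers vLLM's multimodal
# processor path, which raises the "is not a multimodal model" error.
resp = client.chat.completions.create(
    model="RedHatAI/gemma-3n-E2B-it-quantized.w4a16",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract the text from this image."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/sample.png"}},  # placeholder
        ],
    }],
)
print(resp.choices[0].message.content)
```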

mtobing changed discussion title from "Is this multimodal ?" to "Error RedHatAI/gemma-3n-E2B-it-quantized.w4a16 is not a multimodal model"
