Error RedHatAI/gemma-3n-E2B-it-quantized.w4a16 is not a multimodal model
by mtobing
I ran this model with vLLM, and when I tried to extract text from an image in Open WebUI, I got this error:
```
ERROR 10-18 19:36:15 [serving_chat.py:208] File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/chat_utils.py", line 553, in mm_processor
ERROR 10-18 19:36:15 [serving_chat.py:208]   return self.mm_registry.create_processor(self.model_config)
ERROR 10-18 19:36:15 [serving_chat.py:208]          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 10-18 19:36:15 [serving_chat.py:208] File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/registry.py", line 242, in create_processor
ERROR 10-18 19:36:15 [serving_chat.py:208]   raise ValueError(f"{model_config.model} is not a multimodal model")
ERROR 10-18 19:36:15 [serving_chat.py:208] ValueError: RedHatAI/gemma-3n-E2B-it-quantized.w4a16 is not a multimodal model
```
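For what it's worth, vLLM raises this `ValueError` when the architecture declared in the checkpoint's `config.json` is not registered as multimodal, so one quick sanity check is to look at what the quantized repo actually declares. A minimal sketch, assuming you have `huggingface_hub` installed (the repo id is copied from the error above):

```python
import json

from huggingface_hub import hf_hub_download

# Fetch only the config.json of the quantized checkpoint.
config_path = hf_hub_download(
    repo_id="RedHatAI/gemma-3n-E2B-it-quantized.w4a16",
    filename="config.json",
)

with open(config_path) as f:
    config = json.load(f)

# vLLM decides multimodal support from the declared architecture,
# so check what this checkpoint reports.
print("model_type:   ", config.get("model_type"))
print("architectures:", config.get("architectures"))
```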
The model card lists "Input: Audio-Vision-Text" for this model.
Is this model actually multimodal, or did I miss something? I ran vLLM with this command:
```
python3 -m vllm.entrypoints.openai.api_server \
  --model RedHatAI/gemma-3n-E2B-it-quantized.w4a16 \
  --trust-remote-code \
  --limit-mm-per-prompt.image 1 \
  --limit-mm-per-prompt.video 1 \
  --limit-mm-per-prompt.audio 1 \
  --gpu-memory-utilization 0.85 \
  --max-model-len 4K \
  --max-num-seqs 5 \
  --max-num-batched-tokens 4K \
  --host 0.0.0.0 \
  --port 8000 \
  --dtype bfloat16
```
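As an aside, once a checkpoint does load as multimodal, you can verify image input independently of Open WebUI with a direct request to the OpenAI-compatible endpoint. A minimal sketch, assuming the `openai` Python client and the host/port from the command above; the image URL is a placeholder:

```python
from openai import OpenAI

# Point the client at the local vLLM server started above.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="RedHatAI/gemma-3n-E2B-it-quantized.w4a16",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract the text from this image."},
                {
                    "type": "image_url",
                    # Placeholder URL; replace with a real image.
                    "image_url": {"url": "https://example.com/sample.png"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```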
mtobing changed discussion title from "Is this multimodal?" to "Error RedHatAI/gemma-3n-E2B-it-quantized.w4a16 is not a multimodal model"