As far as I understand it, this is a multi-modal LLM, not a multi-modal embedding model?
If my understanding is correct, the tag may need updating, since it's currently set to Feature Extraction.