MFM - Multimodal Foundation Models - a LeafInTheTree Collection

Updated Jan 28

Idefics3 📊 (Featured, 102 likes)
Generate text based on an image and prompt

VideoLLaMA2 🎥 (162 likes)
Media understanding

GroundingDINO ⚔ OWL 🦖 (54 likes)
Identify objects in images using text queries

Paligemma HF 🤗 (83 likes)
Generate text and segment images using PaliGemma

PaliGemma Demo 🤲 (Featured, 314 likes)
Annotate and describe images with text prompts

Florence2 + SAM2 🔥 (Featured, 515 likes)
Segment and caption objects in images and videos

Florence 2 Vision Model V1 💻 (10 likes)
Analyze images to caption, detect objects, and extract text

Marketing Vision 👁 (2 likes)

Idefics3 📊 (2 likes)

Theia ⚡ (10 likes)
Generate detailed image analyses and depth predictions

XGen MM 💻 (16 likes)
Generate detailed descriptions from images and questions

LLaMA 3.1 Vision 🦙

Chameleon 30b 🔥 (Featured, 80 likes)
Generate descriptions and answers about images

InternVL ⚡ (Featured, 501 likes)
Interact with a multimodal chatbot that analyzes images and text

Florence 2 📉 (Featured, 811 likes)
Generate captions and analyze images with various tasks

Phi 3.5 Vision 🔥 (Featured, 224 likes)
Generate text from an image and question

MiniGPT-4 🚀 (Featured, 887 likes)

Mistral Pixtral Demo 👀 (40 likes)
Chat with Pixtral 12B using Mistral Inference

Ovis1.6 Gemma2 9B 🐑 (Featured, 324 likes)
Interact with a chatbot that understands text and images

meta-llama/Llama-Guard-3-11B-Vision (model)
Image-Text-to-Text • 11B • Updated Nov 18, 2024 • 699 downloads • 66 likes

Owlv2 👀 (Featured, 94 likes)
State-of-the-art zero-shot object detection

Llama-Vision-11B 🚀 (Featured, 390 likes)
Generate text by uploading images and asking questions

SmolVLM 📊 (144 likes)
Generate text from images and queries

GLM-Edge-V-5B Space 📷 (6 likes)
Generate text responses based on images and chat history

Paligemma2 Detection 😻 (17 likes)
Paligemma2 detection with Supervision

Florence Llama 💬 (40 likes)
Generate text responses from images and text input

Paligemma2 10b Ft Docci 448 📉 (6 likes)

VisPer-LM 🔍 (5 likes)
Visualize image depth, segmentation, and generation

Chat With Janus-Pro-7B 🌍 (Featured, 2.01k likes)
A unified multimodal understanding and generation model.