Qwen3-VL-2B-Instruct

Note currently only NexaSDK supports this model's GGUF.

Quickstart:

Download NexaSDK with one click
one line of code to run in your terminal:

nexa infer NexaAI/Qwen3-VL-2B-Instruct-GGUF

Model Description

Qwen3-VL-2B-Instruct is a 2-billion-parameter, instruction-tuned vision-language model in the Qwen3-VL family. It’s designed for efficient multimodal understanding and generation—combining strong text skills with image and video perception—making it ideal for edge and on-device deployment. It supports long contexts (up to 256K tokens) and features upgraded architecture for better spatial, visual, and temporal reasoning.

Features

Multimodal I/O: Understands images and long videos, performs OCR, and handles mixed image-text prompts.
Long-context reasoning: Up to 256K context for books, documents, or extended visual analysis.
Spatial & temporal understanding: Improved grounding and temporal event tracking for videos.
Agentic capabilities: Recognizes UI elements and reasons about screen layouts for tool use.
Lightweight footprint: 2B parameters for efficient inference across CPU, GPU, or NPU.

Use Cases

Visual question answering, captioning, and summarization
OCR and document understanding (multi-page, multilingual)
Video analysis and highlight detection
On-device visual assistants and UI automation agents
Edge analytics and lightweight IoT vision tasks

Inputs and Outputs

Inputs

Text prompts
Images (single or multiple)
Videos or frame sequences
Mixed multimodal chat turns

Outputs

Natural language answers, captions, and visual reasoning
OCR text and structured visual information

License

This model is released under the Apache 2.0 License.
Please refer to the Hugging Face model card for detailed licensing and usage information.

Downloads last month: 490

GGUF

Hardware compatibility

4-bit

8-bit

16-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for NexaAI/Qwen3-VL-2B-Instruct-GGUF

Base model

Qwen/Qwen3-VL-2B-Instruct

Quantized

(38)

this model

Collection including NexaAI/Qwen3-VL-2B-Instruct-GGUF

Qwen3VL

Collection

Nexa AI infra to support Qwen3VL running on GPU/NPU/CPU • 22 items • Updated Nov 25, 2025 • 4