Qwen3-VL-2B-Instruct

Currently, only NexaSDK supports this GGUF.

Quickstart:

  • Download NexaSDK with one click
  • one line of code to run in your terminal:
nexa infer NexaAI/Qwen3-VL-2B-Instruct-GGUF

Model Description

Qwen3-VL-2B-Instruct is a 2-billion-parameter, instruction-tuned vision-language model in the Qwen3-VL family. It’s designed for efficient multimodal understanding and generation—combining strong text skills with image and video perception—making it ideal for edge and on-device deployment. It supports long contexts (up to 256K tokens) and features upgraded architecture for better spatial, visual, and temporal reasoning.

Features

  • Multimodal I/O: Understands images and long videos, performs OCR, and handles mixed image-text prompts.
  • Long-context reasoning: Up to 256K context for books, documents, or extended visual analysis.
  • Spatial & temporal understanding: Improved grounding and temporal event tracking for videos.
  • Agentic capabilities: Recognizes UI elements and reasons about screen layouts for tool use.
  • Lightweight footprint: 2B parameters for efficient inference across CPU, GPU, or NPU.

Use Cases

  • Visual question answering, captioning, and summarization
  • OCR and document understanding (multi-page, multilingual)
  • Video analysis and highlight detection
  • On-device visual assistants and UI automation agents
  • Edge analytics and lightweight IoT vision tasks

Inputs and Outputs

Inputs

  • Text prompts
  • Images (single or multiple)
  • Videos or frame sequences
  • Mixed multimodal chat turns

Outputs

  • Natural language answers, captions, and visual reasoning
  • OCR text and structured visual information

License

This model is released under the Apache 2.0 License.
Please refer to the Hugging Face model card for detailed licensing and usage information.

Downloads last month
3,161
GGUF
Model size
2B params
Architecture
Qwen3-VL-2B-Instruct
Hardware compatibility
Log In to view the estimation

6-bit

8-bit

16-bit

32-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for NexaAI/Qwen3-VL-2B-Instruct-GGUF

Quantized
(9)
this model

Collection including NexaAI/Qwen3-VL-2B-Instruct-GGUF