---
base_model:
- Qwen/Qwen3-VL-2B-Instruct
tags:
- GGUF
---
# Qwen3-VL-2B-Instruct
> [!NOTE]
> Currently, only [NexaSDK](https://github.com/NexaAI/nexa-sdk) supports the GGUF build of this model.
## Quickstart
- Download [NexaSDK](https://github.com/NexaAI/nexa-sdk) with one click
- Run the model with a single command in your terminal:
```
nexa infer NexaAI/Qwen3-VL-2B-Instruct-GGUF
```
## Model Description
**Qwen3-VL-2B-Instruct** is a 2-billion-parameter, instruction-tuned vision-language model in the Qwen3-VL family. It is designed for efficient multimodal understanding and generation, combining strong text skills with image and video perception, which makes it well suited to edge and on-device deployment. It supports long contexts (up to 256K tokens) and features an upgraded architecture for better spatial, visual, and temporal reasoning.
## Features
- **Multimodal I/O**: Understands images and long videos, performs OCR, and handles mixed image-text prompts.
- **Long-context reasoning**: Up to 256K context for books, documents, or extended visual analysis.
- **Spatial & temporal understanding**: Improved grounding and temporal event tracking for videos.
- **Agentic capabilities**: Recognizes UI elements and reasons about screen layouts for tool use.
- **Lightweight footprint**: 2B parameters for efficient inference across CPU, GPU, or NPU.
## Use Cases
- Visual question answering, captioning, and summarization
- OCR and document understanding (multi-page, multilingual)
- Video analysis and highlight detection
- On-device visual assistants and UI automation agents
- Edge analytics and lightweight IoT vision tasks
## Inputs and Outputs
**Inputs**

- Text prompts
- Images (single or multiple)
- Videos or frame sequences
- Mixed multimodal chat turns

**Outputs**

- Natural language answers, captions, and visual reasoning
- OCR text and structured visual information
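To illustrate the mixed image-text input format, here is a minimal Python sketch that sends one image plus a text question to a locally hosted, OpenAI-compatible chat endpoint. The endpoint URL, port, and model id below are placeholders and assumptions, not part of this repo; check the NexaSDK documentation for how to serve the model locally, or adapt the payload to whichever runtime you use.

```python
import base64
import requests

# Assumption: an OpenAI-compatible server exposing this model is running
# locally. Adjust the endpoint and model id to match your actual setup.
ENDPOINT = "http://localhost:8080/v1/chat/completions"  # placeholder

def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL for the chat payload."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    return f"data:image/jpeg;base64,{b64}"

payload = {
    "model": "NexaAI/Qwen3-VL-2B-Instruct-GGUF",  # placeholder model id
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image and read any text in it."},
                {"type": "image_url", "image_url": {"url": image_to_data_url("example.jpg")}},
            ],
        }
    ],
    "max_tokens": 256,
}

response = requests.post(ENDPOINT, json=payload, timeout=120)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

The same message structure extends to multi-image or multi-turn prompts by appending additional `image_url` or `text` entries to the `content` list.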
## License
This model is released under the **Apache 2.0 License**.
Please refer to the original [Qwen/Qwen3-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-2B-Instruct) model card for detailed licensing and usage information.