Qwen3-VL-4B-Thinking

Run Qwen3-VL-4B-Thinking optimized for CPU/GPU with NexaSDK.

Quickstart

  1. Install NexaSDK

  2. Run the model locally with one line of code:

    nexa infer NexaAI/Qwen3-VL-4B-Thinking-GGUF
    

Model Description

Qwen3-VL-4B-Thinking is a 4-billion-parameter multimodal large language model from the Qwen team at Alibaba Cloud.
Part of the Qwen3-VL (Vision-Language) family, it is designed for advanced visual reasoning and chain-of-thought generation across image, text, and video inputs.

Compared to the Instruct variant, the Thinking model emphasizes deeper multi-step reasoning, analysis, and planning. It produces detailed, structured outputs that reflect intermediate reasoning steps, making it well-suited for research, multimodal understanding, and agentic workflows.

Features

  • Vision-Language Understanding: Processes images, text, and videos for joint reasoning tasks.
  • Structured Thinking Mode: Generates intermediate reasoning traces for better transparency and interpretability.
  • High Accuracy on Visual QA: Performs strongly on visual question answering, chart reasoning, and document analysis benchmarks.
  • Multilingual Support: Understands and responds in multiple languages.
  • Optimized for Efficiency: Delivers strong performance at 4B scale for on-device or edge deployment.

Use Cases

  • Multimodal reasoning and visual question answering
  • Scientific and analytical reasoning tasks involving charts, tables, and documents
  • Step-by-step visual explanation or tutoring
  • Research on interpretability and chain-of-thought modeling
  • Integration into agent systems that require structured reasoning

Inputs and Outputs

Input:

  • Text, images, or combined multimodal prompts (e.g., image + question)

Output:

  • Generated text, reasoning traces, or structured responses
  • May include explicit thought steps or structured JSON reasoning sequences

License

Check the official Qwen license for terms of use and redistribution.

Downloads last month
6,961
GGUF
Model size
4B params
Architecture
Qwen3-VL-4B-Thinking
Hardware compatibility
Log In to view the estimation

4-bit

6-bit

8-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for NexaAI/Qwen3-VL-4B-Thinking-GGUF

Quantized
(9)
this model

Collection including NexaAI/Qwen3-VL-4B-Thinking-GGUF