Vintern-1B-v3.5-GGUF ❄️

Multimodal Vietnamese VLM — Quantized for Efficient Inference

This repository hosts the GGUF (quantized) versions of Vintern-1B-v3.5,
an efficient multimodal model fine-tuned from InternVL2.5-1B to excel in Vietnamese OCR, document understanding, and vision–language reasoning.

The GGUF format enables fast inference on CPU and GPU using llama.cpp, llama-cpp-python, or compatible back-ends (e.g., KoboldCPP, LM Studio, Ollama).
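
For orientation, here is a minimal, hedged sketch of text-only inference with llama-cpp-python. The file name, context size, and sampling settings are illustrative assumptions rather than recommendations, and image input would additionally require a compatible multimodal projector and runtime, which this example does not cover.

```python
# Minimal text-only sketch with llama-cpp-python (pip install llama-cpp-python).
# The model path points at the Q4_K_M quant listed below; adjust it to whichever
# file you downloaded. All settings here are illustrative assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="Vintern-1B-v3_5-Q4_K_M.gguf",  # local path to a downloaded quant
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if available; set 0 for CPU-only
)

out = llm.create_chat_completion(
    messages=[
        # Vietnamese prompt: "Summarize the main content of the following passage: ..."
        {"role": "user", "content": "Tóm tắt nội dung chính của đoạn văn sau: ..."}
    ],
    max_tokens=256,
    temperature=0.2,
)
print(out["choices"][0]["message"]["content"])
```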


🧩 Model Description

  • Base model: 5CD-AI/Vintern-1B-v3_5
  • Architecture: Multimodal Vision–Language (Image-Text-to-Text)
  • Original framework: Transformers / PyTorch
  • This repo: Pre-converted and quantized .gguf files for llama.cpp
  • Intended usage: Text extraction, VQA, document OCR, table reading, and Vietnamese–English bilingual reasoning

📦 Included files

  • Vintern-1B-v3_5-F16.gguf (F16): FP16 base model; highest quality, used as the source for further quantization
  • Vintern-1B-v3_5-Q2_K.gguf (Q2_K): 2-bit; ultra small and experimental, lowest memory usage
  • Vintern-1B-v3_5-Q3_K_L.gguf (Q3_K_L): 3-bit large; very compact, weaker reasoning quality but minimal VRAM
  • Vintern-1B-v3_5-Q3_K_M.gguf (Q3_K_M): 3-bit medium; small footprint, fair quality
  • Vintern-1B-v3_5-Q4_0.gguf (Q4_0): 4-bit baseline; fast and balanced, good for CPU/GPU
  • Vintern-1B-v3_5-Q4_K_M.gguf (Q4_K_M): 4-bit medium; best balance of speed and accuracy
  • Vintern-1B-v3_5-Q4_K_S.gguf (Q4_K_S): 4-bit small; faster inference, slightly lower precision
  • Vintern-1B-v3_5-Q5_0.gguf (Q5_0): 5-bit baseline; higher quality at reasonable speed
  • Vintern-1B-v3_5-Q5_K_M.gguf (Q5_K_M): 5-bit medium; very good quality, recommended for GPUs
  • Vintern-1B-v3_5-Q6_K.gguf (Q6_K): 6-bit; close to FP16 accuracy, still efficient
  • Vintern-1B-v3_5-Q8_0.gguf (Q8_0): 8-bit; nearly lossless, best fidelity for CPU/GPU inference
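
To fetch a specific quant programmatically, the sketch below uses huggingface_hub; the repo id and file name follow this repository's listing and are assumptions to adjust as needed.

```python
# Minimal download sketch using huggingface_hub (pip install huggingface_hub).
# Repo id and file name follow this repository's listing; pick whichever quant
# fits your hardware.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="8Opt/Vintern-1B-v3_5-GGUF",
    filename="Vintern-1B-v3_5-Q4_K_M.gguf",
)
print(model_path)  # local cache path, ready to pass to llama.cpp or llama-cpp-python
```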

All GGUF files were generated from the FP16 base model using the official
llama-quantize tool from ggml-org/llama.cpp.

(Exact sizes depend on model export and metadata.)
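
For reference, a hedged sketch of that conversion step follows; it assumes a locally built llama-quantize binary from ggml-org/llama.cpp, and the paths and target type are placeholders rather than the exact commands used for this repo.

```python
# Sketch of the quantization step, shelling out to llama.cpp's llama-quantize tool.
# Paths and the Q4_K_M target are placeholder assumptions.
import subprocess

subprocess.run(
    [
        "./llama-quantize",              # path to the compiled llama-quantize binary
        "Vintern-1B-v3_5-F16.gguf",      # FP16 source model
        "Vintern-1B-v3_5-Q4_K_M.gguf",   # quantized output
        "Q4_K_M",                        # target quantization type
    ],
    check=True,
)
```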
