Vintern-1B-v3.5-GGUF ❄️

Multimodal Vietnamese VLM — Quantized for Efficient Inference

This repository hosts the GGUF (quantized) versions of Vintern-1B-v3.5,
an efficient multimodal model fine-tuned from InternVL2.5-1B to excel in Vietnamese OCR, document understanding, and vision–language reasoning.

The GGUF format enables fast inference on CPU and GPU using llama.cpp, llama-cpp-python, or compatible back-ends (e.g., KoboldCPP, LM Studio, Ollama).
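
For orientation, here is a minimal, hedged sketch of text-only inference with llama-cpp-python. The file name, context size, and sampling settings are illustrative assumptions rather than recommendations, and image input would additionally require a compatible multimodal projector and runtime, which this example does not cover.

```python
# Minimal text-only sketch with llama-cpp-python (pip install llama-cpp-python).
# The model path points at the Q4_K_M quant listed below; adjust it to whichever
# file you downloaded. All settings here are illustrative assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="Vintern-1B-v3_5-Q4_K_M.gguf",  # local path to a downloaded quant
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if available; set 0 for CPU-only
)

out = llm.create_chat_completion(
    messages=[
        # Vietnamese prompt: "Summarize the main content of the following passage: ..."
        {"role": "user", "content": "Tóm tắt nội dung chính của đoạn văn sau: ..."}
    ],
    max_tokens=256,
    temperature=0.2,
)
print(out["choices"][0]["message"]["content"])
```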


🧩 Model Description

  • Base model: 5CD-AI/Vintern-1B-v3_5
  • Architecture: Multimodal Vision–Language (Image-Text-to-Text)
  • Original framework: Transformers / PyTorch
  • This repo: Pre-converted and quantized .gguf files for llama.cpp
  • Intended usage: Text extraction, VQA, document OCR, table reading, and Vietnamese–English bilingual reasoning

📦 Included files

  • Vintern-1B-v3_5-F16.gguf (F16): FP16 base model; highest quality, used as the source for further quantization
  • Vintern-1B-v3_5-Q2_K.gguf (Q2_K): 2-bit; ultra small and experimental, lowest memory usage
  • Vintern-1B-v3_5-Q3_K_L.gguf (Q3_K_L): 3-bit large; very compact, weaker reasoning quality but minimal VRAM
  • Vintern-1B-v3_5-Q3_K_M.gguf (Q3_K_M): 3-bit medium; small footprint, fair quality
  • Vintern-1B-v3_5-Q4_0.gguf (Q4_0): 4-bit baseline; fast and balanced, good for CPU/GPU
  • Vintern-1B-v3_5-Q4_K_M.gguf (Q4_K_M): 4-bit medium; best balance of speed and accuracy
  • Vintern-1B-v3_5-Q4_K_S.gguf (Q4_K_S): 4-bit small; faster inference, slightly lower precision
  • Vintern-1B-v3_5-Q5_0.gguf (Q5_0): 5-bit baseline; higher quality at reasonable speed
  • Vintern-1B-v3_5-Q5_K_M.gguf (Q5_K_M): 5-bit medium; very good quality, recommended for GPUs
  • Vintern-1B-v3_5-Q6_K.gguf (Q6_K): 6-bit; close to FP16 accuracy, still efficient
  • Vintern-1B-v3_5-Q8_0.gguf (Q8_0): 8-bit; nearly lossless, best fidelity for CPU/GPU inference
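
To fetch a specific quant programmatically, the sketch below uses huggingface_hub; the repo id and file name follow this repository's listing and are assumptions to adjust as needed.

```python
# Minimal download sketch using huggingface_hub (pip install huggingface_hub).
# Repo id and file name follow this repository's listing; pick whichever quant
# fits your hardware.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="8Opt/Vintern-1B-v3_5-GGUF",
    filename="Vintern-1B-v3_5-Q4_K_M.gguf",
)
print(model_path)  # local cache path, ready to pass to llama.cpp or llama-cpp-python
```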

All GGUF files were generated from the FP16 base model using the official
llama-quantize tool from ggml-org/llama.cpp.

(Exact sizes depend on model export and metadata.)
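
For reference, a hedged sketch of that conversion step follows; it assumes a locally built llama-quantize binary from ggml-org/llama.cpp, and the paths and target type are placeholders rather than the exact commands used for this repo.

```python
# Sketch of the quantization step, shelling out to llama.cpp's llama-quantize tool.
# Paths and the Q4_K_M target are placeholder assumptions.
import subprocess

subprocess.run(
    [
        "./llama-quantize",              # path to the compiled llama-quantize binary
        "Vintern-1B-v3_5-F16.gguf",      # FP16 source model
        "Vintern-1B-v3_5-Q4_K_M.gguf",   # quantized output
        "Q4_K_M",                        # target quantization type
    ],
    check=True,
)
```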
