# Vintern-1B-v3.5-GGUF ❄️
**Multimodal Vietnamese VLM — Quantized for Efficient Inference**
This repository hosts the GGUF (quantized) versions of Vintern-1B-v3.5,
an efficient multimodal model fine-tuned from InternVL2.5-1B to excel in Vietnamese OCR, document understanding, and vision–language reasoning.
The GGUF format enables fast inference on CPU and GPU using llama.cpp, llama-cpp-python, or compatible back-ends (e.g., KoboldCPP, LM Studio, Ollama).
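As a minimal sketch of the `llama-cpp-python` path (assuming the package is installed and a quantized file such as `Vintern-1B-v3_5-Q4_K_M.gguf` has been downloaded locally — the path below is illustrative), text-only inference might look like this. Note that image input additionally requires a matching multimodal projector (mmproj), which this sketch does not cover.

```python
# Hedged sketch: text-only inference with llama-cpp-python.
# The model path and prompt are assumptions for illustration;
# adjust them to the quantization level you downloaded.
try:
    from llama_cpp import Llama
except ImportError:
    Llama = None  # llama-cpp-python is not installed

if Llama is not None:
    llm = Llama(
        model_path="Vintern-1B-v3_5-Q4_K_M.gguf",  # hypothetical local path
        n_ctx=4096,       # context window
        n_gpu_layers=-1,  # offload all layers to GPU if available
    )
    # "Extract the text in the document:" (Vietnamese prompt)
    result = llm("Trích xuất văn bản trong tài liệu:", max_tokens=256)
    print(result["choices"][0]["text"])
```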
## 🧩 Model Description
- Base model: 5CD-AI/Vintern-1B-v3_5
- Architecture: Multimodal Vision–Language (Image-Text-to-Text)
- Original framework: Transformers / PyTorch
- This repo: Pre-converted and quantized `.gguf` files for `llama.cpp`
- Intended usage: Text extraction, VQA, document OCR, table reading, and Vietnamese–English bilingual reasoning
 
## 📦 Included files
| File | Type | Description |
|---|---|---|
| Vintern-1B-v3_5-F16.gguf | F16 | Full-precision base model (best quality, used for further quantization) |
| Vintern-1B-v3_5-Q2_K.gguf | Q2_K | 2-bit — ultra small, experimental; lowest memory usage |
| Vintern-1B-v3_5-Q3_K_L.gguf | Q3_K_L | 3-bit low — very compact, slower reasoning but minimal VRAM |
| Vintern-1B-v3_5-Q3_K_M.gguf | Q3_K_M | 3-bit medium — small footprint, fair quality |
| Vintern-1B-v3_5-Q4_0.gguf | Q4_0 | 4-bit baseline — fast and balanced, good for CPU/GPU |
| Vintern-1B-v3_5-Q4_K_M.gguf | Q4_K_M | 4-bit medium — optimal balance between speed and accuracy |
| Vintern-1B-v3_5-Q4_K_S.gguf | Q4_K_S | 4-bit small — faster inference, slightly lower precision |
| Vintern-1B-v3_5-Q5_0.gguf | Q5_0 | 5-bit baseline — higher quality with reasonable speed |
| Vintern-1B-v3_5-Q5_K_M.gguf | Q5_K_M | 5-bit medium — very good quality, recommended for GPUs |
| Vintern-1B-v3_5-Q6_K.gguf | Q6_K | 6-bit — close to FP16 accuracy, still efficient |
| Vintern-1B-v3_5-Q8_0.gguf | Q8_0 | 8-bit — nearly lossless, best fidelity for CPU/GPU inference |
All GGUF files were generated from the FP16 base model using the official `llama-quantize` tool from ggml-org/llama.cpp. (Exact sizes depend on model export and metadata.)
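Since exact sizes depend on the export, a rough rule of thumb is file size ≈ parameter count × bits per weight ÷ 8 bytes, plus some overhead for quantization scales and metadata (k-quant variants also keep a few tensors at higher precision). A small illustrative estimate, assuming roughly 1B parameters:

```python
# Rough, illustrative size estimate: params * bits / 8 bytes.
# Real GGUF files are somewhat larger than this lower bound
# (quantization scales, metadata, mixed-precision tensors).
def approx_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate quantized model size in gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

N = 1e9  # ~1B parameters (approximate; the actual count differs)
for name, bits in [("Q2_K", 2), ("Q4_0", 4), ("Q5_0", 5), ("Q8_0", 8), ("F16", 16)]:
    print(f"{name}: ~{approx_size_gb(N, bits):.2f} GB")
```

This explains the ordering in the table above: every extra bit per weight adds about 0.125 GB per billion parameters.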