🚀 CRIT-VL-38B
CRIT-VL-38B is a large-scale Vision-Language Model (VLM) fine-tuned for complex Cross-Modal Multi-Hop Reasoning. This model was trained to effectively connect text context with visual cues across multiple images, addressing the hallucination and grounding issues prevalent in existing VLMs.
This model is the official open-source release accompanying the paper "CRIT: Graph-Based Automatic Data Synthesis to Enhance Cross-Modal Multi-Hop Reasoning", accepted to CVPR 2026.
📊 Model Details
- Base Model: InternVL-3.5-Pretrained (38B), published as OpenGVLab/InternVL3_5-38B-Pretrained
- Architecture: Vision-Language Model with merged LoRA weights.
- Training Data Recipe: The model was supervised fine-tuned (SFT) using an optimized combination of the following datasets:
  - LLaVA-Onevision-Instruct
  - CRIT (+ Korean extension)
  - R1-Onevision (+ Korean extension)
- Training Infrastructure: Trained on an AWS ParallelCluster / Slurm environment using 64 H200 GPUs. Training used DeepSpeed ZeRO Stage 3 and gradient checkpointing to improve efficiency.
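The "merged LoRA weights" entry in the list above means the low-rank adapter update has been folded into the base weight matrices, so no separate adapter has to be loaded at inference time. A minimal sketch of that merge on toy matrices follows, using the standard LoRA formulation W' = W + (alpha / r) * B A. The rank, alpha, and matrix values are hypothetical and are not the settings used to train CRIT-VL-38B.

```python
# Toy illustration of merging a LoRA adapter into a base weight matrix.
# All values below (rank, alpha, matrices) are hypothetical.


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_lora(w, lora_a, lora_b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank."""
    rank = len(lora_a)  # A has shape (r, in_features)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # B has shape (out_features, r)
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


base_w = [[1.0, 0.0], [0.0, 1.0]]  # out_features=2, in_features=2
lora_a = [[1.0, 2.0]]              # r=1, in_features=2
lora_b = [[3.0], [4.0]]            # out_features=2, r=1
alpha = 2.0

merged_w = merge_lora(base_w, lora_a, lora_b, alpha)

# The merged matrix should give the same output as base + adapter paths.
x = [[1.0], [1.0]]
y_merged = matmul(merged_w, x)
wx = matmul(base_w, x)
bax = matmul(lora_b, matmul(lora_a, x))
scale = alpha / len(lora_a)
y_adapter = [[wx[i][0] + scale * bax[i][0]] for i in range(len(wx))]

print(merged_w)
print(y_merged == y_adapter)
```

Because the update is already added into the weights, the released checkpoint loads like any other full model checkpoint.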
💻 Quick Start
To use CRIT-VL-38B, you must allow custom code execution (trust_remote_code=True), because the model relies on the InternVL architecture code shipped with the repository.
import torch
from transformers import AutoTokenizer, AutoModel
path = "KU-MIIL/CRIT-VL-38B"
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
# The weights alone take about 76 GB in bfloat16 (38B parameters x 2 bytes),
# so a single 80 GB GPU can hold them but leaves little headroom.
# Otherwise, consider using quantization (e.g., load_in_8bit=True).
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).eval().cuda()
# Example: Generate a response (Modify the prompt and image structure according to InternVL documentation)
# response = model.chat(tokenizer, pixel_values, question, generation_config)
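The commented-out `model.chat` call above leaves the prompt format open. As a hedged sketch: the upstream InternVL model cards mark each image with an `<image>` placeholder and number multiple images as `Image-1`, `Image-2`, and so on. Assuming CRIT-VL-38B keeps that convention (this card does not confirm it), a multi-image question could be assembled as below. The helper name `build_multi_image_question` is hypothetical, and the trailing `model.chat` call is left commented out because it needs the loaded model and preprocessed image tensors.

```python
def build_multi_image_question(num_images: int, question: str) -> str:
    """Prefix a question with one numbered <image> placeholder per image.

    Follows the prompt convention shown in upstream InternVL model cards;
    whether CRIT-VL-38B expects exactly this format is an assumption.
    """
    if num_images < 1:
        raise ValueError("num_images must be at least 1")
    prefix = "".join(f"Image-{i}: <image>\n" for i in range(1, num_images + 1))
    return prefix + question


question = build_multi_image_question(2, "Which object appears in both images?")
print(question)

# With the model and tokenizer loaded as in the Quick Start, and with
# pixel_values / num_patches_list produced by InternVL-style preprocessing
# (not shown here), an assumed InternVL-style call would look like:
#
# generation_config = dict(max_new_tokens=512, do_sample=False)
# response = model.chat(
#     tokenizer, pixel_values, question, generation_config,
#     num_patches_list=num_patches_list,
# )
```

Numbering the placeholders lets the question refer to a specific image by name, which suits multi-hop questions that span several images.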
📖 Citation
If you find this model or the CRIT dataset useful in your research, please consider citing our CVPR 2026 paper:
@inproceedings{crit2026,
title={CRIT: Graph-Based Automatic Data Synthesis to Enhance Cross-Modal Multi-Hop Reasoning},
author={Junyoung Sung and Seungwoo Lyu and Minjun Kim and Sumin An and Arsha Nagrani and Paul Hongsuck Seo},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2026}
}
🏢 Acknowledgements
This project was conducted by the Multimodal Interactive Intelligence Laboratory (MIIL) at Korea University.