Instructions for using KU-MIIL/CRIT-VL-38B with libraries and local apps.

Libraries

How to use KU-MIIL/CRIT-VL-38B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="KU-MIIL/CRIT-VL-38B", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("KU-MIIL/CRIT-VL-38B", trust_remote_code=True, dtype="auto")
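
The pipeline example above sends a single image. Since the model is described as reasoning across multiple images, a request carrying several images could be assembled as in the sketch below. The helper name `build_multi_image_messages` and the example URLs are hypothetical, and it is an assumption that this model's chat template accepts more than one image entry per message.

```python
from typing import Dict, List


def build_multi_image_messages(image_urls: List[str], question: str) -> List[Dict]:
    """Build a single-turn chat message with several images followed by one question."""
    if not image_urls:
        raise ValueError("at least one image URL is required")
    # One {"type": "image"} entry per image, in the order given.
    content = [{"type": "image", "url": url} for url in image_urls]
    # The question goes last so that it can refer to all images.
    content.append({"type": "text", "text": question})
    return [{"role": "user", "content": content}]


# Placeholder URLs (hypothetical); replace them with real, reachable images.
messages = build_multi_image_messages(
    ["https://example.com/first.jpg", "https://example.com/second.jpg"],
    "Which object appears in both images?",
)
# With the pipeline created above, this could then be run as: pipe(text=messages)
```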

Local Apps

vLLM

How to use KU-MIIL/CRIT-VL-38B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "KU-MIIL/CRIT-VL-38B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "KU-MIIL/CRIT-VL-38B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
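
The same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_payload` and `chat` are hypothetical, and it assumes the vLLM server started above is listening on `localhost:8000` and returns the usual OpenAI-style `choices[0].message.content` field.

```python
import json
import urllib.request


def build_chat_payload(model: str, prompt: str, image_url: str) -> dict:
    """Build the same JSON body as the curl example: one text part and one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def chat(base_url: str, payload: dict, timeout: float = 120.0) -> str:
    """POST the payload to /v1/chat/completions and return the assistant's reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


payload = build_chat_payload(
    "KU-MIIL/CRIT-VL-38B",
    "Describe this image in one sentence.",
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)
# Requires a running server, so the call is left commented out:
# print(chat("http://localhost:8000", payload))
```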

Use Docker

docker model run hf.co/KU-MIIL/CRIT-VL-38B

SGLang

How to use KU-MIIL/CRIT-VL-38B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "KU-MIIL/CRIT-VL-38B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "KU-MIIL/CRIT-VL-38B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
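
The curl example references an image by public URL. For a local file, one common approach is to embed the image as a base64 data URL in the `image_url` field. The sketch below shows the encoding step only; the helper name `image_file_to_data_url` is hypothetical, and it is an assumption that the server accepts data URLs in this field, so check the server documentation for your version.

```python
import base64
import mimetypes
from pathlib import Path


def image_file_to_data_url(path: str) -> str:
    """Encode a local image file as a data URL usable in an "image_url" field."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"cannot determine an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Usage (hypothetical file name):
# url = image_file_to_data_url("photo.jpg")
# then place it in the request body as {"type": "image_url", "image_url": {"url": url}}
```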

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "KU-MIIL/CRIT-VL-38B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "KU-MIIL/CRIT-VL-38B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use KU-MIIL/CRIT-VL-38B with Docker Model Runner:
```
docker model run hf.co/KU-MIIL/CRIT-VL-38B
```

🚀 CRIT-VL-38B

CRIT-VL-38B is a large-scale Vision-Language Model (VLM) fine-tuned for complex cross-modal multi-hop reasoning. It was trained to connect textual context with visual cues across multiple images, addressing the hallucination and grounding issues prevalent in existing VLMs.

This model is the official open-source release accompanying the paper accepted to CVPR 2026: "CRIT: Graph-Based Automatic Data Synthesis to Enhance Cross-Modal Multi-Hop Reasoning".

📊 Model Details

Base Model: InternVL-3.5-Pretrained (38B), published as OpenGVLab/InternVL3_5-38B-Pretrained
Architecture: Vision-Language Model with merged LoRA weights.
Training Data Recipe: The model was trained with supervised fine-tuning (SFT) on an optimized combination of the following datasets:
- LLaVA-Onevision-Instruct
- CRIT (+ Korean extension)
- R1-Onevision (+ Korean extension)
Training Infrastructure: Trained on an AWS ParallelCluster / Slurm environment with 64 H200 GPUs. Training was optimized with DeepSpeed ZeRO Stage 3 and gradient checkpointing.
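
For readers unfamiliar with the setup named above, the sketch below shows what a generic DeepSpeed ZeRO Stage 3 configuration can look like. It is not the authors' configuration: the batch size, accumulation steps, and clipping value are hypothetical placeholders, and only the use of ZeRO Stage 3 comes from the model card.

```python
import json

# Illustrative DeepSpeed config; all numeric values are placeholders, not the authors' settings.
ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,  # shard parameters, gradients, and optimizer state across GPUs
        "overlap_comm": True,
        "contiguous_gradients": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "train_micro_batch_size_per_gpu": 1,  # hypothetical
    "gradient_accumulation_steps": 1,  # hypothetical
    "gradient_clipping": 1.0,  # hypothetical
}

# Gradient checkpointing is switched on in the model rather than in this file,
# for example with model.gradient_checkpointing_enable() in Transformers.
print(json.dumps(ds_config, indent=2))
```

The printed JSON can be saved to a file and passed to a DeepSpeed-enabled training script.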

💻 Quick Start

CRIT-VL-38B uses the InternVL architecture, so you must allow custom code execution (trust_remote_code=True) when loading it.

import torch
from transformers import AutoTokenizer, AutoModel

path = "KU-MIIL/CRIT-VL-38B"

tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
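# The bfloat16 weights alone take roughly 76 GB (38B parameters x 2 bytes), so a single
# 80 GB GPU leaves very little headroom for activations and the KV cache.
# If that is not enough, consider multiple GPUs or quantization (e.g., load_in_8bit=True).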
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True
).eval().cuda()

# Example: Generate a response (Modify the prompt and image structure according to InternVL documentation)
# response = model.chat(tokenizer, pixel_values, question, generation_config)
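
To judge whether bfloat16 or a quantized load fits a given GPU, a back-of-the-envelope estimate of the weight memory can help. The sketch below multiplies the 38B parameter count by the bytes per parameter for each precision; the function name `weight_memory_gb` is hypothetical, and the estimate covers weights only, not activations, the KV cache, or framework overhead.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4.0, "bfloat16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return the approximate weight size in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


NUM_PARAMS = 38e9  # nominal size from the model name; the exact count may differ slightly

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: about {weight_memory_gb(NUM_PARAMS, dtype):.0f} GB of weights")
```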

📖 Citation

If you find this model or the CRIT dataset useful in your research, please consider citing our CVPR 2026 paper:

@inproceedings{crit2026,
  title={CRIT: Graph-Based Automatic Data Synthesis to Enhance Cross-Modal Multi-Hop Reasoning},
  author={Junyoung Sung and Seungwoo Lyu and Minjun Kim and Sumin An and Arsha Nagrani and Paul Hongsuck Seo},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}

🏢 Acknowledgements

This project was conducted by the Multimodal Interactive Intelligence Laboratory (MIIL) at Korea University.

Safetensors
Model size: 38B params
Tensor type: BF16

Model tree for KU-MIIL/CRIT-VL-38B
Base model: OpenGVLab/InternVL3_5-38B-Pretrained
Finetuned: KU-MIIL/CRIT-VL-38B (this model)