rp-yu/Qwen2-VL-2b-VPT-Seg-Alignment

Image-Text-to-Text

text-generation


Introducing Visual Perception Token into Multimodal Large Language Model

This repository contains models based on the paper "Introducing Visual Perception Token into Multimodal Large Language Model". The models use Visual Perception Tokens to enhance the visual perception capabilities of multimodal large language models (MLLMs).

Code: https://github.com/yu-rp/VisualPerceptionToken
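As a minimal sketch of how the files in this repository could be fetched locally, the snippet below uses `snapshot_download` from the `huggingface_hub` package. The helper names `repo_url` and `fetch_checkpoint` are hypothetical and introduced only for illustration. The card does not state whether the checkpoint loads with the stock Qwen2-VL classes in `transformers`, so the snippet stops at downloading; the linked code repository is the place to check for the actual loading and inference procedure.

```python
# Repository id as shown on this model card.
REPO_ID = "rp-yu/Qwen2-VL-2b-VPT-Seg-Alignment"


def repo_url(repo_id: str = REPO_ID) -> str:
    """Return the Hugging Face Hub page URL for a model repository id."""
    return f"https://huggingface.co/{repo_id}"


def fetch_checkpoint(repo_id: str = REPO_ID) -> str:
    """Download all files of the repository and return the local directory.

    Assumption: the `huggingface_hub` package is installed. It is imported
    lazily so the rest of this module works without it.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id)


if __name__ == "__main__":
    # Printing the URL needs no network access; fetch_checkpoint() does.
    print(repo_url())
```

Calling `fetch_checkpoint()` requires network access and enough disk space for the model weights.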


Model tree for rp-yu/Qwen2-VL-2b-VPT-Seg-Alignment

Base model: Qwen/Qwen2-VL-2B
Finetuned: Qwen/Qwen2-VL-2B-Instruct
Finetuned (278): this model

Dataset used to train rp-yu/Qwen2-VL-2b-VPT-Seg-Alignment

Collection including rp-yu/Qwen2-VL-2b-VPT-Seg-Alignment

VPT Models

Qwen2-VL Models with Visual Perception Token or used in training process. • 7 items • Updated Feb 20