MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization

arXiv

Mingkai Jia1,2, Wei Yin2*§, Xiaotao Hu1,2, Jiaxin Guo3, Xiaoyang Guo2
Qian Zhang2, Xiao-Xiao Long4, Ping Tan1

HKUST1, Horizon Robotics2, CUHK3, NJU4
* Corresponding Author, § Project Leader

🚀 News

  • [August 2025] Achieved SOTA on the Papers with Code leaderboards for Image Reconstruction on ImageNet and UHDBench.
  • [August 2025] Released inference code.
  • [August 2025] Released model zoo.
  • [August 2025] Released UHDBench, our proposed dataset for ultra-high-definition image reconstruction evaluation.
  • [July 2025] Released paper.

🔨 TO DO LIST

  • Training code.
  • More demos.
  • Models & Evaluation code.
  • Huggingface models.
  • Release zero-shot reconstruction benchmarks.

🙈 Model Zoo

| Model | Downsample | Groups | Codebook Size | Training Data | Link |
|---|---|---|---|---|---|
| mgvq-f8c32-g4 | 8 | 4 | 32768 | ImageNet | link |
| mgvq-f8c32-g8 | 8 | 8 | 16384 | ImageNet | link |
| mgvq-f16c32-g4 | 16 | 4 | 32768 | ImageNet | link |
| mgvq-f16c32-g8 | 16 | 8 | 16384 | ImageNet | link |
| mgvq-f16c32-g4-mix | 16 | 4 | 32768 | mix | link |
| mgvq-f32c32-g8-mix | 32 | 8 | 16384 | mix | link |
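The model names encode the configuration: f{downsample factor}, c{latent channels}, g{number of quantization groups}. In multi-group quantization, the latent channels are split into G groups, each quantized against its own codebook, so the representable vocabulary grows with the product of the group codebooks. The following is a toy NumPy sketch of the general idea (illustrative only, not the repo's implementation; the shapes and function name are made up for this example):

```python
import numpy as np

def multigroup_quantize(z, codebooks):
    """Quantize a latent vector z by splitting its channels into
    len(codebooks) groups, each matched to its own codebook by
    nearest neighbor in L2 distance."""
    groups = np.split(z, len(codebooks))
    indices, recon = [], []
    for g, cb in zip(groups, codebooks):
        dists = np.linalg.norm(cb - g, axis=1)  # distance to every code
        i = int(np.argmin(dists))
        indices.append(i)
        recon.append(cb[i])
    return indices, np.concatenate(recon)

# toy setup: 32-dim latent, 4 groups, 16 codes of dim 8 per group
rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(16, 8)) for _ in range(4)]
z = rng.normal(size=32)
idx, z_q = multigroup_quantize(z, codebooks)
```

With 4 groups of 16 codes each, this toy setup can represent 16^4 distinct quantized latents while each nearest-neighbor search only scans 16 codes.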

🔑 Quick Start

Installation

git clone https://github.com/MKJia/MGVQ.git
cd MGVQ
pip3 install -r requirements.txt

Download models

Download the pretrained models from our model zoo to /path/to/your/ckpt.

Data Preparation

Download our UHDBench dataset from Hugging Face to /path/to/your/dataset.

Evaluation on Reconstruction

Remember to change the paths of ckpt and dataset_root, and make sure you are evaluating the expected model on the expected dataset.

cd evaluation
bash eval_recon.sh

Generation Demo & Evaluation

You can download the pretrained GPT model for generation from Hugging Face and test it with our mgvq-f16c32-g4 tokenizer model for demo image sampling. Remember to change the paths of gpt_ckpt and vq_ckpt.

cd evaluation
bash demo_gen.sh

We also provide on Hugging Face a .npz file, sampled by sample_c2i_ddp.py, for evaluation.

cd evaluation
python3 evaluator.py /path/to/your/VIRTUAL_imagenet256_labeled.npz /path/to/your/GPT_XXL_300ep_topk_12.npz
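The evaluator compares the reference batch (VIRTUAL_imagenet256_labeled.npz) against your sampled batch; the headline number in this style of evaluation is FID, i.e. the Fréchet distance between two Gaussians fitted to feature activations of the two batches. A minimal sketch of that distance (using SciPy's matrix square root; this is an illustration of the metric, not the repo's evaluator code):

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * sqrt(sigma1 @ sigma2))."""
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):  # discard tiny imaginary parts from numerics
        covmean = covmean.real
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

# identical statistics give a distance of zero
mu, sigma = np.zeros(4), np.eye(4)
fid = frechet_distance(mu, sigma, mu, sigma)
```

In practice the means and covariances are computed over Inception features of the reference and sampled images, which is what the .npz files store the inputs for.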

๐Ÿ—„๏ธDemos

  • 🔥 Qualitative reconstruction images with 16× downsampling on the 2560×1440 UHDBench dataset.
  • 🔥 Qualitative class-to-image generation on ImageNet. The classes are dog (Golden Retriever and Husky), cliff, and bald eagle.
  • 🔥 Reconstruction evaluation on the 256×256 ImageNet benchmark.
  • 🔥 Zero-shot reconstruction evaluation with a downsample ratio of 16 on 512×512 datasets.
  • 🔥 Zero-shot reconstruction evaluation with a downsample ratio of 16 on 2560×1440 datasets.

๐Ÿ—„๏ธDemos

📌 Citation

If the paper and code from MGVQ help your research, we kindly ask you to cite our paper ❤️. Additionally, if you find this repository useful, giving it a star ⭐️ is a wonderful way to support our work. Thank you very much.

@article{jia2025mgvq,
  title={MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization},
  author={Jia, Mingkai and Yin, Wei and Hu, Xiaotao and Guo, Jiaxin and Guo, Xiaoyang and Zhang, Qian and Long, Xiao-Xiao and Tan, Ping},
  journal={arXiv preprint arXiv:2507.07997},
  year={2025}
}

License

This repository is under the MIT License. For more license questions, please contact Mingkai Jia ([email protected]) and Wei Yin ([email protected]).
