MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization

arXiv

Mingkai Jia1,2, Wei Yin2*§, Xiaotao Hu1,2, Jiaxin Guo3, Xiaoyang Guo2
Qian Zhang2, Xiao-Xiao Long4, Ping Tan1

HKUST1, Horizon Robotics2, CUHK3, NJU4
* Corresponding Author, § Project Leader

🚀 News

  • [August 2025] Achieved SOTA on the Papers with Code leaderboards for Image Reconstruction on ImageNet and UHDBench.
  • [August 2025] Released inference code.
  • [August 2025] Released model zoo.
  • [August 2025] Released UHDBench, our proposed dataset for ultra-high-definition image reconstruction evaluation.
  • [July 2025] Released paper.

🔨 TO DO LIST

  • Training code.
  • More demos.
  • Models & Evaluation code.
  • Huggingface models.
  • Release zero-shot reconstruction benchmarks.

🙈 Model Zoo

| Model | Downsample | Groups | Codebook Size | Training Data | Link |
|---|---|---|---|---|---|
| mgvq-f8c32-g4 | 8 | 4 | 32768 | ImageNet | link |
| mgvq-f8c32-g8 | 8 | 8 | 16384 | ImageNet | link |
| mgvq-f16c32-g4 | 16 | 4 | 32768 | ImageNet | link |
| mgvq-f16c32-g8 | 16 | 8 | 16384 | ImageNet | link |
| mgvq-f16c32-g4-mix | 16 | 4 | 32768 | mix | link |
| mgvq-f32c32-g8-mix | 32 | 8 | 16384 | mix | link |
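The model names encode the configuration: f{downsample factor}, c{latent channels}, g{number of quantization groups}. In multi-group quantization, the latent channels are split into G groups, each quantized against its own codebook, so the representable vocabulary grows with the product of the group codebooks. The following is a toy NumPy sketch of the general idea (illustrative only, not the repo's implementation; the shapes and function name are made up for this example):

```python
import numpy as np

def multigroup_quantize(z, codebooks):
    """Quantize a latent vector z by splitting its channels into
    len(codebooks) groups, each matched to its own codebook by
    nearest neighbor in L2 distance."""
    groups = np.split(z, len(codebooks))
    indices, recon = [], []
    for g, cb in zip(groups, codebooks):
        dists = np.linalg.norm(cb - g, axis=1)  # distance to every code
        i = int(np.argmin(dists))
        indices.append(i)
        recon.append(cb[i])
    return indices, np.concatenate(recon)

# toy setup: 32-dim latent, 4 groups, 16 codes of dim 8 per group
rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(16, 8)) for _ in range(4)]
z = rng.normal(size=32)
idx, z_q = multigroup_quantize(z, codebooks)
```

With 4 groups of 16 codes each, this toy setup can represent 16^4 distinct quantized latents while each nearest-neighbor search only scans 16 codes.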

🔑 Quick Start

Installation

git clone https://github.com/MKJia/MGVQ.git
cd MGVQ
pip3 install -r requirements.txt

Download models

Download the pretrained models from our model zoo to /path/to/your/ckpt.

Data Preparation

Download our UHDBench dataset from Hugging Face to /path/to/your/dataset.

Evaluation on Reconstruction

Remember to change the paths of ckpt and dataset_root, and make sure you are evaluating the expected model on the expected dataset.

cd evaluation
bash eval_recon.sh

Generation Demo & Evaluation

You can download the pretrained GPT model for generation from Hugging Face and test it with our mgvq-f16c32-g4 tokenizer model for demo image sampling. Remember to change the paths of gpt_ckpt and vq_ckpt.

cd evaluation
bash demo_gen.sh

We also provide on Hugging Face a .npz file, sampled by sample_c2i_ddp.py, for evaluation.

cd evaluation
python3 evaluator.py /path/to/your/VIRTUAL_imagenet256_labeled.npz /path/to/your/GPT_XXL_300ep_topk_12.npz
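The evaluator compares the reference batch (VIRTUAL_imagenet256_labeled.npz) against your sampled batch; the headline number in this style of evaluation is FID, i.e. the Fréchet distance between two Gaussians fitted to feature activations of the two batches. A minimal sketch of that distance (using SciPy's matrix square root; this is an illustration of the metric, not the repo's evaluator code):

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * sqrt(sigma1 @ sigma2))."""
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):  # discard tiny imaginary parts from numerics
        covmean = covmean.real
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

# identical statistics give a distance of zero
mu, sigma = np.zeros(4), np.eye(4)
fid = frechet_distance(mu, sigma, mu, sigma)
```

In practice the means and covariances are computed over Inception features of the reference and sampled images, which is what the .npz files store the inputs for.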

๐Ÿ—„๏ธDemos

  • 🔥 Qualitative reconstruction images with 16× downsampling on the 2560×1440 UHDBench dataset.
  • 🔥 Qualitative class-to-image generation on ImageNet. The classes are dog (Golden Retriever and Husky), cliff, and bald eagle.
  • 🔥 Reconstruction evaluation on the 256×256 ImageNet benchmark.
  • 🔥 Zero-shot reconstruction evaluation with a downsample ratio of 16 on 512×512 datasets.
  • 🔥 Zero-shot reconstruction evaluation with a downsample ratio of 16 on 2560×1440 datasets.

๐Ÿ—„๏ธDemos

📌 Citation

If the paper and code from MGVQ help your research, we kindly ask you to cite our paper ❤️. Additionally, if you find this repository useful, giving it a star ⭐️ is a wonderful way to support our work. Thank you very much.

@article{jia2025mgvq,
  title={MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization},
  author={Jia, Mingkai and Yin, Wei and Hu, Xiaotao and Guo, Jiaxin and Guo, Xiaoyang and Zhang, Qian and Long, Xiao-Xiao and Tan, Ping},
  journal={arXiv preprint arXiv:2507.07997},
  year={2025}
}

License

This repository is under the MIT License. For more license questions, please contact Mingkai Jia ([email protected]) and Wei Yin ([email protected]).
