RecA
Unlocking the Massive Zero-shot Potential in Unified Multimodal Models through Self-supervised Learning!
A self-supervised training framework that aligns understanding and generation with modest compute, yielding large zero-shot gains in generation and editing capability.
This repository hosts the model weights for Harmon-1.5B-RecA, a model from the paper Reconstruction Alignment Improves Unified Multimodal Models. For installation, usage instructions, and further documentation, please visit Harmon's original GitHub repository.
| Model | GenEval ↑ | DPGBench ↑ | WISE ↑ |
|---|---|---|---|
| Harmon-1.5B | 0.73 | 80.93 | 0.41 |
| Harmon-1.5B-RecA | 0.86 | 87.21 | 0.50 |
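To put the table in perspective, the short sketch below computes the absolute and relative gain of Harmon-1.5B-RecA over the base Harmon-1.5B on each benchmark. The scores are copied from the table; the `scores` dictionary and the `gains` helper are illustrative names introduced here, not part of the Harmon or RecA codebase.

```python
# Scores copied from the table above as (Harmon-1.5B, Harmon-1.5B-RecA).
# Higher is better on all three benchmarks.
scores = {
    "GenEval": (0.73, 0.86),
    "DPGBench": (80.93, 87.21),
    "WISE": (0.41, 0.50),
}


def gains(base: float, reca: float) -> tuple[float, float]:
    """Return (absolute gain, relative gain in percent) of reca over base."""
    absolute = reca - base
    relative = 100.0 * absolute / base
    return absolute, relative


# Hypothetical helper output: benchmark name -> (absolute, relative %) gain.
results = {name: gains(base, reca) for name, (base, reca) in scores.items()}

for name, (absolute, relative) in results.items():
    print(f"{name}: +{absolute:.2f} absolute, +{relative:.1f}% relative")
```

On these numbers the relative improvements are roughly 18% on GenEval, 8% on DPGBench, and 22% on WISE. The benchmarks use different scales, so the relative figures are easier to compare across columns than the raw differences.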
If you find our work inspiring or use our codebase in your research, please consider giving it a star ⭐ and a citation.
@article{xie2025reconstruction,
title={Reconstruction Alignment Improves Unified Multimodal Models},
author={Xie, Ji and Darrell, Trevor and Zettlemoyer, Luke and Wang, XuDong},
journal={arXiv preprint arXiv:2509.07295},
year={2025}
}
Base model: wusize/Harmon-1_5B