Wang Chengyao PRO

wcy1122


https://wcy1122.github.io/

AI & ML interests

Multimodal Intelligence

Recent Activity

updated a Space about 1 month ago

wcy1122/MGM-Omni

upvoted a paper 2 months ago

LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation

upvoted a paper 4 months ago

VP-VLA: Visual Prompting as an Interface for Vision-Language-Action Models



Posts 1

Post


🚀 Update: We have released the technical report of MGM-Omni. We also introduce Long-TTS-Eval, a benchmark for evaluating long-form and complex-case TTS.
📝 Arxiv: https://arxiv.org/abs/2509.25131
📊 Benchmark: wcy1122/Long-TTS-Eval
-------------------------
🚀 Introducing MGM-Omni, an omni-chatbot that can process text, image, video, and speech inputs and generate both text and speech responses.
👂 MGM-Omni supports hour-level audio understanding.
🗣️ MGM-Omni supports 10-minute speech generation and voice cloning.
For more details, see:
📝 Blog: https://mgm-omni.notion.site/MGM-Omni-An-Open-source-Omni-Chatbot-2395728e0b0180149ac9f24683fc9907
🌟 Code: https://github.com/dvlab-research/MGM-Omni
🤖 Model: wcy1122/mgm-omni-6896075e97317a88825032e1
🎮 Demo: wcy1122/MGM-Omni

Collections 1

Papers 7

arxiv:2510.23607

arxiv:2510.06679

arxiv:2509.25131

arxiv:2412.17098

Spaces 4

MGM Omni

Scaling Omni LLMs to Personalized Long-Horizon Speech

DreamOmni2 Edit

Multimodal Instruction-based Editing and Generation

DreamOmni2 Gen

Multimodal Instruction-based Editing and Generation

Mini Gemini

Models 10

wcy1122/Qwen2.5-VL-3B-ViT

0.7B • Updated Oct 11, 2025 • 1

wcy1122/MGM-Omni-TTS-2B-0927

Any-to-Any • 2B • Updated Oct 7, 2025 • 29 • 11

wcy1122/MGM-Omni-7B

Text Generation • 8B • Updated Oct 7, 2025 • 26 • 5

wcy1122/MGM-Omni-32B

Text Generation • 33B • Updated Aug 17, 2025 • 31 • 5

wcy1122/MGM-Omni-TTS-2B

Text-to-Speech • 2B • Updated Aug 17, 2025 • 3 • 3

wcy1122/MGM-Omni-TTS-4B

Text-to-Speech • 5B • Updated Aug 17, 2025 • 9 • 6

wcy1122/MGM-Omni-TTS-0.6B

Text-to-Speech • 0.7B • Updated Aug 17, 2025 • 2 • 4

wcy1122/Qwen2.5-VL-32B-ViT

0.7B • Updated Aug 15, 2025 • 1 • 1

wcy1122/Qwen2A-7B-Encoder

0.6B • Updated Aug 8, 2025 • 17 • 1

wcy1122/Qwen2.5-VL-7B-ViT

0.7B • Updated Aug 8, 2025 • 15 • 1

Datasets 1

wcy1122/Long-TTS-Eval

Updated Oct 6, 2025 • 1.24k • 219 • 11