Multimodal Instruction-based Editing and Generation
Scaling Omni LLMs to Personalized Long-Horizon Speech