Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs Paper • 2406.16860 • Published Jun 24, 2024 • 63
RAE Collection: Collection for Diffusion Transformers with Representation Autoencoders • 1 item • Updated Oct 14 • 10
QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions Paper • 2503.20290 • Published Mar 26 • 1
ConECT Dataset: Overcoming Data Scarcity in Context-Aware E-Commerce MT Paper • 2506.04929 • Published Jun 5 • 2
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM Paper • 2503.04724 • Published Mar 6 • 72
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement Paper • 2406.11546 • Published Jun 17, 2024 • 1
MultiMed: Multilingual Medical Speech Recognition via Attention Encoder Decoder Paper • 2409.14074 • Published Sep 21, 2024 • 3
jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval Paper • 2506.18902 • Published Jun 23 • 12
OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning Paper • 2506.00338 • Published May 31 • 10