BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining Paper • 2508.10975 • Published Aug 14 • 59
Unified Speech-Text Pretraining for Spoken Dialog Modeling Paper • 2402.05706 • Published Feb 8, 2024 • 7