SightSound-R1: Cross-Modal Reasoning Distillation from Vision to Audio Language Models Paper • 2509.15661 • Published Sep 19