Speech and Audio - an Ahalder Collection

Speech and Audio

Updated Sep 24

facebook/wav2vec2-base-960h

Automatic Speech Recognition • 94.4M • Updated Nov 14, 2022 • 5.85M • 381
ChatMusician: Understanding and Generating Music Intrinsically with LLM

Paper • 2402.16153 • Published Feb 25, 2024 • 60
EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer

Paper • 2409.10819 • Published Sep 17, 2024 • 19
jadechoghari/openmusic

Text-to-Audio • Updated Oct 10, 2024 • 53 • 69
SEE-2-SOUND 👀

Generate spatial audio from images (and optionally text) • Runtime error • 8
SWivid/F5-TTS

Text-to-Speech • Updated Mar 21 • 605k • 1.12k
Paper Whisperer 📈

Runtime error • 8
aiola/whisper-ner-v1

Automatic Speech Recognition • 2B • Updated Nov 21, 2024 • 13 • 23
Zyphra/Zonos-v0.1-transformer

Text-to-Speech • Updated Jun 3 • 30.4k • 417
Zyphra/Zonos-v0.1-hybrid

Text-to-Speech • Updated Jun 3 • 22.6k • 1.1k
innova-ai/AEROMamba

Updated Feb 2 • 9
herimor/voxtream

Text-to-Speech • Updated Sep 27 • 1.28k • 20