Ahalder's Collections

Speech and Audio

Updated Sep 24
  • facebook/wav2vec2-base-960h

    Automatic Speech Recognition • 94.4M • Updated Nov 14, 2022 • 5.85M • 381

  • ChatMusician: Understanding and Generating Music Intrinsically with LLM

    Paper • 2402.16153 • Published Feb 25, 2024 • 60

  • EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer

    Paper • 2409.10819 • Published Sep 17, 2024 • 19

  • jadechoghari/openmusic

    Text-to-Audio • Updated Oct 10, 2024 • 53 • 69

  • SEE-2-SOUND

    Space • Runtime error • 8 • Generate spatial audio from images (and optionally text)

  • SWivid/F5-TTS

    Text-to-Speech • Updated Mar 21 • 605k • 1.12k

  • Paper Whisperer

    Space • Runtime error • 8

  • aiola/whisper-ner-v1

    Automatic Speech Recognition • 2B • Updated Nov 21, 2024 • 13 • 23

  • Zyphra/Zonos-v0.1-transformer

    Text-to-Speech • Updated Jun 3 • 30.4k • 417

  • Zyphra/Zonos-v0.1-hybrid

    Text-to-Speech • Updated Jun 3 • 22.6k • 1.1k

  • innova-ai/AEROMamba

    Updated Feb 2 • 9

  • herimor/voxtream

    Text-to-Speech • Updated Sep 27 • 1.28k • 20