---
title: DiariZen Speaker Diarization
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
suggested_hardware: t4-small
pinned: false
license: mit
---

# 🎙️ DiariZen Speaker Diarization

Speaker diarization ("who spoke when") using the DiariZen toolkit from BUT FIT (Brno University of Technology, Faculty of Information Technology).

## Features

- **3 models available**: WavLM Large (recommended), WavLM Base (faster), WavLM Large MLC (multilingual)
- **Simple interface**: Upload audio → Select model → Run → Download RTTM
- **Lower error rates**: Lower diarization error rate (DER) than pyannote v3.1 on AMI, VoxConverse, and AISHELL-4 (see Performance)
- **GPU accelerated**: Runs on Hugging Face Spaces GPU hardware (T4 suggested)

## Performance

Diarization error rate (DER; lower is better) reported for DiariZen versus pyannote v3.1:

| Dataset | DiariZen DER | pyannote v3.1 DER |
|---|---|---|
| AMI-SDM | 13.9% | 22.4% |
| VoxConverse | 9.1% | 11.3% |
| AISHELL-4 | 10.1% | 12.2% |

## Usage

1. Upload an audio file or record one.
2. Select a diarization model.
3. Click **Run Diarization**.
4. View the results and download the RTTM file.

## Technical Details

This Space uses a custom Dockerfile to install DiariZen with all of its dependencies:

- PyTorch 2.1.1 with CUDA 12.1
- DiariZen toolkit, including its git submodules
- Bundled pyannote-audio (a custom version)
- FFmpeg for audio processing

## Citation

```bibtex
@inproceedings{diariZen2024,
  title={DiariZen: A toolkit for speaker diarization},
  author={Han, Ivo and Landini, Federico and Burget, Lukáš and Černocký, Jan},
  booktitle={INTERSPEECH},
  year={2024}
}
```

## Source

- **DiariZen**: https://github.com/BUTSpeechFIT/DiariZen
- **License**: MIT (code) | research/non-commercial use only (models)
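## Parsing the RTTM Output

The RTTM file downloaded in the last Usage step is plain text with one `SPEAKER` line per speech turn. In the standard NIST RTTM layout the fields are: type, file ID, channel, turn onset (s), turn duration (s), orthography, speaker type, speaker name, confidence, and lookahead. The sketch below is not part of DiariZen or this Space; it assumes standard 10-field RTTM lines, and the file ID (`meeting`) and speaker labels (`spk_0`, `spk_1`) in the sample are hypothetical, since the exact labels this Space writes are not documented here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Turn:
    file_id: str
    start: float     # turn onset in seconds
    duration: float  # turn length in seconds
    speaker: str

    @property
    def end(self) -> float:
        return self.start + self.duration


def parse_rttm(text: str) -> list[Turn]:
    """Parse the SPEAKER records of an RTTM file into Turn objects, sorted by onset."""
    turns = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, ";;" comments, and non-SPEAKER record types.
        if not fields or fields[0] != "SPEAKER":
            continue
        # Assumed standard layout: type, file, channel, onset, duration,
        # ortho, stype, name, conf, slat -> speaker name is field index 7.
        turns.append(Turn(file_id=fields[1], start=float(fields[3]),
                          duration=float(fields[4]), speaker=fields[7]))
    return sorted(turns, key=lambda t: t.start)


def speaking_time(turns: list[Turn]) -> dict[str, float]:
    """Total seconds attributed to each speaker (overlapping turns count for each speaker)."""
    totals: dict[str, float] = defaultdict(float)
    for t in turns:
        totals[t.speaker] += t.duration
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical sample; real output comes from the downloaded RTTM file.
    sample = """\
SPEAKER meeting 1 0.50 2.25 <NA> <NA> spk_0 <NA> <NA>
SPEAKER meeting 1 2.75 1.50 <NA> <NA> spk_1 <NA> <NA>
SPEAKER meeting 1 4.00 1.00 <NA> <NA> spk_0 <NA> <NA>
"""
    turns = parse_rttm(sample)
    for t in turns:
        print(f"{t.speaker}: {t.start:.2f}-{t.end:.2f}s")
    print(speaking_time(turns))  # {'spk_0': 3.25, 'spk_1': 1.5}
```

To use it on a real result, read the downloaded file with `open(path).read()` and pass the text to `parse_rttm`. Summing durations per speaker is a simple summary; it is not a DER computation, which additionally needs a reference annotation.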