metadata
title: DiariZen Speaker Diarization
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
suggested_hardware: t4-small
pinned: false
license: mit
🎙️ DiariZen Speaker Diarization
High-performance speaker diarization using DiariZen from BUT-FIT.
Features
- 3 Models Available: WavLM Large (recommended), WavLM Base (faster), WavLM Large MLC (multilingual)
- Simple Interface: Upload audio → Select model → Run → Download RTTM
- High Performance: Lower diarization error rate (DER) than Pyannote v3.1 on AMI-SDM, VoxConverse, and AISHELL-4
- GPU Accelerated: Uses Hugging Face Spaces GPU
Performance
DiariZen reports a lower diarization error rate (DER, lower is better) than Pyannote v3.1 on three benchmarks:
- AMI-SDM: 13.9% DER (vs 22.4% Pyannote v3.1)
- VoxConverse: 9.1% DER (vs 11.3% Pyannote v3.1)
- AISHELL-4: 10.1% DER (vs 12.2% Pyannote v3.1)
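To put the figures above in perspective, the following sketch computes the relative DER reduction on each benchmark. It only does arithmetic on the numbers listed above. The helper name `relative_reduction` is an illustrative assumption and not part of DiariZen.

```python
# DER values in percent, copied from the list above: (DiariZen, Pyannote v3.1).
RESULTS = {
    "AMI-SDM": (13.9, 22.4),
    "VoxConverse": (9.1, 11.3),
    "AISHELL-4": (10.1, 12.2),
}


def relative_reduction(new_der: float, baseline_der: float) -> float:
    """Return the relative error reduction, in percent, against the baseline."""
    return 100.0 * (baseline_der - new_der) / baseline_der


# Hypothetical helper output: one relative reduction per benchmark.
reductions = {
    name: relative_reduction(diarizen, pyannote)
    for name, (diarizen, pyannote) in RESULTS.items()
}

for name, value in reductions.items():
    print(f"{name}: {value:.1f}% relative DER reduction")
```

Relative reduction is a convenient way to compare gains across datasets whose baseline error rates differ.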
Usage
- Upload an audio file or record one
- Select a diarization model
- Click "Run Diarization"
- View the results and download the RTTM file
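Once downloaded, the RTTM file can be processed with a few lines of Python. The sketch below assumes the standard RTTM layout, in which each `SPEAKER` record carries the file ID, channel, onset, duration, and speaker label as whitespace-separated fields. The sample text, the `Turn` type, and the function names are hypothetical and only serve as illustration.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Turn(NamedTuple):
    file_id: str
    start: float     # onset in seconds
    duration: float  # length in seconds
    speaker: str


def parse_rttm(text: str) -> List[Turn]:
    """Parse SPEAKER records; blank lines and other record types are skipped."""
    turns = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        # Field order: type, file, channel, onset, duration, ortho, stype, name, ...
        turns.append(Turn(fields[1], float(fields[3]), float(fields[4]), fields[7]))
    return turns


def speaking_time(turns: List[Turn]) -> Dict[str, float]:
    """Sum the duration of all turns per speaker label."""
    totals: Dict[str, float] = defaultdict(float)
    for turn in turns:
        totals[turn.speaker] += turn.duration
    return dict(totals)


# Hypothetical sample, not actual output from this Space.
SAMPLE_RTTM = """\
SPEAKER meeting 1 0.00 4.50 <NA> <NA> spk0 <NA> <NA>
SPEAKER meeting 1 4.50 2.25 <NA> <NA> spk1 <NA> <NA>
SPEAKER meeting 1 7.00 3.00 <NA> <NA> spk0 <NA> <NA>
"""

turns = parse_rttm(SAMPLE_RTTM)
totals = speaking_time(turns)
for speaker, seconds in sorted(totals.items()):
    print(f"{speaker}: {seconds:.2f} s")
```

To read a downloaded file instead of the sample, pass the file's contents to `parse_rttm`, for example `parse_rttm(open("result.rttm").read())`, where `result.rttm` is a placeholder name.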
Technical Details
This Space uses a custom Dockerfile to install DiariZen with all its dependencies:
- PyTorch 2.1.1 with CUDA 12.1
- DiariZen toolkit with git submodules
- Bundled pyannote-audio (custom version)
- FFmpeg for audio processing
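A quick way to confirm that a container provides the components listed above is a small sanity check like the one below. This is a sketch and not part of the Space's code. The function name `check_environment` and the report keys are assumptions, and the check falls back gracefully when PyTorch is not installed.

```python
import importlib.util
import shutil
from typing import Any, Dict


def check_environment() -> Dict[str, Any]:
    """Report PyTorch, CUDA, and FFmpeg availability in the current environment."""
    report: Dict[str, Any] = {
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        "torch_version": None,
        "cuda_build": None,
        "cuda_available": False,
    }
    # Only import torch if it is installed, so the check also runs elsewhere.
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch_version"] = torch.__version__
        report["cuda_build"] = torch.version.cuda  # None on CPU-only builds
        report["cuda_available"] = torch.cuda.is_available()
    return report


report = check_environment()
for key, value in report.items():
    print(f"{key}: {value}")
```

On the image described above, one would expect `torch_version` to start with `2.1.1` and `cuda_build` to be `12.1`. `cuda_available` is true only when the Space is actually running on GPU hardware.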
Citation
@inproceedings{diariZen2024,
title={DiariZen: A toolkit for speaker diarization},
author={Han, Ivo and Landini, Federico and Burget, Lukáš and Černocký, Jan},
booktitle={INTERSPEECH},
year={2024}
}
Source
- DiariZen: https://github.com/BUTSpeechFIT/DiariZen
- License: MIT (Code) | Research/Non-commercial (Models)