---
title: DiariZen Speaker Diarization
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
suggested_hardware: t4-small
pinned: false
license: mit
---

# 🎙️ DiariZen Speaker Diarization

High-performance speaker diarization using DiariZen from BUT-FIT.

## Features

- **3 Models Available**: WavLM Large (recommended), WavLM Base (faster), WavLM Large MLC (multilingual)
- **Simple Interface**: Upload audio → Select model → Run → Download RTTM
- **High Performance**: Lower diarization error rate (DER) than Pyannote v3.1 on AMI-SDM, VoxConverse, and AISHELL-4
- **GPU Accelerated**: Uses Hugging Face Spaces GPU

## Performance

DiariZen achieves a lower diarization error rate (DER; lower is better) than Pyannote v3.1 on the following benchmarks:

- **AMI-SDM**: 13.9% DER (vs 22.4% Pyannote v3.1)
- **VoxConverse**: 9.1% DER (vs 11.3% Pyannote v3.1)
- **AISHELL-4**: 10.1% DER (vs 12.2% Pyannote v3.1)

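To put the DER figures above on a common scale, the gap to Pyannote v3.1 can be expressed as a relative reduction. The following is a minimal sketch that uses only the numbers quoted in this README; the helper name `relative_reduction` is made up for illustration.

```python
# DER figures in percent, copied from the list above: (DiariZen, Pyannote v3.1).
der = {
    "AMI-SDM": (13.9, 22.4),
    "VoxConverse": (9.1, 11.3),
    "AISHELL-4": (10.1, 12.2),
}


def relative_reduction(ours: float, baseline: float) -> float:
    """Relative DER reduction in percent: 100 * (baseline - ours) / baseline."""
    return 100.0 * (baseline - ours) / baseline


for name, (diarizen, pyannote) in der.items():
    print(f"{name}: {relative_reduction(diarizen, pyannote):.1f}% relative DER reduction")
```

With these inputs the relative reductions come out to roughly 37.9% (AMI-SDM), 19.5% (VoxConverse), and 17.2% (AISHELL-4).
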
## Usage

1. Upload an audio file or record one
2. Select a diarization model
3. Click "Run Diarization"
4. View the results and download the RTTM file

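Once the RTTM file is downloaded, it can be post-processed with a few lines of Python. The sketch below assumes the standard ten-field RTTM `SPEAKER` line layout (file ID in field 2, onset in field 4, duration in field 5, speaker name in field 8). The sample text, the file ID `meeting`, and the labels `speaker_0` and `speaker_1` are invented for illustration and may not match what this Space actually emits.

```python
from typing import Dict, List, NamedTuple


class Turn(NamedTuple):
    file_id: str
    start: float  # seconds
    end: float    # seconds
    speaker: str


def parse_rttm(text: str) -> List[Turn]:
    """Parse RTTM text into speaker turns, skipping non-SPEAKER lines."""
    turns = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        start = float(fields[3])
        duration = float(fields[4])
        turns.append(Turn(fields[1], start, start + duration, fields[7]))
    return turns


def speaking_time(turns: List[Turn]) -> Dict[str, float]:
    """Total speaking time in seconds per speaker label."""
    totals: Dict[str, float] = {}
    for turn in turns:
        totals[turn.speaker] = totals.get(turn.speaker, 0.0) + (turn.end - turn.start)
    return totals


# Hypothetical RTTM content; in practice, read the downloaded file instead.
sample = """\
SPEAKER meeting 1 0.00 2.50 <NA> <NA> speaker_0 <NA> <NA>
SPEAKER meeting 1 2.50 1.25 <NA> <NA> speaker_1 <NA> <NA>
SPEAKER meeting 1 4.00 1.00 <NA> <NA> speaker_0 <NA> <NA>
"""

turns = parse_rttm(sample)
print(speaking_time(turns))
```

For a real file, replace `sample` with the contents of the downloaded RTTM, for example via `open(path).read()`.
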
## Technical Details

This Space uses a custom Dockerfile to install DiariZen with all its dependencies:

- PyTorch 2.1.1 with CUDA 12.1
- DiariZen toolkit with git submodules
- Bundled pyannote-audio (custom version)
- FFmpeg for audio processing

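When reproducing this environment elsewhere, a quick sanity check of the listed dependencies can help. The snippet below is a rough sketch, not part of the Space itself: `check_environment` is a hypothetical helper that only looks for an `ffmpeg` binary on `PATH` and an importable `torch`, and reports the PyTorch version and CUDA availability when PyTorch is present.

```python
import importlib.util
import shutil


def check_environment() -> dict:
    """Report whether FFmpeg and PyTorch (with CUDA) appear to be available."""
    report = {
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "torch": importlib.util.find_spec("torch") is not None,
        "torch_version": None,
        "cuda_available": None,
    }
    if report["torch"]:
        # Import lazily so the check still runs where PyTorch is missing.
        import torch

        report["torch_version"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
    return report


for key, value in check_environment().items():
    print(f"{key}: {value}")
```

On a setup matching the list above, `torch_version` would be expected to start with `2.1.1`, and `cuda_available` to be `True` on GPU hardware.
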
## Citation

```bibtex
@inproceedings{diariZen2024,
  title={DiariZen: A toolkit for speaker diarization},
  author={Han, Ivo and Landini, Federico and Burget, Lukáš and Černocký, Jan},
  booktitle={INTERSPEECH},
  year={2024}
}
```

## Source

- **DiariZen**: https://github.com/BUTSpeechFIT/DiariZen
- **License**: MIT (Code) | Research/Non-commercial (Models)