---
title: DiariZen Speaker Diarization
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
suggested_hardware: t4-small
pinned: false
license: mit
---
# 🎙️ DiariZen Speaker Diarization
High-performance speaker diarization using DiariZen from BUT-FIT.
## Features
- **3 Models Available**: WavLM Large (recommended), WavLM Base (faster), WavLM Large MLC (multilingual)
- **Simple Interface**: Upload audio → Select model → Run → Download RTTM
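- **High Performance**: Lower diarization error rate (DER) than Pyannote v3.1 on AMI-SDM, VoxConverse, and AISHELL-4 (see Performance)
- **GPU Accelerated**: Runs on Hugging Face Spaces GPU hardware (suggested: `t4-small`)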
## Performance
Diarization error rate (DER, lower is better) for DiariZen compared with Pyannote v3.1:
- **AMI-SDM**: 13.9% DER (vs 22.4% Pyannote v3.1)
- **VoxConverse**: 9.1% DER (vs 11.3% Pyannote v3.1)
- **AISHELL-4**: 10.1% DER (vs 12.2% Pyannote v3.1)
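
To put the figures above in perspective, the short sketch below computes the relative DER reduction for each dataset. The numbers are copied from the list above; the helper name `relative_reduction` is illustrative and not part of DiariZen.

```python
# DER figures in percent, copied from the list above: (DiariZen, Pyannote v3.1)
DER = {
    "AMI-SDM": (13.9, 22.4),
    "VoxConverse": (9.1, 11.3),
    "AISHELL-4": (10.1, 12.2),
}


def relative_reduction(new: float, baseline: float) -> float:
    """Return the relative error reduction of `new` over `baseline`, in percent."""
    return 100.0 * (baseline - new) / baseline


for dataset, (diarizen, pyannote) in DER.items():
    reduction = relative_reduction(diarizen, pyannote)
    print(f"{dataset}: {reduction:.1f}% relative DER reduction")
```

With these inputs the relative reductions come out to roughly 38% (AMI-SDM), 19% (VoxConverse), and 17% (AISHELL-4).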
## Usage
1. Upload an audio file or record one
2. Select diarization model
3. Click "Run Diarization"
4. View results and download RTTM file
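
The downloaded RTTM file is plain text with one `SPEAKER` record per speech turn. A minimal sketch of reading it follows, assuming the standard RTTM field order (type, file ID, channel, onset, duration, two placeholders, speaker label, two placeholders). The sample lines, speaker labels, and the names `Segment`, `parse_rttm`, and `speaking_time` are illustrative, not taken from this Space's output.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Segment(NamedTuple):
    file_id: str
    start: float     # onset in seconds
    duration: float  # length in seconds
    speaker: str


def parse_rttm(text: str) -> List[Segment]:
    """Parse SPEAKER records from RTTM text, skipping anything else."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        # Ignore blank lines, short lines, and non-SPEAKER record types.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        segments.append(
            Segment(fields[1], float(fields[3]), float(fields[4]), fields[7])
        )
    return segments


def speaking_time(segments: List[Segment]) -> Dict[str, float]:
    """Sum the duration of all segments for each speaker label."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg.speaker] += seg.duration
    return dict(totals)


# Hypothetical sample content, used only to demonstrate the parser.
SAMPLE = """\
SPEAKER meeting 1 0.50 2.25 <NA> <NA> speaker_0 <NA> <NA>
SPEAKER meeting 1 3.00 1.50 <NA> <NA> speaker_1 <NA> <NA>
SPEAKER meeting 1 4.75 1.00 <NA> <NA> speaker_0 <NA> <NA>
"""

for speaker, seconds in speaking_time(parse_rttm(SAMPLE)).items():
    print(f"{speaker}: {seconds:.2f} s")
```

To process a real download, read the file with `open(path).read()` and pass the text to `parse_rttm`.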
## Technical Details
This Space uses a custom Dockerfile to install DiariZen with all its dependencies:
- PyTorch 2.1.1 with CUDA 12.1
- DiariZen toolkit with git submodules
- Bundled pyannote-audio (custom version)
- FFmpeg for audio processing
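
When debugging the build, it can help to confirm that the running environment matches the list above. The sketch below is one way to check this from Python. It assumes only that PyTorch, if installed, is importable as `torch` and that FFmpeg is exposed as `ffmpeg` on the `PATH`; the function name `environment_report` is illustrative.

```python
import importlib.util
import shutil


def environment_report() -> dict:
    """Collect basic facts about the PyTorch / CUDA / FFmpeg setup."""
    report = {
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_version": None,
        "cuda_build": None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        import torch

        report["torch_version"] = torch.__version__
        # CUDA version PyTorch was built against (None for CPU-only builds).
        report["cuda_build"] = torch.version.cuda
        # True only if a usable GPU and driver are present at runtime.
        report["cuda_available"] = torch.cuda.is_available()
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

On a correctly built Space, `torch_version` would be expected to start with `2.1.1` and `cuda_build` to read `12.1`, matching the versions listed above.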
## Citation
```bibtex
@inproceedings{diariZen2024,
title={DiariZen: A toolkit for speaker diarization},
author={Han, Ivo and Landini, Federico and Burget, Lukáš and Černocký, Jan},
booktitle={INTERSPEECH},
year={2024}
}
```
## Source
- **DiariZen**: https://github.com/BUTSpeechFIT/DiariZen
- **License**: MIT (Code) | Research/Non-commercial (Models)