
metadata
title: DiariZen Speaker Diarization
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
suggested_hardware: t4-small
pinned: false
license: mit

🎙️ DiariZen Speaker Diarization

High-performance speaker diarization using DiariZen from BUT-FIT.

Features

  • 3 Models Available: WavLM Large (recommended), WavLM Base (faster), WavLM Large MLC (multilingual)
  • Simple Interface: Upload audio → Select model → Run → Download RTTM
  • High Performance: Lower DER than Pyannote v3.1 on AMI-SDM, VoxConverse, and AISHELL-4 (see Performance)
  • GPU Accelerated: Runs on Hugging Face Spaces GPU hardware (t4-small suggested)

Performance

Diarization error rate (DER, lower is better) for DiariZen compared with Pyannote v3.1:

  • AMI-SDM: 13.9% DER (vs 22.4% Pyannote v3.1)
  • VoxConverse: 9.1% DER (vs 11.3% Pyannote v3.1)
  • AISHELL-4: 10.1% DER (vs 12.2% Pyannote v3.1)
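
To put the figures above in perspective, the following sketch computes the relative DER reduction for each dataset. The helper name relative_reduction is an illustrative choice and is not part of DiariZen. The numbers are copied from the list above.

```python
# DER values (percent) copied from the list above: (DiariZen, Pyannote v3.1).
results = {
    "AMI-SDM": (13.9, 22.4),
    "VoxConverse": (9.1, 11.3),
    "AISHELL-4": (10.1, 12.2),
}


def relative_reduction(der_new: float, der_baseline: float) -> float:
    """Relative DER reduction in percent (hypothetical helper for illustration)."""
    return 100.0 * (der_baseline - der_new) / der_baseline


for dataset, (diarizen, pyannote) in results.items():
    gain = relative_reduction(diarizen, pyannote)
    print(f"{dataset}: {gain:.1f}% relative DER reduction")
# AMI-SDM: 37.9% relative DER reduction
# VoxConverse: 19.5% relative DER reduction
# AISHELL-4: 17.2% relative DER reduction
```

The absolute gap is largest on AMI-SDM, where the relative improvement is roughly twice that of the other two datasets.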

Usage

  1. Upload an audio file or record one
  2. Select diarization model
  3. Click "Run Diarization"
  4. View results and download RTTM file
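
Once downloaded, the RTTM file can be processed with a few lines of Python. The sketch below assumes the standard RTTM layout, in which each SPEAKER line carries the file ID, channel, start time, duration, and speaker label in fixed positions. The names parse_rttm and speaking_time, the file ID "meeting", and the speaker labels are illustrative and not taken from this Space.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    speaker: str
    start: float  # seconds
    end: float    # seconds


def parse_rttm(text: str) -> list[Segment]:
    """Parse SPEAKER lines of an RTTM document into segments."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        # Standard RTTM fields: type, file, channel, start, duration,
        # <NA>, <NA>, speaker, <NA>, <NA>
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        start = float(fields[3])
        duration = float(fields[4])
        segments.append(Segment(fields[7], start, start + duration))
    return segments


def speaking_time(segments: list[Segment]) -> dict[str, float]:
    """Total speaking time per speaker, in seconds."""
    totals: dict[str, float] = {}
    for seg in segments:
        totals[seg.speaker] = totals.get(seg.speaker, 0.0) + (seg.end - seg.start)
    return totals


# Made-up example content; a real file would come from the Space's download.
example = """\
SPEAKER meeting 1 0.00 2.50 <NA> <NA> speaker_0 <NA> <NA>
SPEAKER meeting 1 2.50 1.25 <NA> <NA> speaker_1 <NA> <NA>
SPEAKER meeting 1 4.00 3.00 <NA> <NA> speaker_0 <NA> <NA>
"""

segments = parse_rttm(example)
print(speaking_time(segments))
```

For a downloaded file, read its contents with open(path).read() and pass the resulting string to parse_rttm.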

Technical Details

This Space uses a custom Dockerfile to install DiariZen with all its dependencies:

  • PyTorch 2.1.1 with CUDA 12.1
  • DiariZen toolkit with git submodules
  • Bundled pyannote-audio (custom version)
  • FFmpeg for audio processing
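
When reproducing this environment outside the Space, a quick sanity check of the dependencies listed above can save debugging time. The sketch below is not part of this Space's code. The function name check_environment is hypothetical, and it only reports what it finds rather than enforcing the exact versions.

```python
import importlib.util
import shutil


def check_environment() -> dict:
    """Report whether FFmpeg and PyTorch are visible (hypothetical helper)."""
    report = {
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_version": None,
        "cuda_available": None,
    }
    if report["torch_installed"]:
        # Import lazily so the check still runs where PyTorch is absent.
        import torch

        report["torch_version"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
    return report


for key, value in check_environment().items():
    print(f"{key}: {value}")
```

Inside the container built from the Dockerfile, torch_version would be expected to start with 2.1.1, and cuda_available should be true when GPU hardware is attached.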

Citation

@inproceedings{diariZen2024,
  title={DiariZen: A toolkit for speaker diarization},
  author={Han, Ivo and Landini, Federico and Burget, Lukáš and Černocký, Jan},
  booktitle={INTERSPEECH},
  year={2024}
}

Source