ahmad walidurosyad

Upgrade Gradio to 4.44.1 to fix API schema generation error

a334e75 19 days ago


	---
	title: DiariZen Speaker Diarization
	emoji: 🎙️
	colorFrom: blue
	colorTo: purple
	sdk: gradio
	sdk_version: 4.44.1
	app_file: app.py
	suggested_hardware: t4-small
	pinned: false
	license: mit
	---

	# 🎙️ DiariZen Speaker Diarization

	High-performance speaker diarization using DiariZen from BUT-FIT.

	## Features

	- 3 Models Available: WavLM Large (recommended), WavLM Base (faster), WavLM Large MLC (multilingual)
	- Simple Interface: Upload audio → Select model → Run → Download RTTM
	- High Performance: Lower DER than Pyannote v3.1 on AMI-SDM, VoxConverse, and AISHELL-4
	- GPU Accelerated: Uses Hugging Face Spaces GPU

	## Performance

	Diarization error rate (DER, lower is better) for DiariZen compared with Pyannote v3.1:
	- AMI-SDM: 13.9% DER (vs 22.4% Pyannote v3.1)
	- VoxConverse: 9.1% DER (vs 11.3% Pyannote v3.1)
	- AISHELL-4: 10.1% DER (vs 12.2% Pyannote v3.1)
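To put the gaps in perspective, the figures above can be converted into relative DER reductions. The snippet below is a minimal sketch that uses only the numbers listed in this section; the helper name `relative_reduction` is introduced here for illustration and is not part of DiariZen.

```python
# DER figures (percent) copied from the list above: (DiariZen, Pyannote v3.1).
DER = {
    "AMI-SDM": (13.9, 22.4),
    "VoxConverse": (9.1, 11.3),
    "AISHELL-4": (10.1, 12.2),
}


def relative_reduction(new: float, baseline: float) -> float:
    """Relative error reduction in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (baseline - new) / baseline


for name, (diarizen, pyannote) in DER.items():
    reduction = relative_reduction(diarizen, pyannote)
    print(f"{name}: {reduction:.1f}% relative DER reduction")
```

With these inputs the relative reductions come out at roughly 38% (AMI-SDM), 19% (VoxConverse), and 17% (AISHELL-4).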

	## Usage

	1. Upload an audio file or record one
	2. Select a diarization model
	3. Click "Run Diarization"
	4. View the results and download the RTTM file
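Once the RTTM file is downloaded, it can be post-processed with a few lines of Python. The sketch below assumes the standard RTTM layout, in which each `SPEAKER` line carries the onset in field 4, the duration in field 5, and the speaker label in field 8. The sample text, file name, and speaker labels are invented for illustration; the labels this Space actually emits may differ.

```python
from collections import defaultdict
from typing import NamedTuple


class Turn(NamedTuple):
    speaker: str
    start: float  # seconds
    end: float    # seconds


def parse_rttm(text: str) -> list[Turn]:
    """Parse SPEAKER lines of an RTTM file into speaker turns."""
    turns = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and any record type other than SPEAKER.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        onset, duration = float(fields[3]), float(fields[4])
        turns.append(Turn(fields[7], onset, onset + duration))
    return turns


def speaking_time(turns: list[Turn]) -> dict[str, float]:
    """Total speaking time in seconds per speaker label."""
    totals = defaultdict(float)
    for turn in turns:
        totals[turn.speaker] += turn.end - turn.start
    return dict(totals)


# Hypothetical RTTM content, not real output from this Space.
SAMPLE = """\
SPEAKER meeting 1 0.00 2.50 <NA> <NA> spk0 <NA> <NA>
SPEAKER meeting 1 2.50 1.25 <NA> <NA> spk1 <NA> <NA>
SPEAKER meeting 1 4.00 1.00 <NA> <NA> spk0 <NA> <NA>
"""

turns = parse_rttm(SAMPLE)
print(speaking_time(turns))
```

To run it on a real download, replace `SAMPLE` with the contents of the downloaded file, for example `open("output.rttm").read()` (the file name here is a placeholder).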

	## Technical Details

	This Space uses a custom Dockerfile to install DiariZen with all its dependencies:
	- PyTorch 2.1.1 with CUDA 12.1
	- DiariZen toolkit with git submodules
	- Bundled pyannote-audio (custom version)
	- FFmpeg for audio processing
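When reproducing this setup outside the Space, a quick sanity check of the environment can save debugging time. The following is a minimal sketch, not part of the Space's code: the function name `check_environment` is hypothetical, and it only reports whether FFmpeg is on the `PATH`, whether PyTorch is importable, and whether PyTorch can see a CUDA device.

```python
import importlib.util
import shutil


def check_environment() -> dict:
    """Report whether FFmpeg and PyTorch (with CUDA) are usable."""
    report = {
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_version": None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        # Import lazily so the check still runs on machines without PyTorch.
        import torch

        report["torch_version"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
    return report


print(check_environment())
```

If `torch_version` does not start with `2.1.1`, the environment differs from the one listed above, and results may not match.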

	## Citation

	```bibtex
	@inproceedings{diariZen2024,
	title={DiariZen: A toolkit for speaker diarization},
	author={Han, Ivo and Landini, Federico and Burget, Lukáš and Černocký, Jan},
	booktitle={INTERSPEECH},
	year={2024}
	}
	```

	## Source

	- DiariZen: https://github.com/BUTSpeechFIT/DiariZen
	- License: MIT (code) | Research/non-commercial (models)