metadata
language:
  - nl
tags:
  - whisper
  - speech-recognition
  - dutch
  - automatic-speech-recognition
license: mit
base_model: openai/whisper-large-v3
pipeline_tag: automatic-speech-recognition

WhisperD-NL: Fine-tuned Whisper for Dutch Speech Recognition

WhisperD-NL is a Whisper model fine-tuned on the Corpus Gesproken Nederlands (CGN) to transcribe Dutch speech while also marking disfluencies, speakers and non-speech events.

Model Details

  • Base Model: openai/whisper-large-v3
  • Language: Dutch (nl)
  • Task: Automatic Speech Recognition
  • Fine-tuning data: Corpus Gesproken Nederlands (CGN)
  • Speaker Identification: up to four speakers, marked in the transcript with the tags [S1], [S2], [S3] and [S4]
  • WER: 16.42% on transcriptions that include disfluencies, speaker tags and non-speech events (fine-tuned from whisper-large-v3)
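WER counts the word substitutions (S), deletions (D) and insertions (I) needed to turn a model's hypothesis into the reference transcript, divided by the number of reference words (N): WER = (S + D + I) / N. The sketch below computes it with a word-level Levenshtein distance. It assumes speaker tags and disfluencies are scored as ordinary whitespace-separated words; how the 16.42% figure above was actually scored (test split, text normalization) is not specified here, and `word_error_rate` is a hypothetical helper, not part of this model's tooling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N using word-level Levenshtein distance.

    Assumption: tags such as [S1] and disfluencies such as "uh" are
    treated as ordinary words after splitting on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = minimum edits to turn hyp[:j] into ref[:i]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )

    return dp[len(ref)][len(hyp)] / len(ref)
```

For example, `word_error_rate("[S1] ja uh ik denk het wel", "[S1] ja ik denk het niet")` needs one deletion ("uh") and one substitution ("wel" → "niet") over seven reference words, giving 2/7 ≈ 0.286, i.e. a WER of about 28.6%.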

Usage

from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
import torch
import librosa

# Load model and processor
processor = AutoProcessor.from_pretrained("pevers/whisperd-nl")
model = AutoModelForSpeechSeq2Seq.from_pretrained("pevers/whisperd-nl")

# Load audio as mono and resample to 16 kHz, the sampling rate Whisper's feature extractor expects
audio, sr = librosa.load("path_to_dutch_audio.wav", sr=16000, mono=True)
inputs = processor(audio, sampling_rate=sr, return_tensors="pt")

# Generate transcription (input features are padded/truncated to Whisper's 30-second window)
with torch.no_grad():
    predicted_ids = model.generate(inputs.input_features)

transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
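The decoded transcription can be post-processed into speaker turns. The sketch below assumes the speaker tags [S1] to [S4] appear inline in the decoded text (i.e. they survive `skip_special_tokens=True`); the helper name `split_speaker_turns` is hypothetical. Any text before the first tag is returned with speaker `None`.

```python
import re
from typing import List, Optional, Tuple

# Assumption: speaker tags appear verbatim in the decoded text as [S1]..[S4]
SPEAKER_TAG = re.compile(r"\[(S[1-4])\]")


def split_speaker_turns(transcription: str) -> List[Tuple[Optional[str], str]]:
    """Split a transcription into (speaker, text) turns at each speaker tag."""
    turns: List[Tuple[Optional[str], str]] = []
    matches = list(SPEAKER_TAG.finditer(transcription))

    # Text before the first tag (or the whole string if untagged) has no speaker
    leading_end = matches[0].start() if matches else len(transcription)
    leading = transcription[:leading_end].strip()
    if leading:
        turns.append((None, leading))

    # Each turn runs from the end of one tag to the start of the next tag
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(transcription)
        text = transcription[match.end():end].strip()
        if text:
            turns.append((match.group(1), text))
    return turns
```

For example, `split_speaker_turns("[S1] ja uh dat klopt [S2] nee hoor")` returns `[("S1", "ja uh dat klopt"), ("S2", "nee hoor")]`. Disfluencies are kept inside each turn's text, since the model is meant to transcribe them.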

Limitations

  • Optimized specifically for Dutch speech, including disfluencies and non-speech events
  • Inherits the limitations of the base Whisper model, such as its 30-second input window per forward pass