Whisper Small French - Fine-tuned on Common Voice

This model is a fine-tuned version of openai/whisper-small on French speech data from Common Voice.

Model Description

  • Base Model: OpenAI Whisper Small (244M parameters)
  • Language: French
  • Task: Automatic Speech Recognition (Transcription)
  • Training Data: Common Voice 13.0 French dataset
  • Training Samples: 100,000
  • Training Duration: 3 epochs

Training Details

Training Hyperparameters

  • Optimizer: AdamW
  • Learning Rate: 1e-5
  • Batch Size: 8
  • Gradient Accumulation: 2 steps
  • Mixed Precision: FP16
  • Scheduler: Cosine Annealing
  • Max Epochs: 3
  • Early Stopping: Patience of 2 epochs on validation WER
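With a batch size of 8 and 2 gradient-accumulation steps, each optimizer update sees 8 × 2 = 16 samples. The sketch below works through the step count, learning-rate curve, and early-stopping rule implied by these hyperparameters. It assumes a single GPU, no warmup, and a cosine schedule that decays from 1e-5 to 0; the card does not state warmup or a minimum learning rate. The names `cosine_lr` and `should_stop` are hypothetical illustrations, not the training script's own code.

```python
import math

# Values taken from the card
NUM_SAMPLES = 100_000
BATCH_SIZE = 8
GRAD_ACCUM = 2
MAX_EPOCHS = 3
PEAK_LR = 1e-5

# Assumption: single GPU, so effective batch = batch size x accumulation steps
EFFECTIVE_BATCH = BATCH_SIZE * GRAD_ACCUM                    # 16
STEPS_PER_EPOCH = math.ceil(NUM_SAMPLES / EFFECTIVE_BATCH)   # 6250 optimizer steps
TOTAL_STEPS = STEPS_PER_EPOCH * MAX_EPOCHS                   # 18750 optimizer steps


def cosine_lr(step: int, total_steps: int = TOTAL_STEPS, peak_lr: float = PEAK_LR) -> float:
    """Cosine annealing from peak_lr down to 0, with no warmup (an assumption)."""
    progress = min(step, total_steps) / total_steps
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


def should_stop(val_wers: list[float], patience: int = 2) -> bool:
    """Return True once `patience` epochs have passed without a new best (lowest) WER."""
    if not val_wers:
        return False
    # Earliest epoch with the lowest WER; later ties do not count as improvement
    best_epoch = min(range(len(val_wers)), key=lambda i: val_wers[i])
    return len(val_wers) - 1 - best_epoch >= patience
```

If early stopping is checked once per epoch, `should_stop` can first return True only after the third epoch, so with a 3-epoch cap the patience setting cannot end training before the epoch limit.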

Hardware

  • GPU: NVIDIA T4/L4
  • Training Time: ~4 hours

Usage

from transformers import pipeline

# Load the model
pipe = pipeline(
    "automatic-speech-recognition",
    model="keypa/whisper-small-fr-cv-100k",
    device=0  # Use GPU (or -1 for CPU)
)

# Transcribe audio
result = pipe("path/to/your/french/audio.wav")
print(result["text"])

Or with the model and processor directly:

from transformers import WhisperProcessor, WhisperForConditionalGeneration
import librosa
import torch

# Use the GPU if one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load model and processor
processor = WhisperProcessor.from_pretrained("keypa/whisper-small-fr-cv-100k")
model = WhisperForConditionalGeneration.from_pretrained("keypa/whisper-small-fr-cv-100k")
model.to(device)
model.eval()

# Load audio, resampled to the 16 kHz rate Whisper expects
audio, sr = librosa.load("audio.wav", sr=16000)

# Convert the waveform to log-Mel input features
input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features
input_features = input_features.to(device)

# Generate token IDs (no gradients are needed at inference time) and decode them
with torch.no_grad():
    predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])
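The Whisper processor pads or truncates its input to a fixed 30-second window (480,000 samples at 16 kHz), so the direct approach above transcribes only the first 30 seconds of a longer file. A minimal sketch, assuming non-overlapping chunks are acceptable, splits the waveform before feature extraction. The helper name `split_into_windows` is hypothetical; the pipeline's `chunk_length_s` option, which uses overlapping strides, is usually the more robust choice.

```python
import numpy as np

SAMPLE_RATE = 16_000   # Whisper expects 16 kHz mono audio
WINDOW_SECONDS = 30    # Whisper's fixed input window length


def split_into_windows(audio: np.ndarray, sr: int = SAMPLE_RATE,
                       window_s: int = WINDOW_SECONDS) -> list[np.ndarray]:
    """Split a 1-D waveform into consecutive, non-overlapping windows.

    The last window may be shorter; the processor zero-pads it to 30 s.
    """
    if audio.ndim != 1:
        raise ValueError("expected a mono (1-D) waveform")
    window = sr * window_s
    return [audio[start:start + window] for start in range(0, len(audio), window)]
```

Each chunk can then go through the same `processor(...)` and `model.generate(...)` calls, with the decoded texts joined by spaces. Words that straddle a chunk boundary may be mis-transcribed, which is what overlapping strides are meant to mitigate.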

Performance

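This model was fine-tuned on 100,000 samples and is reported to perform well on French speech recognition; no quantitative metrics (such as word error rate on the Common Voice 13.0 French test split) are provided.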
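To measure performance on your own French audio, word error rate (WER, also the metric used for early stopping) is the word-level edit distance, counting substitutions, deletions, and insertions, divided by the number of reference words. The sketch below is a plain-Python version with a hypothetical function name. It applies no text normalization, whereas Whisper evaluations usually normalize casing and punctuation first; libraries such as `jiwer` or Hugging Face `evaluate` are the usual choice in practice.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev is the previous row of the edit-distance table: distances between
    # the reference words processed so far and each prefix of the hypothesis
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("le chat est noir", "le chat noir")` returns 0.25: one deletion out of four reference words.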

Limitations

  • Optimized for the French language only
  • Performance may vary on accents not well-represented in Common Voice
  • Best suited for clear audio recordings

Citation

If you use this model, please cite:

@misc{whisper-small-fr-100k,
  author = {keypa},
  title = {Whisper Small French Fine-tuned},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/keypa/whisper-small-fr-cv-100k}}
}

Acknowledgements

Model Files

  • Format: Safetensors
  • Model size: 0.2B parameters
  • Tensor type: F32