Whisper Turbo Fine-Tuned on FLEURS, Common Voice & EdAcc (Indonesian & English)
This model is a fine-tuned version of openai/whisper-large-v3-turbo. It was trained on a combination of the Google FLEURS, Common Voice 22.0, and Edinburgh International Accents of English Corpus (EdAcc) datasets.
Training focuses specifically on Indonesian (id_id) and English (en_us). A distinguishing feature of this model is the inclusion of the EdAcc dataset, which is intended to improve performance on Indonesian-accented English.
- Developed by: Dafis Nadhif Saputra
- Model type: Automatic Speech Recognition (ASR)
- Language(s): Indonesian (id), English (en)
- License: Apache-2.0
- Finetuned from model: openai/whisper-large-v3-turbo
Evaluation Results
The model was evaluated using two different schemes:
1. Internal Training Validation
Measured during the training process on a mixed validation set (all datasets combined).
| Epoch | Validation Loss | WER (%) |
|---|---|---|
| 1 | 0.2717 | 7.42 |
| 2 | 0.2638 | 7.33 |
2. Final Standalone Evaluation
Measured after training on the full concatenated test sets for each language.
| Language | Dataset Source | WER (%) |
|---|---|---|
| English | FLEURS + Common Voice + EdAcc | 9.09 |
| Indonesian | FLEURS + Common Voice | 6.97 |
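For readers unfamiliar with the metric, the sketch below shows one common way a corpus-level word error rate (WER) is computed: total word-level edit distance divided by total reference words. This card does not state which tool or text normalization produced the numbers above, so the function names (`edit_distance`, `corpus_wer`) and the plain whitespace tokenization are assumptions for illustration, not the author's evaluation script.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions, deletions, insertions)."""
    # prev[j] holds the distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def corpus_wer(references: List[str], hypotheses: List[str]) -> float:
    """Corpus-level WER: summed edit distance over summed reference length."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    errors = 0
    words = 0
    for ref_text, hyp_text in zip(references, hypotheses):
        ref_words = ref_text.split()  # assumption: whitespace tokenization only
        errors += edit_distance(ref_words, hyp_text.split())
        words += len(ref_words)
    if words == 0:
        raise ValueError("references contain no words")
    return errors / words


if __name__ == "__main__":
    refs = ["the cat sat on the mat", "selamat pagi semua"]
    hyps = ["the cat sit on mat", "selamat pagi semua"]
    print(f"WER: {100 * corpus_wer(refs, hyps):.2f}%")
```

Reported WER values depend heavily on normalization (casing, punctuation, number formatting), so scores are only comparable when the same preprocessing is applied to both references and hypotheses.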
Training Details
Data Overview
The model was trained on approximately 15,000 samples combining:
- Google FLEURS (Indonesian & English)
- Common Voice 22.0 (Indonesian & English)
- EdAcc (English with Indonesian Accent)
Hyperparameters (Summary)
The model was trained using PEFT (LoRA) to efficiently adapt the weights.
- Learning Rate: 5e-5
- Batch Size: 32 (Effective)
- Epochs: 2
- Precision: FP16
- Optimizer: AdamW
- LoRA Rank: 32
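To give a rough sense of scale for these settings, the sketch below works through two pieces of arithmetic: how many parameters a rank-32 LoRA adapter adds to a single square projection, and roughly how many optimizer steps two epochs imply. It assumes the 1280-dimensional hidden size of Whisper large-v3-turbo and treats the "approximately 15,000 samples" figure as exact. The card does not say which modules were adapted or whether the last partial batch was dropped, so the helper names and the resulting numbers are illustrative only.

```python
import math


def lora_added_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def optimizer_steps(num_samples: int, effective_batch: int, epochs: int) -> int:
    """Optimizer steps when the last partial batch of each epoch is kept (assumption)."""
    return math.ceil(num_samples / effective_batch) * epochs


D_MODEL = 1280          # hidden size of Whisper large-v3-turbo
RANK = 32               # LoRA rank from the list above
NUM_SAMPLES = 15_000    # approximate figure from this card, treated as exact here
EFFECTIVE_BATCH = 32
EPOCHS = 2

per_projection = lora_added_params(D_MODEL, D_MODEL, RANK)
full_projection = D_MODEL * D_MODEL

if __name__ == "__main__":
    print(f"LoRA params per 1280x1280 projection: {per_projection:,}")
    print(f"Fraction of the full projection: {per_projection / full_projection:.1%}")
    print(f"Approximate optimizer steps: {optimizer_steps(NUM_SAMPLES, EFFECTIVE_BATCH, EPOCHS)}")
```

Under these assumptions each adapted projection trains about 5% as many parameters as the frozen weight it modifies, and the whole run is on the order of 900 to 1,000 optimizer steps.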
How to Get Started with the Model
You can use the pipeline from the transformers library to easily transcribe audio.
from transformers import pipeline
import torch

# Replace with your model ID
model_id = "Dafisns/whisper-turbo-multilingual-fleurs"

# Use FP16 on GPU; fall back to FP32 on CPU, where half precision is slow or unsupported
use_cuda = torch.cuda.is_available()

# Initialize the pipeline
pipe = pipeline(
    "automatic-speech-recognition",
    model=model_id,
    device="cuda" if use_cuda else "cpu",
    torch_dtype=torch.float16 if use_cuda else torch.float32,
)

# Transcribe an audio file
# Pass the language name ("indonesian" or "english") explicitly for better accuracy

# Example for Indonesian audio:
result = pipe("path_to_your_indonesian_audio.mp3", generate_kwargs={"language": "indonesian"})
print(result["text"])

# Example for English audio:
result_en = pipe("path_to_your_english_audio.mp3", generate_kwargs={"language": "english"})
print(result_en["text"])
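Because the training data is labelled with FLEURS-style codes (id_id, en_us) while the pipeline expects Whisper language names, a small helper can keep the two consistent. The sketch below is a hypothetical convenience function, not part of transformers or of this model: `build_generate_kwargs` and the `LANGUAGE_NAMES` mapping are names introduced here for illustration. It also pins `task` to `"transcribe"` so that Whisper does not translate the audio.

```python
# Hypothetical mapping from dataset or ISO codes to Whisper language names.
LANGUAGE_NAMES = {
    "id_id": "indonesian",
    "id": "indonesian",
    "en_us": "english",
    "en": "english",
}
SUPPORTED = {"indonesian", "english"}


def build_generate_kwargs(language: str) -> dict:
    """Return generate_kwargs for one of the two languages this model was tuned on."""
    key = language.strip().lower()
    name = LANGUAGE_NAMES.get(key, key)
    if name not in SUPPORTED:
        raise ValueError(f"unsupported language {language!r}; expected one of {sorted(SUPPORTED)}")
    return {"language": name, "task": "transcribe"}


if __name__ == "__main__":
    print(build_generate_kwargs("id_id"))
    print(build_generate_kwargs("English"))
    # With the `pipe` object from the quick-start code:
    # result = pipe("path_to_your_audio.mp3", generate_kwargs=build_generate_kwargs("id_id"))
```

Rejecting other languages early is a design choice for this sketch; the underlying Whisper model still accepts them, but this fine-tune was only evaluated on Indonesian and English.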