Whisper Turbo Fine-Tuned on FLEURS, Common Voice & EdAcc (Indonesian & English)

This model is a fine-tuned version of openai/whisper-large-v3-turbo. It was trained on a combination of the Google FLEURS, Common Voice 22.0, and Edinburgh International Accents of English Corpus (EdAcc) datasets.

Training focuses on two languages: Indonesian (id_id) and English (en_us). A distinguishing feature of this model is the inclusion of the EdAcc dataset, which is intended to improve performance on Indonesian-accented English.

  • Developed by: Dafis Nadhif Saputra
  • Model type: Automatic Speech Recognition (ASR)
  • Language(s): Indonesian (id), English (en)
  • License: Apache-2.0
  • Finetuned from model: openai/whisper-large-v3-turbo

Evaluation Results

The model was evaluated using two different schemes:

1. Internal Training Validation

Loss and WER were measured at the end of each epoch during training, on a mixed validation set (all datasets combined).

Epoch   Validation Loss   WER (%)
1       0.2717            7.42
2       0.2638            7.33

2. Final Standalone Evaluation

WER was measured after training, on the full concatenated test sets for each language.

Language     Dataset Source                   WER (%)
English      FLEURS + Common Voice + EdAcc    9.09
Indonesian   FLEURS + Common Voice            6.97

Training Details

Data Overview

The model was trained on approximately 15,000 samples combining:

  • Google FLEURS (Indonesian & English)
  • Common Voice 22.0 (Indonesian & English)
  • EdAcc (Indonesian-accented English)

Hyperparameters (Summary)

The model was trained using PEFT (LoRA) to efficiently adapt the weights.

  • Learning Rate: 5e-5
  • Batch Size: 32 (Effective)
  • Epochs: 2
  • Precision: FP16
  • Optimizer: AdamW
  • LoRA Rank: 32
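
As a rough back-of-the-envelope check on these settings, the sketch below estimates the number of optimizer steps and the size of a single LoRA update. It assumes exactly 15,000 training samples (the card says "approximately") and that the final partial batch of each epoch is kept. The 1280 x 1280 matrix is an illustrative shape only (1280 is the hidden size of Whisper large-v3); which modules were actually adapted, and the LoRA alpha, are not stated in this card.

```python
import math

NUM_SAMPLES = 15_000   # assumption: the card says "approximately 15,000"
EFFECTIVE_BATCH = 32   # per-device batch x gradient accumulation x devices
EPOCHS = 2
LORA_RANK = 32
HIDDEN_SIZE = 1280     # assumption: one square projection, for illustration


def optimizer_steps(num_samples, effective_batch, epochs):
    """Optimizer steps, assuming the last partial batch of each epoch is kept."""
    return math.ceil(num_samples / effective_batch) * epochs


def lora_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA learns two low-rank factors, A (rank x d_in) and B (d_out x rank).
    """
    return rank * (d_in + d_out)


steps = optimizer_steps(NUM_SAMPLES, EFFECTIVE_BATCH, EPOCHS)
adapter = lora_params(HIDDEN_SIZE, HIDDEN_SIZE, LORA_RANK)
full = HIDDEN_SIZE * HIDDEN_SIZE

print(f"optimizer steps: {steps}")                          # optimizer steps: 938
print(f"LoRA params per matrix: {adapter:,}")               # LoRA params per matrix: 81,920
print(f"fraction of the full matrix: {adapter / full:.1%}")  # fraction of the full matrix: 5.0%
```

Under these assumptions, each adapted matrix trains about 5% as many parameters as full fine-tuning of that matrix would, which is where the efficiency of LoRA comes from.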

How to Get Started with the Model

You can use the pipeline from the transformers library to easily transcribe audio.

import torch
from transformers import pipeline

# Replace with your model ID
model_id = "Dafisns/whisper-turbo-multilingual-fleurs"

# Use the GPU with half precision when available; fall back to full
# precision on CPU, where FP16 inference is poorly supported
use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"
torch_dtype = torch.float16 if use_cuda else torch.float32

# Initialize the pipeline
pipe = pipeline(
    "automatic-speech-recognition",
    model=model_id,
    device=device,
    torch_dtype=torch_dtype,
)

# Transcribe an audio file.
# Specify the language name ("indonesian" or "english") for better accuracy.

# Example for Indonesian audio:
result = pipe("path_to_your_indonesian_audio.mp3", generate_kwargs={"language": "indonesian"})
print(result["text"])

# Example for English audio:
result_en = pipe("path_to_your_english_audio.mp3", generate_kwargs={"language": "english"})
print(result_en["text"])
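
When transcribing many files, it can help to wrap the per-language call in a small helper that rejects unsupported languages up front. The following is a sketch only: `SUPPORTED_LANGUAGES`, `build_generate_kwargs`, and `transcribe_file` are hypothetical names introduced here, not part of transformers, and passing `"task": "transcribe"` explicitly is an assumption about the desired behavior (it asks Whisper to transcribe rather than translate to English).

```python
# Hypothetical helper names; only the pipeline call itself comes from transformers.
SUPPORTED_LANGUAGES = ("indonesian", "english")  # languages this model was fine-tuned on


def build_generate_kwargs(language):
    """Validate the language name and build the kwargs passed to Whisper's generate()."""
    language = language.strip().lower()
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(
            f"Unsupported language {language!r}; expected one of {SUPPORTED_LANGUAGES}"
        )
    # Assumption: we always want transcription, never translation.
    return {"language": language, "task": "transcribe"}


def transcribe_file(pipe, audio_path, language):
    """Transcribe one audio file with an ASR pipeline such as the `pipe` object above."""
    result = pipe(audio_path, generate_kwargs=build_generate_kwargs(language))
    return result["text"].strip()
```

With the `pipe` object from the snippet above, a call would look like `transcribe_file(pipe, "path_to_your_indonesian_audio.mp3", "indonesian")`.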