Whisper Turbo Fine-Tuned on FLEURS, Common Voice & EdAcc (Indonesian & English)

This model is a fine-tuned version of openai/whisper-large-v3-turbo. It was trained on a combination of the Google FLEURS, Common Voice 22.0, and Edinburgh International Accents of English Corpus (EdAcc) datasets.

Training targets Indonesian (FLEURS config id_id) and English (en_us). EdAcc is included specifically to improve recognition of Indonesian-accented English.

Developed by: Dafis Nadhif Saputra
Model type: Automatic Speech Recognition (ASR)
Language(s): Indonesian (id), English (en)
License: Apache-2.0
Finetuned from model: openai/whisper-large-v3-turbo

Evaluation Results

The model was evaluated using two different schemes:

1. Internal Training Validation

Measured during the training process on a mixed validation set (all datasets combined).

| Epoch | Validation Loss | WER (%) |
|-------|-----------------|---------|
| 1     | 0.2717          | 7.42    |
| 2     | 0.2638          | 7.33    |

2. Final Standalone Evaluation

Measured after training, on the concatenated test splits of the source datasets for each language.

| Language   | Test Sets                     | WER (%) |
|------------|-------------------------------|---------|
| English    | FLEURS + Common Voice + EdAcc | 9.09    |
| Indonesian | FLEURS + Common Voice         | 6.97    |
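The card reports WER on test sets concatenated per language but does not state how transcripts were normalized. The sketch below assumes simple lowercasing and whitespace tokenization; Whisper evaluations often apply a more thorough text normalizer. Under that assumption, pooled WER over a concatenated test set divides total word edits by total reference words. This weights each dataset by its word count rather than averaging per-dataset scores. The example sentences are made up.

```python
def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance over words (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, h in enumerate(hyp_words, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1]


def corpus_wer(pairs):
    """Pooled WER in percent: total word edits / total reference words."""
    edits = 0
    ref_total = 0
    for ref, hyp in pairs:
        # Assumed normalization: lowercase + whitespace split only
        ref_words = ref.lower().split()
        hyp_words = hyp.lower().split()
        edits += word_edit_distance(ref_words, hyp_words)
        ref_total += len(ref_words)
    if ref_total == 0:
        raise ValueError("no reference words")
    return 100.0 * edits / ref_total


# Made-up (reference, hypothesis) pairs standing in for two test sets
pairs_fleurs = [("saya pergi ke pasar", "saya pergi ke pasar")]
pairs_cv = [("dia makan nasi", "dia minum nasi")]

pooled = corpus_wer(pairs_fleurs + pairs_cv)  # concatenated test set
macro = (corpus_wer(pairs_fleurs) + corpus_wer(pairs_cv)) / 2
print(f"pooled WER: {pooled:.2f}%  mean of per-set WERs: {macro:.2f}%")
```

Here the pooled score is about 14.29%, while averaging the two per-set WERs gives about 16.67%. This is why a WER on a concatenated test set can differ from the mean of per-dataset WERs.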

Training Details

Data Overview

The model was trained on approximately 15,000 samples combining:

Google FLEURS (Indonesian & English)
Common Voice 22.0 (Indonesian & English)
EdAcc (Indonesian-accented English)
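The card does not say how the three corpora were combined. The sketch below assumes each clip is stored as a record with a transcript and a language tag; the `Sample` fields, file names, and transcripts are hypothetical. It concatenates the sources and shuffles them with a fixed seed. It then derives, from each clip's language, the decoder prompt Whisper uses for plain transcription.

```python
import random
from dataclasses import dataclass


@dataclass
class Sample:
    audio_path: str  # hypothetical field: path to the audio clip
    text: str        # reference transcript
    language: str    # Whisper language code, e.g. "id" or "en"
    source: str      # which corpus the clip came from


def whisper_prefix(language: str) -> list[str]:
    """Decoder prompt tokens Whisper uses for transcription without timestamps."""
    return ["<|startoftranscript|>", f"<|{language}|>", "<|transcribe|>", "<|notimestamps|>"]


def mix_sources(sources: dict[str, list[Sample]], seed: int = 42) -> list[Sample]:
    """Concatenate all corpora and shuffle so batches mix languages and sources."""
    mixed = [s for samples in sources.values() for s in samples]
    random.Random(seed).shuffle(mixed)  # fixed seed keeps the order reproducible
    return mixed


# Hypothetical toy records standing in for FLEURS, Common Voice and EdAcc
sources = {
    "fleurs": [Sample("fleurs_id_0.wav", "selamat pagi", "id", "fleurs"),
               Sample("fleurs_en_0.wav", "good morning", "en", "fleurs")],
    "common_voice": [Sample("cv_id_0.mp3", "terima kasih", "id", "common_voice")],
    "edacc": [Sample("edacc_0.wav", "see you tomorrow", "en", "edacc")],
}

train = mix_sources(sources)
for s in train:
    print(s.source, whisper_prefix(s.language))
```

Keeping the language on every sample matters here. A single fixed language setting for the whole mixed set would give roughly half the clips the wrong language token.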

Hyperparameters (Summary)

The model was fine-tuned with LoRA adapters via the PEFT library. Training updated a small set of low-rank weights instead of all model parameters.

Learning Rate: 5e-5
Effective Batch Size: 32
Epochs: 2
Precision: FP16
Optimizer: AdamW
LoRA Rank: 32
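The arithmetic below counts the trainable parameters that LoRA rank 32 adds. The card does not list which modules were adapted, so this sketch assumes LoRA on `q_proj` and `v_proj` of every attention block. The layer sizes (d_model 1280, 32 encoder layers, 4 decoder layers) come from the openai/whisper-large-v3-turbo configuration. The 8 × 4 split of the effective batch size is hypothetical.

```python
# Architecture sizes of openai/whisper-large-v3-turbo (from its config)
D_MODEL = 1280
ENCODER_LAYERS = 32
DECODER_LAYERS = 4
LORA_RANK = 32  # from the card


def lora_params_per_matrix(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds A (rank x d_in) and B (d_out x rank) beside a frozen d_out x d_in weight."""
    return rank * d_in + d_out * rank


# Assumption: LoRA on q_proj and v_proj of every attention block.
# Encoder layers have self-attention; decoder layers have self- and cross-attention.
attention_blocks = ENCODER_LAYERS + 2 * DECODER_LAYERS
adapted_matrices = attention_blocks * 2  # q_proj and v_proj per block
trainable = adapted_matrices * lora_params_per_matrix(D_MODEL, D_MODEL, LORA_RANK)

# Effective batch = per-device batch x gradient accumulation steps x devices
# (hypothetical split; the card only states the product, 32)
per_device_batch, grad_accum, num_devices = 8, 4, 1
effective_batch = per_device_batch * grad_accum * num_devices

print(f"trainable LoRA params: {trainable:,}  effective batch: {effective_batch}")
```

Under these assumptions LoRA trains about 6.55M parameters, well under 1% of the 0.8B parameters listed for the model.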

How to Get Started with the Model

Use the `pipeline` API from the transformers library to transcribe audio.

```python
from transformers import pipeline
import torch

# Model ID on the Hugging Face Hub
model_id = "Dafisns/whisper-turbo-multilingual-fleurs"

# Use FP16 on GPU; fall back to FP32 on CPU, where FP16 inference is slow or unsupported
use_cuda = torch.cuda.is_available()

# Initialize the pipeline
pipe = pipeline(
    "automatic-speech-recognition",
    model=model_id,
    device="cuda" if use_cuda else "cpu",
    torch_dtype=torch.float16 if use_cuda else torch.float32,
)

# Transcribe an audio file.
# Pass the language explicitly ("indonesian" or "english") rather than relying on
# automatic language detection.

# Example for Indonesian audio:
result = pipe("path_to_your_indonesian_audio.mp3", generate_kwargs={"language": "indonesian"})
print(result["text"])

# Example for English audio:
result_en = pipe("path_to_your_english_audio.mp3", generate_kwargs={"language": "english"})
print(result_en["text"])
```


Model Details

Model size: 0.8B params
Format: Safetensors, tensor type F16
Model tree for Dafisns/whisper-turbo-multilingual-fleurs: openai/whisper-large-v3 (base model) → openai/whisper-large-v3-turbo (fine-tuned) → this model (adapter) → 1 downstream fine-tune


Evaluation results (self-reported)

WER (English, combined) on Combined Test Set (FLEURS + CV + EdAcc): 9.090
WER (Indonesian, combined) on Combined Test Set (FLEURS + CV): 6.970
