Update README.md
README.md CHANGED

@@ -8,10 +8,11 @@ library_name: transformers
 pipeline_tag: automatic-speech-recognition
 tags:
 - spanish
+- español
 - speech
 - recognition
 - whisper
--
+- distil-whisper
 ---
 
 # distil-whisper-large-v3-es

@@ -155,7 +156,7 @@ print(result["text"])
 ```
 ## Training
 
-The model was trained for 40,000 optimisation steps (or 0.98 epochs), on a single …
+The model was trained for 60,000 optimisation steps (or around 1.47 epochs), on a single RTX3090 for ~60 hours, using the following training parameters:
 ```
 --teacher_model_name_or_path "openai/whisper-large-v3"
 --train_dataset_name "mozilla-foundation/common_voice_16_1"

@@ -166,14 +167,14 @@ The model was trained for 40,000 optimisation steps (or 0.98 epochs), on a singl
 --eval_dataset_config_name "es"
 --eval_split_name "validation"
 --eval_text_column_name "sentence"
---eval_steps
---save_steps
+--eval_steps 10000
+--save_steps 10000
 --warmup_steps 500
 --learning_rate 1e-4
 --lr_scheduler_type "linear"
 --logging_steps 25
 --save_total_limit 1
---max_steps
+--max_steps 60000
 --wer_threshold 10
 --per_device_train_batch_size 8
 --per_device_eval_batch_size 8

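These flags follow the distil-whisper distillation recipe: the student is trained against `openai/whisper-large-v3` as a frozen teacher, and in that recipe `--wer_threshold 10` drops training examples whose pseudo-labels disagree with the reference transcript by more than 10% WER. As a rough sketch of the objective such a run optimises (the loss weights, temperature, and tensor shapes below are illustrative assumptions, not values taken from this run):

```python
# Illustrative sketch of a Whisper-style distillation loss: cross-entropy on the
# transcript labels plus a KL term pulling the student's output distribution
# towards the teacher's. Weights and temperature are assumptions for this sketch.
import torch
import torch.nn.functional as F

def distillation_loss(
    student_logits: torch.Tensor,  # (batch, seq_len, vocab)
    teacher_logits: torch.Tensor,  # (batch, seq_len, vocab)
    labels: torch.Tensor,          # (batch, seq_len), -100 marks padding
    temperature: float = 2.0,
    ce_weight: float = 0.8,
    kl_weight: float = 1.0,
) -> torch.Tensor:
    # Standard cross-entropy against the (pseudo-)labels.
    ce = F.cross_entropy(student_logits.transpose(1, 2), labels, ignore_index=-100)
    # KL divergence between temperature-softened teacher and student distributions;
    # the T^2 factor keeps gradient magnitudes comparable across temperatures.
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2
    return ce_weight * ce + kl_weight * kl
```

As a sanity check on the card's own numbers: assuming no gradient accumulation, 60,000 steps at batch size 8 covers about 480,000 examples, and 480,000 / 1.47 epochs ≈ 326,500 training examples. The previous figures imply the same size (40,000 × 8 / 0.98 ≈ 326,500), so the two revisions are arithmetically consistent.
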
@@ -192,7 +193,7 @@ The model was trained for 40,000 optimisation steps (or 0.98 epochs), on a singl
 
 ## Results
 
-The distilled model performs with a 5.
+The distilled model performs with a 5.11% WER (10.15% orthographic WER).
 
 ## License
 
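On the Results figures: for Whisper-family models the headline WER is conventionally computed on normalised text (lower-cased, punctuation stripped), while the orthographic WER scores the raw transcription, which is why the second number is higher. A minimal sketch of that computation, assuming the `evaluate` library and the `BasicTextNormalizer` shipped with transformers (the sentence pair is made up):

```python
# Minimal sketch: normalised vs orthographic WER, as commonly reported for
# Whisper/distil-whisper models. The reference/prediction pair is hypothetical.
import evaluate
from transformers.models.whisper.english_normalizer import BasicTextNormalizer

wer_metric = evaluate.load("wer")
normalizer = BasicTextNormalizer()  # language-agnostic despite the module name

references = ["Hola, ¿cómo estás?"]  # hypothetical ground-truth transcript
predictions = ["hola como estas"]    # hypothetical model output

# Orthographic WER: raw text, so casing and punctuation count as errors.
ortho_wer = 100 * wer_metric.compute(references=references, predictions=predictions)

# Normalised WER: both sides are normalised before scoring.
norm_wer = 100 * wer_metric.compute(
    references=[normalizer(r) for r in references],
    predictions=[normalizer(p) for p in predictions],
)

print(f"WER: {norm_wer:.2f}%  orthographic WER: {ortho_wer:.2f}%")
```

Run against the model's transcriptions of the Common Voice 16.1 Spanish validation split, this is the standard procedure behind figures like the ones reported above.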