---
library_name: transformers
license: cc-by-4.0
datasets:
- badrex/ethiopian-speech-flat
---

<div align="center" style="line-height: 1;">
<h1>Automatic Speech Recognition for Wolaytta 🇪🇹</h1>
<a href="https://huggingface.co/datasets/badrex/ethiopian-speech-flat" target="_blank" style="margin: 2px;">
<img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Dataset-ffc107?color=ffca28&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://huggingface.co/spaces/badrex/Ethiopia-ASR" target="_blank" style="margin: 2px;">
<img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Space-ffc107?color=c62828&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://creativecommons.org/licenses/by/4.0/deed.en" style="margin: 2px;">
<img alt="License" src="https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg" style="display: inline-block; vertical-align: middle;"/>
</a>
</div>

## 🍇 Model Description

This is an Automatic Speech Recognition (ASR) model for Wolaytta, one of the official languages of Ethiopia.
It is fine‑tuned from Wav2Vec2‑BERT 2.0 using the [Ethio speech corpus](https://huggingface.co/datasets/badrex/ethiopian-speech-flat).

- **Developed by:** Badr al-Absi
- **Model type:** Speech Recognition (ASR)
- **Languages:** Wolaytta
- **License:** CC-BY-4.0
- **Finetuned from:** facebook/w2v-bert-2.0

## 🎧 Direct Use

``` python
from transformers import Wav2Vec2BertProcessor, Wav2Vec2BertForCTC
import torch
import torchaudio

model_id = "badrex/w2v-bert-2.0-wolaytta-asr"
processor = Wav2Vec2BertProcessor.from_pretrained(model_id)
model = Wav2Vec2BertForCTC.from_pretrained(model_id)

# Load the audio, downmix to mono, and resample to the rate the processor expects.
audio, sr = torchaudio.load("audio.wav")  # shape: (channels, samples)
audio = audio.mean(dim=0)
target_sr = processor.feature_extractor.sampling_rate
if sr != target_sr:
    audio = torchaudio.functional.resample(audio, sr, target_sr)

inputs = processor(audio.numpy(), sampling_rate=target_sr, return_tensors="pt")

# Run inference without tracking gradients.
with torch.no_grad():
    logits = model(**inputs).logits

# Greedy decoding: most likely token per frame, then CTC collapse in batch_decode.
pred_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(pred_ids)[0]

print(transcription)
```
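
The `argmax` and `batch_decode` steps above amount to greedy CTC decoding: take the highest-scoring token for each frame, merge consecutive repeats, then drop the blank token. The sketch below illustrates that collapse rule in plain Python. The toy vocabulary, the blank id, and the function name `greedy_ctc_collapse` are assumptions made for illustration and are not the model's actual tokenizer; in practice `processor.batch_decode` performs this step for you.

```python
def greedy_ctc_collapse(frame_ids, id_to_token, blank_id=0):
    """Collapse per-frame token ids into text using the CTC rule."""
    tokens = []
    previous = None
    for idx in frame_ids:
        # Keep a token only if it differs from the previous frame and is not blank.
        if idx != previous and idx != blank_id:
            tokens.append(id_to_token[idx])
        previous = idx
    return "".join(tokens)


# Hypothetical toy vocabulary; the real one ships with the processor.
toy_vocab = {0: "<pad>", 1: "a", 2: "s", 3: "h"}
frames = [0, 1, 1, 0, 2, 2, 2, 3, 0, 1, 0, 1]
print(greedy_ctc_collapse(frames, toy_vocab))
```

Note that a blank between two identical tokens keeps both of them, which is how CTC represents genuinely doubled characters.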

## 🔧 Downstream Use

- Voice assistants
- Accessibility tools
- Research baselines

## 🚫 Out‑of‑Scope Use

- Languages other than Wolaytta
- High‑stakes deployments without human review
- Noisy audio without further tuning

## ⚠️ Risks & Limitations

Performance varies with accents, dialects, and recording quality.
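
One way to gauge how much performance shifts on your own recordings is to compute the word error rate (WER) between reference transcripts and the model's output. The following is a minimal pure-Python sketch based on word-level edit distance. The function name `word_error_rate` and the sample strings are illustrative placeholders, not results from this model.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Placeholder strings, not real model output.
print(word_error_rate("one two three four", "one too three"))
```

Grouping utterances by dialect or recording condition and comparing WER across the groups can show where the model degrades.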

## 📌 Citation

``` bibtex
@misc{w2v_bert_ethiopian_asr,
  author = {Badr M. Abdullah},
  title = {Fine-tuning Wav2Vec2-BERT 2.0 for Ethiopian ASR},
  year = {2025},
  url = {https://huggingface.co/badrex/w2v-bert-2.0-wolaytta-asr}
}
```