# W2V-BERT 2.0 Hybrid V3 Kikuyu ASR
This model is a fine-tuned version of facebook/w2v-bert-2.0 for Kikuyu (Gĩkũyũ) automatic speech recognition.
## Model Description
This model uses a Hybrid V3 architecture that combines:
- MMS-style bottleneck adapters (64-dim), one in each of the 24 transformer layers (see the sketch under Architecture Details below)
- A single-layer transformer decoder with pre- and post-normalization
- Gated residual connections for stable training
## Architecture Details
- Base Model: facebook/w2v-bert-2.0 (580M parameters)
- Trainable Parameters: 11,660,835 (1.97% of total)
- Adapter Dimension: 64
- Decoder Hidden Size: 1024 (matches W2V-BERT)
- Decoder FFN Size: 2048
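
The adapter-with-gated-residual design described above can be pictured with the following PyTorch sketch, using the dimensions listed in this section. The class and parameter names are hypothetical and do not necessarily match the modules in this checkpoint.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Illustrative MMS-style bottleneck adapter with a gated residual.

    Hypothetical sketch; the checkpoint's actual module may differ.
    """

    def __init__(self, hidden_size: int = 1024, adapter_dim: int = 64):
        super().__init__()
        self.norm = nn.LayerNorm(hidden_size)
        self.down = nn.Linear(hidden_size, adapter_dim)  # 1024 -> 64 bottleneck
        self.up = nn.Linear(adapter_dim, hidden_size)    # 64 -> 1024 projection
        self.act = nn.GELU()
        # Gate initialised at zero so the adapter starts as an identity map,
        # which keeps early fine-tuning stable.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        residual = hidden_states
        x = self.up(self.act(self.down(self.norm(hidden_states))))
        return residual + torch.tanh(self.gate) * x
```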
## Training Details
- Training Samples: 5,000
- Epochs: 20
- Learning Rate: 0.0003
- Batch Size: 4 (effective: 16 with gradient accumulation)
- Warmup Steps: 500
- Optimizer: AdamW with cosine LR schedule
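
For reference, a minimal sketch of an equivalent optimizer and schedule setup using PyTorch and transformers' get_cosine_schedule_with_warmup; the placeholder model and step arithmetic are assumptions for illustration only.

```python
import torch
from transformers import get_cosine_schedule_with_warmup

# Placeholder module so the snippet runs on its own; in practice this would
# be the adapter-augmented W2V-BERT model with only the ~11.7M adapter and
# decoder parameters left trainable.
model = torch.nn.Linear(1024, 1024)

optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=3e-4
)

# 5,000 samples, batch size 4, gradient accumulation 4 -> effective batch 16
steps_per_epoch = 5_000 // (4 * 4)
total_steps = 20 * steps_per_epoch  # 20 epochs

scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=500, num_training_steps=total_steps
)
```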
## Performance
| Metric | Value |
|---|---|
| Word Error Rate (WER) | 20.30% |
| Eval Loss | 0.2371 |
| Train Loss | 0.3413 |
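
Word Error Rate is the fraction of word-level substitutions, insertions, and deletions relative to a reference transcript. A minimal sketch of computing it with the jiwer library (the example sentences are made up for illustration):

```python
import jiwer

# Made-up reference/hypothesis pair, for illustration only.
reference = "nĩ mwega mũno"
hypothesis = "nĩ mwega muno"

print(f"WER: {jiwer.wer(reference, hypothesis):.2%}")  # 1 of 3 words wrong
```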
## Usage
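
A minimal sketch of transcribing audio with the transformers pipeline API, assuming the repository exposes the custom Hybrid V3 model via trust_remote_code=True; the exact loading path may differ.

```python
import torch
from transformers import pipeline

# Sketch only: assumes the repository ships the custom Hybrid V3 modeling
# code so that trust_remote_code=True can load it.
asr = pipeline(
    "automatic-speech-recognition",
    model="mutisya/w2v-bert-hybrid-v3-kikuyu-asr",
    trust_remote_code=True,
    device=0 if torch.cuda.is_available() else -1,
)

result = asr("path/to/kikuyu_audio.wav")  # W2V-BERT expects 16 kHz audio
print(result["text"])
```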
## Limitations
- Trained specifically for the Kikuyu language
- Best performance on clean, clear audio
- May struggle with heavy background noise or very fast speech
## Citation
If you use this model, please cite:
## License
Apache 2.0