W2V-BERT 2.0 Hybrid V3 Kikuyu ASR

This model is a fine-tuned version of facebook/w2v-bert-2.0 for Kikuyu (Gĩkũyũ) automatic speech recognition.

Model Description

This model uses a Hybrid V3 architecture that combines (a code sketch of the adapter follows this list):

  • MMS-style bottleneck adapters (64-dim) in each of the 24 transformer layers
  • A single-layer transformer decoder with pre- and post-normalization
  • Gated residual connections for stable training
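As an illustration of the adapter and gated-residual bullets above, here is a minimal PyTorch sketch of a 64-dim bottleneck adapter merged back through a learnable gate. Module and parameter names are hypothetical; the actual Hybrid V3 implementation may differ.

```python
import torch
import torch.nn as nn

class GatedBottleneckAdapter(nn.Module):
    """MMS-style bottleneck adapter (down-project -> ReLU -> up-project)
    folded back into the layer output through a learnable gate."""

    def __init__(self, hidden_size: int = 1024, adapter_dim: int = 64):
        super().__init__()
        self.norm = nn.LayerNorm(hidden_size)
        self.down = nn.Linear(hidden_size, adapter_dim)
        self.up = nn.Linear(adapter_dim, hidden_size)
        # Gate initialised at 0, so sigmoid(gate) = 0.5: the adapter starts
        # half-open, which helps keep early training stable.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        adapted = self.up(torch.relu(self.down(self.norm(hidden_states))))
        return hidden_states + torch.sigmoid(self.gate) * adapted
```

One such adapter sits inside each of the 24 transformer layers; only the adapters, the small decoder, and related modules are updated during fine-tuning.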

Architecture Details

  • Base Model: facebook/w2v-bert-2.0 (580M parameters)
  • Trainable Parameters: 11,660,835 (1.97% of total; a quick way to verify this is sketched after this list)
  • Adapter Dimension: 64
  • Decoder Hidden Size: 1024 (matches W2V-BERT)
  • Decoder FFN Size: 2048
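The trainable-parameter figure can be checked with a short count over the loaded model. This is a generic PyTorch snippet; `model` stands for whatever object the Hybrid V3 training code instantiates.

```python
# Count trainable vs. total parameters for any torch.nn.Module.
def count_parameters(model):
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable, total, 100.0 * trainable / total

trainable, total, pct = count_parameters(model)
print(f"{trainable:,} trainable / {total:,} total ({pct:.2f}%)")
```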

Training Details

  • Training Samples: 5,000
  • Epochs: 20
  • Learning Rate: 0.0003
  • Batch Size: 4 (effective: 16 with gradient accumulation)
  • Warmup Steps: 500
  • Optimizer: AdamW with a cosine learning-rate schedule (a configuration sketch follows this list)
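The hyperparameters above can be expressed as a Hugging Face TrainingArguments configuration. This is a hedged reconstruction, not the exact training script; the output directory is illustrative.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="w2v-bert-hybrid-v3-kikuyu-asr",  # illustrative path
    num_train_epochs=20,
    learning_rate=3e-4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # 4 x 4 = effective batch size of 16
    warmup_steps=500,
    optim="adamw_torch",
    lr_scheduler_type="cosine",
)
```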

Performance

  • Word Error Rate (WER): 20.30% (a computation sketch follows this list)
  • Eval Loss: 0.2371
  • Train Loss: 0.3413
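WER can be reproduced from model outputs with a standard metric library. The snippet below uses the `evaluate` package with placeholder strings; it is not taken from the original evaluation code.

```python
import evaluate

wer_metric = evaluate.load("wer")
wer = wer_metric.compute(
    predictions=["predicted transcript"],   # model output (placeholder)
    references=["reference transcript"],    # ground truth (placeholder)
)
print(f"WER: {wer:.2%}")
```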

Usage
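The card does not ship usage code, so the example below is a sketch that assumes the checkpoint loads with the stock Wav2Vec2-BERT CTC head and processor from Transformers. Because Hybrid V3 adds a custom decoder, the repository's own model classes may be required instead.

```python
import torch
import torchaudio
from transformers import AutoProcessor, Wav2Vec2BertForCTC

model_id = "mutisya/w2v-bert-hybrid-v3-kikuyu-asr"
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2BertForCTC.from_pretrained(model_id)
model.eval()

# Load audio and convert to the 16 kHz mono input the model expects.
waveform, sample_rate = torchaudio.load("kikuyu_sample.wav")  # illustrative file
waveform = waveform.mean(dim=0)  # downmix to mono if needed
if sample_rate != 16_000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)

inputs = processor(waveform.numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

pred_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(pred_ids)[0])
```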

Limitations

  • Trained specifically for the Kikuyu language
  • Best performance on clean, clear audio
  • May struggle with heavy background noise or very fast speech

Citation

If you use this model, please cite:

License

Apache 2.0
