---
library_name: nemo
license: cc-by-4.0
tags:
- pytorch
- NeMo
base_model:
- nvidia/stt_rw_conformer_transducer_large
---

# STT Rw Conformer-Transducer Large

[![Model architecture](https://img.shields.io/badge/Model_Arch-Conformer_Transducer-lightgrey#model-badge)](#model-architecture) | [![Model size](https://img.shields.io/badge/Params-120M-lightgrey#model-badge)](#model-architecture) | [![Language](https://img.shields.io/badge/Language-rw-lightgrey#model-badge)](#datasets)

This model is a fine-tuned version of [nvidia/stt_rw_conformer_transducer_large](https://huggingface.co/nvidia/stt_rw_conformer_transducer_large). It was fine-tuned on the [Mozilla Common Voice 22](https://commonvoice.mozilla.org/en/datasets) and [Digital Umuganda Track A](https://www.kaggle.com/datasets/digitalumuganda/track-a-kinyarwanda-asr-dataset) datasets, which contain about 2000 hours and 500 hours of Kinyarwanda speech, respectively. See the [model architecture](#model-architecture) section and the [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/index.html) for complete architecture details.

## NVIDIA NeMo: Training

To train, fine-tune or play with the model you will need to install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo). We recommend you install it after you have installed the latest PyTorch version.

```
pip install nemo_toolkit['asr']
```

## How to Use this Model

The model is available for use in the NeMo toolkit [1], and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.

### Automatically instantiate the model

```python
from nemo.collections.asr.models import EncDecRNNTBPEModel

asr_model = EncDecRNNTBPEModel.from_pretrained("WakandaAI/stt_rw_conformer_transducer_large")
```

### Transcribing using Python

Then simply do:

```python
output = asr_model.transcribe(['sample.wav'])
print(output[0].text)
```

### Input

This model accepts 16 kHz single-channel (mono) audio in WAV format as input.

### Output

This model provides transcribed speech as a string for a given audio sample.

## Model Architecture

Conformer-Transducer is an autoregressive variant of the Conformer model that uses Transducer (RNN-T) decoding instead of CTC. See the [Conformer-Transducer model documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html) for details.

## Training

Starting from the pretrained [nvidia/stt_rw_conformer_transducer_large](https://huggingface.co/nvidia/stt_rw_conformer_transducer_large) checkpoint, this model was fine-tuned on the MCV 22 and Digital Umuganda Track A datasets and evaluated on their dev and test splits.

### Datasets

- [Mozilla Common Voice 22 (rw)](https://commonvoice.mozilla.org/en/datasets)
- [Digital Umuganda Track A](https://www.kaggle.com/datasets/digitalumuganda/track-a-kinyarwanda-asr-dataset)

Transcript preprocessing converted text to lowercase and removed all punctuation except the apostrophe. Some instances in the MCV dataset used a backtick in place of the apostrophe, and this was normalized during preprocessing, as sketched below.
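For reference, a minimal self-contained version of this normalization (the helper name `normalize_text` is illustrative, not taken from the training code):

```python
import re

def normalize_text(x: str) -> str:
    # Lowercase, trim, and map backtick / right single quote to a plain apostrophe.
    x = x.strip().lower().replace("`", "'").replace("’", "'")
    # Drop every character that is not a word character, whitespace, or apostrophe.
    return re.sub(r"[^\w\s']", "", x)
```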
The one-line form used in preprocessing was:

```python
re.sub(r"[^\w\s']", "", x.strip().lower().replace("`", "'").replace("’", "'"))
```

## Performance

| Dataset | Split | Model | WER (%) | CER (%) |
|---------|-------|-------|---------|---------|
| **MCV 22** | **DEV** | WakandaAI/stt_rw_conformer_transducer_large | **14.24** | **4.31** |
| MCV 22 | DEV | nvidia/stt_rw_conformer_transducer_large | 14.30 | 4.47 |
| **MCV 22** | **TEST** | WakandaAI/stt_rw_conformer_transducer_large | **16.35** | **5.29** |
| MCV 22 | TEST | nvidia/stt_rw_conformer_transducer_large | 16.71 | 5.74 |
| **DU** | **DEV** | WakandaAI/stt_rw_conformer_transducer_large | **25.03** | **4.78** |
| DU | DEV | nvidia/stt_rw_conformer_transducer_large | 29.86 | 6.59 |

- MCV 22: Mozilla Common Voice, version 22
- DU: Digital Umuganda

A sketch for reproducing these metrics with NeMo appears after the references at the end of this card.

## Limitations

Since this model was trained on publicly available speech datasets, its performance might degrade on speech that includes technical terms or vernacular the model has not been trained on. The model might also perform worse on accented speech.

## License

License to use this model is covered by the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/). By downloading the public release of the model, you accept the terms and conditions of the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/) license.

## References

[1] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
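As a rough sketch of how the WER/CER numbers above could be reproduced (the manifest path `dev_manifest.json` is a placeholder, and the manifest is assumed to follow the standard NeMo format with `audio_filepath` and `text` fields):

```python
import json

from nemo.collections.asr.metrics.wer import word_error_rate
from nemo.collections.asr.models import EncDecRNNTBPEModel

# Load the fine-tuned checkpoint from the Hugging Face Hub.
asr_model = EncDecRNNTBPEModel.from_pretrained("WakandaAI/stt_rw_conformer_transducer_large")

# Read audio paths and reference transcripts from a NeMo-style manifest
# (one JSON object per line). The path below is a placeholder.
audio_files, references = [], []
with open("dev_manifest.json") as f:
    for line in f:
        entry = json.loads(line)
        audio_files.append(entry["audio_filepath"])
        references.append(entry["text"])

# Transcribe and score. Depending on the setup, the text normalization
# described in the Datasets section may also need to be applied to the references.
hypotheses = [out.text for out in asr_model.transcribe(audio_files)]

wer = word_error_rate(hypotheses=hypotheses, references=references)
cer = word_error_rate(hypotheses=hypotheses, references=references, use_cer=True)
print(f"WER: {wer:.2%}  CER: {cer:.2%}")
```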