docs: update README.md
README.md
CHANGED
|
@@ -1,3 +1,107 @@
|
|
| 1 |
-
---
|
| 2 |
-
license: mit
|
| 3 |
-
| 1 |
+
---
|
| 2 |
+
license: mit
|
| 3 |
+
language:
|
| 4 |
+
- as
|
| 5 |
+
- bn
|
| 6 |
+
- brx
|
| 7 |
+
- doi
|
| 8 |
+
- gu
|
| 9 |
+
- hi
|
| 10 |
+
- kn
|
| 11 |
+
- kok
|
| 12 |
+
- mai
|
| 13 |
+
- ml
|
| 14 |
+
- mr
|
| 15 |
+
- ne
|
| 16 |
+
- or
|
| 17 |
+
- pa
|
| 18 |
+
- sa
|
| 19 |
+
- sat
|
| 20 |
+
- sd
|
| 21 |
+
- ta
|
| 22 |
+
- te
|
| 23 |
+
- ur
|
| 24 |
+
base_model:
|
| 25 |
+
- ai4bharat/indic-conformer-600m-multilingual
|
| 26 |
+
pipeline_tag: automatic-speech-recognition
|
| 27 |
+
---
|
| 28 |
+
|
| 29 |
+
# Indic Conformer 600M Quantized
|
| 30 |
+
|
| 31 |
+
This repository contains a quantized version of the Indic Conformer model, a large-scale automatic speech recognition (ASR) model created for Indic languages by AI4Bharat. The original model is available [here](https://huggingface.co/ai4bharat/indic-conformer-600m-multilingual).
|
| 32 |
+
|
| 33 |
+
## Model Details
|
| 34 |
+
|
| 35 |
+
- **Model Type**: Automatic Speech Recognition (ASR)
|
| 36 |
+
- **Architecture**: Conformer encoder with both CTC (Connectionist Temporal Classification) and RNN-T (Recurrent Neural Network Transducer) decoders
|
| 37 |
+
- **Quantization**: int8 quantization for reduced model size and faster inference
|
| 38 |
+
- **Parameters**: Approximately 600 million parameters
|
| 39 |
+
- **Languages Supported**: Assamese (as), Bengali (bn), Bodo (brx), Dogri (doi), Gujarati (gu), Hindi (hi), Kannada (kn), Konkani (kok), Maithili (mai), Malayalam (ml), Marathi (mr), Nepali (ne), Odia (or), Punjabi (pa), Sanskrit (sa), Santali (sat), Sindhi (sd), Tamil (ta), Telugu (te), Urdu (ur)
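
As a rough illustration of why int8 quantization reduces model size, the sketch below estimates weight storage from the parameter count alone. It assumes the unquantized weights are stored as 32-bit floats (not stated in this card) and ignores file-format overhead and any layers left unquantized; `weights_size_gb` is a hypothetical helper, not part of any package.

```python
# Back-of-the-envelope weight-storage estimate (a sketch, not a measurement).
PARAMS = 600_000_000  # "approximately 600 million parameters" from this card


def weights_size_gb(num_params: int, bytes_per_param: int) -> float:
    """Return the storage needed for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


fp32_gb = weights_size_gb(PARAMS, 4)  # assumption: float32 baseline, 4 bytes per weight
int8_gb = weights_size_gb(PARAMS, 1)  # int8: 1 byte per weight
print(f"float32: ~{fp32_gb:.1f} GB, int8: ~{int8_gb:.1f} GB")
```

Under these assumptions the int8 weights take about a quarter of the float32 storage; the actual file sizes in this repository may differ.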
|
| 40 |
+
|
| 41 |
+
## Intended Use
|
| 42 |
+
|
| 43 |
+
This model is intended for transcribing speech in Indic languages into text. It can be used for applications such as voice assistants, transcription services, and accessibility tools.
|
| 44 |
+
|
| 45 |
+
## Usage
|
| 46 |
+
|
| 47 |
+
[Example notebook on Kaggle](https://www.kaggle.com/code/haposeiz/using-indic-asr-quantized)
|
| 48 |
+
|
| 49 |
+
### Installation
|
| 50 |
+
|
| 51 |
+
To use this model, install the helper package:
|
| 52 |
+
|
| 53 |
+
```bash
|
| 54 |
+
pip install indic-asr-onnx
|
| 55 |
+
```
|
| 56 |
+
|
| 57 |
+
### Loading the Model
|
| 58 |
+
|
| 59 |
+
```python
|
| 60 |
+
from indic_asr_onnx import IndicTranscriber
|
| 61 |
+
|
| 62 |
+
# Initialize (downloads model automatically)
|
| 63 |
+
transcriber = IndicTranscriber()
|
| 64 |
+
```
|
| 65 |
+
|
| 66 |
+
### Inference
|
| 67 |
+
|
| 68 |
+
```python
|
| 69 |
+
# Transcribe audio using CTC head
|
| 70 |
+
text = transcriber.transcribe_ctc("audio.wav", "hi") # Hindi
|
| 71 |
+
print(text)
|
| 72 |
+
|
| 73 |
+
# Transcribe audio using RNNT head
|
| 74 |
+
text = transcriber.transcribe_rnnt("audio.wav", "hi") # Hindi
|
| 75 |
+
print(text)
|
| 76 |
+
```
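
For more than one file, the two calls above can be wrapped in a small helper. This is a minimal sketch: `transcribe_files` is a hypothetical function, not part of `indic-asr-onnx`, and it assumes only that the transcriber object exposes `transcribe_ctc(path, lang)` and `transcribe_rnnt(path, lang)` as shown above.

```python
from typing import Dict, Iterable


def transcribe_files(transcriber, paths: Iterable[str], lang: str, head: str = "ctc") -> Dict[str, str]:
    """Transcribe each path with the chosen head and return {path: text}.

    `transcriber` is any object with transcribe_ctc / transcribe_rnnt methods
    that take (audio_path, language_code), as in the examples above.
    """
    if head not in ("ctc", "rnnt"):
        raise ValueError(f"head must be 'ctc' or 'rnnt', got {head!r}")
    # Pick the decoding method once, then reuse it for every file.
    method = transcriber.transcribe_ctc if head == "ctc" else transcriber.transcribe_rnnt
    return {path: method(path, lang) for path in paths}


# Hypothetical usage (file names are placeholders):
# results = transcribe_files(transcriber, ["a.wav", "b.wav"], "hi", head="rnnt")
```

Passing the transcriber in as an argument means the model is loaded once and reused across files.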
|
| 77 |
+
|
| 78 |
+
## Model Files
|
| 79 |
+
|
| 80 |
+
### Config Subfolder
|
| 81 |
+
- `config.json`: Model configuration including architecture details, quantization settings, and RNN-T parameters
|
| 82 |
+
- `vocab.json`: Subword vocabulary for supported languages
|
| 83 |
+
- `preprocessor.json`: Preprocessor configuration for audio feature extraction
|
| 84 |
+
- `language_masks.json`: Language-specific masks for handling multilingual inputs
|
| 85 |
+
|
| 86 |
+
### ONNX Subfolder
|
| 87 |
+
- `ctc_decoder_quantized_int8.onnx`: Quantized decoder for the CTC (Connectionist Temporal Classification) head
|
| 88 |
+
- `encoder_quantized_int8.onnx`: Quantized Conformer encoder for feature extraction from audio
|
| 89 |
+
- `joint_enc_quantized_int8.onnx`: Quantized joint encoder component for RNN-T decoding
|
| 90 |
+
- `joint_pre_net_quantized_int8.onnx`: Quantized joint pre-net for preprocessing in RNN-T
|
| 91 |
+
- `joint_pred_quantized_int8.onnx`: Quantized joint predictor for RNN-T decoding
|
| 92 |
+
- `rnnt_decoder_quantized_int8.onnx`: Quantized decoder for the RNN-T (Recurrent Neural Network Transducer) head
|
| 93 |
+
- `adapters/*`: Language-specific quantized joint post-net adapters, one per supported language (e.g., `joint_post_net_hi_quantized_int8.onnx` for Hindi)
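
The adapter naming can be sketched as a small lookup. This assumes every language follows the same pattern as the Hindi example above, which this card does not confirm for the other languages; `adapter_filename` is a hypothetical helper and not part of `indic-asr-onnx`.

```python
# Language codes listed under "Languages Supported" in this card.
SUPPORTED_LANGUAGES = (
    "as", "bn", "brx", "doi", "gu", "hi", "kn", "kok", "mai", "ml",
    "mr", "ne", "or", "pa", "sa", "sat", "sd", "ta", "te", "ur",
)


def adapter_filename(lang: str) -> str:
    """Return the expected adapter file name for a language code.

    Assumption: all adapters are named like the Hindi example,
    joint_post_net_<lang>_quantized_int8.onnx.
    """
    if lang not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {lang!r}")
    return f"joint_post_net_{lang}_quantized_int8.onnx"


print(adapter_filename("hi"))
```

Checking the code against the supported list first gives a clear error for a mistyped language code instead of a missing-file error later.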
|
| 94 |
+
|
| 95 |
+
## Calibration Data
|
| 96 |
+
|
| 97 |
+
The model was quantized using a Calibration Dataset that can be found [here](https://www.kaggle.com/datasets/haposeiz/indicvoices-calibration-1408).
|
| 98 |
+
|
| 99 |
+
The Calibration Dataset was curated from the [Indic Voices Dataset](https://huggingface.co/datasets/ai4bharat/IndicVoices).
|
| 100 |
+
|
| 101 |
+
## Additional Links
|
| 102 |
+
|
| 103 |
+
- GitHub: https://github.com/atharva-again/indic-asr-onnx
|
| 104 |
+
|
| 105 |
+
## Contact
|
| 106 |
+
|
| 107 |
+
For questions or issues, open an issue on this repository or on GitHub, or email me at [email protected].
|