atharva-again committed on
Commit
98b21f6
·
verified ·
1 Parent(s): 0b1233d

docs: update README.md

  1. README.md +107 -3
README.md CHANGED
@@ -1,3 +1,107 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ language:
+ - as
+ - bn
+ - brx
+ - doi
+ - gu
+ - hi
+ - kn
+ - kok
+ - mai
+ - ml
+ - mr
+ - ne
+ - or
+ - pa
+ - sa
+ - sat
+ - sd
+ - ta
+ - te
+ - ur
+ base_model:
+ - ai4bharat/indic-conformer-600m-multilingual
+ pipeline_tag: automatic-speech-recognition
+ ---
+
+ # Indic Conformer 600M Quantized
+
+ This repository contains a quantized version of the Indic Conformer model, a large-scale automatic speech recognition (ASR) model created for Indic languages by AI4Bharat. The original model can be found [here](https://huggingface.co/ai4bharat/indic-conformer-600m-multilingual).
+
+ ## Model Details
+
+ - **Model Type**: Automatic Speech Recognition (ASR)
+ - **Architecture**: Conformer with both CTC (Connectionist Temporal Classification) and RNN-T (Recurrent Neural Network Transducer) decoders
+ - **Quantization**: int8 quantization for reduced model size and faster inference
+ - **Parameters**: Approximately 600 million
+ - **Languages Supported**: Assamese (as), Bengali (bn), Bodo (brx), Dogri (doi), Gujarati (gu), Hindi (hi), Kannada (kn), Konkani (kok), Maithili (mai), Malayalam (ml), Marathi (mr), Nepali (ne), Odia (or), Punjabi (pa), Sanskrit (sa), Santali (sat), Sindhi (sd), Tamil (ta), Telugu (te), Urdu (ur)
+
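To make the int8 quantization listed above concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is illustrative only: the helper names `quantize_int8` and `dequantize_int8` are hypothetical, and the scheme actually used to produce the files in this repository may differ (for example, per-channel scales or asymmetric zero points).

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only for illustration.
weights = [0.82, -1.27, 0.003, 0.5]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight is stored as one 8-bit integer plus a shared scale, which is where the size reduction comes from; the rounding error per value is bounded by half the scale.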
+ ## Intended Use
+
+ This model is intended for transcribing speech in Indic languages into text. It can be used for applications such as voice assistants, transcription services, and accessibility tools.
+
+ ## Usage
+
+ [![Open in Kaggle](https://img.shields.io/badge/Open%20in-Kaggle-blue?logo=kaggle)](https://www.kaggle.com/code/haposeiz/using-indic-asr-quantized)
+
+ ### Installation
+
+ To use this model, simply install the helper package:
+
+ ```bash
+ pip install indic-asr-onnx
+ ```
+
+ ### Loading the Model
+
+ ```python
+ from indic_asr_onnx import IndicTranscriber
+
+ # Initialize (downloads model automatically)
+ transcriber = IndicTranscriber()
+ ```
+
+ ### Inference
+
+ ```python
+ # Transcribe audio using CTC head
+ text = transcriber.transcribe_ctc("audio.wav", "hi")  # Hindi
+ print(text)
+
+ # Transcribe audio using RNNT head
+ text = transcriber.transcribe_rnnt("audio.wav", "hi")  # Hindi
+ print(text)
+ ```
+
+ ## Model Files
+
+ ### Config Subfolder
+ - `config.json`: Model configuration including architecture details, quantization settings, and RNN-T parameters
+ - `vocab.json`: Subword vocabulary for supported languages
+ - `preprocessor.json`: Preprocessor configuration for audio feature extraction
+ - `language_masks.json`: Language-specific masks for handling multilingual inputs
+
+ ### ONNX Subfolder
+ - `ctc_decoder_quantized_int8.onnx`: Quantized CTC decoder
+ - `encoder_quantized_int8.onnx`: Quantized Conformer encoder for feature extraction from audio
+ - `joint_enc_quantized_int8.onnx`: Quantized joint encoder component for RNN-T decoding
+ - `joint_pre_net_quantized_int8.onnx`: Quantized joint pre-net for preprocessing in RNN-T decoding
+ - `joint_pred_quantized_int8.onnx`: Quantized joint predictor for RNN-T decoding
+ - `rnnt_decoder_quantized_int8.onnx`: Quantized RNN-T decoder
+ - `adapters/*`: Language-specific quantized joint post-net adapters, one per supported language (e.g., `joint_post_net_hi_quantized_int8.onnx` for Hindi)
+
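The adapter naming in the list above suggests one file per language code. As a sketch, the helper below builds the expected adapter path for a language, assuming every adapter follows the same `joint_post_net_<lang>_quantized_int8.onnx` pattern as the Hindi example. `adapter_path` and `SUPPORTED_LANGUAGES` are hypothetical names, and the returned path is relative to the ONNX subfolder; the actual repository layout should be checked before relying on it.

```python
from pathlib import PurePosixPath

# Language codes taken from the model card's front matter.
SUPPORTED_LANGUAGES = (
    "as", "bn", "brx", "doi", "gu", "hi", "kn", "kok", "mai", "ml",
    "mr", "ne", "or", "pa", "sa", "sat", "sd", "ta", "te", "ur",
)


def adapter_path(lang):
    """Return the assumed adapter path (relative to the ONNX subfolder)."""
    if lang not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {lang!r}")
    # Assumption: all adapters share the naming pattern of the Hindi example.
    return PurePosixPath("adapters") / f"joint_post_net_{lang}_quantized_int8.onnx"


print(adapter_path("hi"))
```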
+ ## Training Data
+
+ The model was quantized using a calibration dataset, available [here](https://www.kaggle.com/datasets/haposeiz/indicvoices-calibration-1408).
+
+ The calibration dataset was curated from the [IndicVoices dataset](https://huggingface.co/datasets/ai4bharat/IndicVoices).
+
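In static quantization, a calibration dataset is typically used to observe the range of intermediate activations, from which a scale and zero point are derived. The sketch below shows that idea with a simple min/max rule over an unsigned 8-bit range, chosen purely for illustration. It is a generic example, not the procedure used for this model, and `calibrate_min_max` and `quantize` are hypothetical helpers.

```python
def calibrate_min_max(batches, qmin=0, qmax=255):
    """Derive (scale, zero_point) from the value range seen in calibration batches."""
    lo = min(min(batch) for batch in batches)
    hi = max(max(batch) for batch in batches)
    # Widen the range to include 0.0 so that zero maps exactly to an integer.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    # Fall back to a scale of 1.0 if every observed value was zero.
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = round(qmin - lo / scale)
    return scale, zero_point


def quantize(value, scale, zero_point, qmin=0, qmax=255):
    """Map a float to the integer grid, clamping to the representable range."""
    return max(qmin, min(qmax, round(value / scale) + zero_point))


# Hypothetical activation values standing in for real calibration audio features.
calibration_batches = [[-0.5, 0.1, 1.2], [0.0, 2.0, -1.0]]
scale, zero_point = calibrate_min_max(calibration_batches)
print(scale, zero_point, quantize(0.0, scale, zero_point))
```

A calibration set that covers the languages and recording conditions seen at inference time gives ranges that clip fewer real activations, which is the usual motivation for curating it from representative data.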
+ ## Additional Links
+
+ - GitHub: https://github.com/atharva-again/indic-asr-onnx
+
+ ## Contact
+
+ For questions or issues, open an issue on this repository or on GitHub, or email me at [email protected].