Set a definition for WER
README.md
CHANGED
|
@@ -41,6 +41,8 @@ Please install:
|
|
| 41 |
|
| 42 |
We evaluated the model against different Arabic-STT Wav2Vec models.
|
| 43 |
|
|
|
|
|
|
|
| 44 |
| | Model | [using transliteration](https://pypi.org/project/lang-trans/) | WER | Training Datasets |
|
| 45 |
|---:|:--------------------------------------|:---------------------|---------:|---------:|
|
| 46 |
| 1 | bakrianoo/sinai-voice-ar-stt | True | 0.238001 |Common Voice 6|
|
|
@@ -80,8 +82,8 @@ resamplers = { # all three sampling rates exist in test split
|
|
| 80 |
transformation = jiwer.Compose([
|
| 81 |
# normalize some diacritics, remove punctuation, and replace Persian letters with Arabic ones
|
| 82 |
jiwer.SubstituteRegexes({
|
| 83 |
-
r'[auiFNKo
|
| 84 |
-
r"[
|
| 85 |
# default transformation below
|
| 86 |
jiwer.RemoveMultipleSpaces(),
|
| 87 |
jiwer.Strip(),
|
|
@@ -274,8 +276,8 @@ test_split = test_split.map(predict, batched=True, batch_size=16, remove_columns
|
|
| 274 |
transformation = jiwer.Compose([
|
| 275 |
# normalize some diacritics, remove punctuation, and replace Persian letters with Arabic ones
|
| 276 |
jiwer.SubstituteRegexes({
|
| 277 |
-
r'[auiFNKo
|
| 278 |
-
r"[
|
| 279 |
# default transformation below
|
| 280 |
jiwer.RemoveMultipleSpaces(),
|
| 281 |
jiwer.Strip(),
|
|
@@ -293,6 +295,8 @@ print(f"WER: {metrics['wer']:.2%}")
|
|
| 293 |
```
|
| 294 |
**Test Result**: 23.80%
|
| 295 |
|
|
|
|
|
|
|
| 296 |
|
| 297 |
## Other Arabic Voice recognition Models
|
| 298 |
|
|
|
|
| 41 |
|
| 42 |
We evaluated the model against different Arabic-STT Wav2Vec models.
|
| 43 |
|
| 44 |
+
[**WER**: Word Error Rate] The lower the score, the better the model.
|
| 45 |
+
|
| 46 |
| | Model | [using transliteration](https://pypi.org/project/lang-trans/) | WER | Training Datasets |
|
| 47 |
|---:|:--------------------------------------|:---------------------|---------:|---------:|
|
| 48 |
| 1 | bakrianoo/sinai-voice-ar-stt | True | 0.238001 |Common Voice 6|
|
|
|
|
| 82 |
transformation = jiwer.Compose([
|
| 83 |
# normalize some diacritics, remove punctuation, and replace Persian letters with Arabic ones
|
| 84 |
jiwer.SubstituteRegexes({
|
| 85 |
+
r'[auiFNKo\~_،؟»\?;:\-,\.؛«!"]': "", "\u06D6": "",
|
| 86 |
+
r"[\|\{]": "A", "p": "h", "ک": "k", "ی": "y"}),
|
| 87 |
# default transformation below
|
| 88 |
jiwer.RemoveMultipleSpaces(),
|
| 89 |
jiwer.Strip(),
|
|
|
|
| 276 |
transformation = jiwer.Compose([
|
| 277 |
# normalize some diacritics, remove punctuation, and replace Persian letters with Arabic ones
|
| 278 |
jiwer.SubstituteRegexes({
|
| 279 |
+
r'[auiFNKo\~_،؟»\?;:\-,\.؛«!"]': "", "\u06D6": "",
|
| 280 |
+
r"[\|\{]": "A", "p": "h", "ک": "k", "ی": "y"}),
|
| 281 |
# default transformation below
|
| 282 |
jiwer.RemoveMultipleSpaces(),
|
| 283 |
jiwer.Strip(),
|
|
|
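To make the effect of the `jiwer.SubstituteRegexes` mapping in this hunk easier to see, here is a minimal sketch that applies the same pattern-to-replacement pairs with the standard `re` module instead of jiwer. It assumes each escape in the patterns is meant to be a single backslash and that the input is Buckwalter-style transliterated text (the README links to `lang-trans`); the `normalize` helper and the sample strings are hypothetical and not part of the README.

```python
import re

# Pattern -> replacement pairs, copied from the README's SubstituteRegexes call
# (assumption: every escape is a single backslash).
SUBSTITUTIONS = {
    r'[auiFNKo\~_،؟»\?;:\-,\.؛«!"]': "",  # short-vowel/diacritic marks and punctuation
    "\u06D6": "",                          # a Quranic annotation sign
    r"[\|\{]": "A",                        # alef variants -> plain alef
    "p": "h",                              # teh marbuta -> heh
    "ک": "k",                              # Persian kaf
    "ی": "y",                              # Persian yeh
}


def normalize(text: str) -> str:
    """Hypothetical helper: apply each substitution in order, then tidy spaces."""
    for pattern, replacement in SUBSTITUTIONS.items():
        text = re.sub(pattern, replacement, text)
    # Approximates jiwer.RemoveMultipleSpaces() followed by jiwer.Strip().
    return re.sub(r"\s\s+", " ", text).strip()


if __name__ == "__main__":
    for sample in ["kitaAbN.", "{lmadrasap", "|mn,  bh!"]:
        print(repr(sample), "->", repr(normalize(sample)))
```

Because the same normalization is applied to both references and predictions before scoring, differences in diacritics or punctuation alone should not count as word errors.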
|
| 295 |
```
|
| 296 |
**Test Result**: 23.80%
|
| 297 |
|
| 298 |
+
[**WER**: Word Error Rate] The lower the score, the better the model.
|
| 299 |
+
|
| 300 |
|
| 301 |
## Other Arabic Voice recognition Models
|
| 302 |
|
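As a rough illustration of the WER definition this change adds, the sketch below computes word error rate as the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. It is a plain-Python approximation for intuition only; the README itself relies on `jiwer`, and the `word_error_rate` function and the English sample sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j]: edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(f"{word_error_rate('the cat sat down', 'the dog sat down'):.2%}")
```

A score of 0 means the transcript matches the reference exactly, and the score can exceed 1 when the hypothesis contains many extra words, which is why a lower value indicates a better model.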