The goal of this project is to create an accent classifier for people who learned English as a second language.
## How to use this model on an audio file

```python
from huggingface_hub import notebook_login

notebook_login()

from transformers import pipeline

pipe = pipeline("audio-classification", model="kaysrubio/accent-id-distilhubert-finetuned-l2-arctic2")

import torch
import torchaudio

audio, sr = torchaudio.load('path_to_file/audio.wav')  # shape: (channels, samples)
audio = audio.mean(dim=0)  # downmix to mono in case the recording is stereo
audio = torchaudio.transforms.Resample(orig_freq=sr, new_freq=16000)(audio)  # model expects 16 kHz
audio = audio.numpy()

result = pipe(audio, top_k=6)

print(result)
print(f"First language of this speaker is predicted to be {result[0]['label']} "
      f"with {result[0]['score'] * 100:.1f}% confidence")
```
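The pipeline returns a list of `{'label', 'score'}` dicts sorted by score, best first. A minimal sketch of post-processing that output, using made-up labels and scores for illustration (the real model returns its own six accent classes):

```python
# Hypothetical pipeline output: a list of {'label', 'score'} dicts, best first
result = [
    {"label": "spanish", "score": 0.62},
    {"label": "arabic", "score": 0.21},
    {"label": "mandarin", "score": 0.17},
]

# Pick the top prediction and format a readable confidence string
best = max(result, key=lambda r: r["score"])
summary = f"{best['label']} ({best['score'] * 100:.1f}% confidence)"
print(summary)  # spanish (62.0% confidence)
```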
## Intended uses & limitations

The model is very accurate on novel recordings from the original dataset that were not used for training or testing. However, it is not accurate for voices from outside the dataset. Unfortunately, with only 24 speakers represented, the model appears to have memorized characteristics of those voices beyond accent, so it does not generalize well to real-world audio.
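One way to check for this kind of speaker memorization is to split train/test by speaker rather than by recording, so that no voice appears in both sets. A toy sketch of a speaker-disjoint split, using hypothetical recording IDs and only the standard library:

```python
import random

# Toy metadata: (recording_id, speaker) pairs -- 4 speakers, 2 clips each
recordings = [(i, f"s{i // 2 + 1}") for i in range(8)]

# Shuffle the *speakers*, then hold out one whole speaker for testing
speakers = sorted({spk for _, spk in recordings})
random.Random(0).shuffle(speakers)
test_speakers = set(speakers[:1])

train = [r for r in recordings if r[1] not in test_speakers]
test = [r for r in recordings if r[1] in test_speakers]

# The held-out speaker's voice never appears in the training data
assert not {s for _, s in train} & {s for _, s in test}
```

If accuracy drops sharply under a speaker-disjoint split, the model is keying on speaker identity rather than accent.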