The goal of this project is to create an accent classifier for people who learned English as a second language.
## How to use this model on an audio file

```python
from huggingface_hub import notebook_login

notebook_login()

from transformers import pipeline

pipe = pipeline("audio-classification", model="kaysrubio/accent-id-distilhubert-finetuned-l2-arctic2")

import torch
import torchaudio

audio, sr = torchaudio.load('path_to_file/audio.wav')  # shape: (channels, samples)
audio = audio.mean(dim=0)  # downmix to mono in case the recording is stereo
audio = torchaudio.transforms.Resample(orig_freq=sr, new_freq=16000)(audio)  # model expects 16 kHz
audio = audio.numpy()

result = pipe(audio, top_k=6)

print(result)
print(f"First language of this speaker is predicted to be {result[0]['label']} "
      f"with {result[0]['score'] * 100:.1f}% confidence")
```
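The pipeline returns a list of `{'label', 'score'}` dicts sorted by score, best first. A minimal sketch of post-processing that output, using made-up labels and scores for illustration (the real model returns its own six accent classes):

```python
# Hypothetical pipeline output: a list of {'label', 'score'} dicts, best first
result = [
    {"label": "spanish", "score": 0.62},
    {"label": "arabic", "score": 0.21},
    {"label": "mandarin", "score": 0.17},
]

# Pick the top prediction and format a readable confidence string
best = max(result, key=lambda r: r["score"])
summary = f"{best['label']} ({best['score'] * 100:.1f}% confidence)"
print(summary)  # spanish (62.0% confidence)
```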
## Intended uses & limitations

The model is very accurate on novel recordings from the original dataset that were not used for training or testing. However, it is not accurate for voices from outside the dataset. Unfortunately, with only 24 speakers represented, the model appears to have memorized characteristics of those voices beyond accent, so it does not generalize well to real-world audio.
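One way to check for this kind of speaker memorization is to split train/test by speaker rather than by recording, so that no voice appears in both sets. A toy sketch of a speaker-disjoint split, using hypothetical recording IDs and only the standard library:

```python
import random

# Toy metadata: (recording_id, speaker) pairs -- 4 speakers, 2 clips each
recordings = [(i, f"s{i // 2 + 1}") for i in range(8)]

# Shuffle the *speakers*, then hold out one whole speaker for testing
speakers = sorted({spk for _, spk in recordings})
random.Random(0).shuffle(speakers)
test_speakers = set(speakers[:1])

train = [r for r in recordings if r[1] not in test_speakers]
test = [r for r in recordings if r[1] in test_speakers]

# The held-out speaker's voice never appears in the training data
assert not {s for _, s in train} & {s for _, s in test}
```

If accuracy drops sharply under a speaker-disjoint split, the model is keying on speaker identity rather than accent.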