Update README.md
README.md
CHANGED

This is a HuBERT Base model pre-trained using 1,778 hours of Catalan speech data.
The model architecture is the same as the [original HuBERT Base model](https://huggingface.co/facebook/hubert-base-ls960), which contains 12 transformer layers.
Pre-training was done by the [Barcelona Supercomputing Center](https://bsc.es/).

# 2-Intended Uses and Limitations

This pre-trained model generates Speech Representations that can be used for any Catalan speech-related task.
This model does not have a tokenizer, as it was pre-trained on audio alone.
In order to use this model for Automatic Speech Recognition, a tokenizer should be created and the model should be fine-tuned on labeled text data.
Check out [this blog](https://huggingface.co/blog/fine-tune-wav2vec2-english) for a more detailed explanation of how to fine-tune the model for Speech Recognition.
For an explanation of how to fine-tune the model for Audio Classification, check out [this tutorial](https://huggingface.co/docs/transformers/main/en/tasks/audio_classification).

# 4-Indirect evaluation results

To assess the quality of the pre-trained Catalan Speech Representations, we evaluated them on two indirect tasks: Catalan Automatic Speech Recognition (ASR) and Catalan Accent Classification.

## 4.1-Catalan Automatic Speech Recognition

TO BE COMPLETED

## 4.2-Catalan Accent Classification

TO BE COMPLETED

# 5-How to use the model

## 5.1-Speech Representations

To obtain Speech Representations (HuBERT outputs) from audio in Catalan using this model, you can follow this example:

```python
# ...
def map_to_speech_representations(batch):
    ...

speech_representations = dataset.map(map_to_speech_representations)
```
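
As a minimal, self-contained variant of that example (a sketch, not the original script), the snippet below extracts frame-level representations for a single waveform with the `transformers` library; the checkpoint identifier is a placeholder for this model's Hub id, and the random waveform stands in for a real Catalan recording.

```python
# Sketch: frame-level HuBERT representations for one waveform.
# "BSC/hubert-base-ca" is a placeholder; substitute this model's actual Hub id.
import torch
from transformers import AutoFeatureExtractor, HubertModel

model_id = "BSC/hubert-base-ca"  # placeholder
feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = HubertModel.from_pretrained(model_id)
model.eval()

waveform = torch.randn(16000)  # 1 s of dummy 16 kHz audio; use real speech here

inputs = feature_extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

speech_representations = outputs.last_hidden_state  # (batch, frames, 768)
print(speech_representations.shape)
```

The same call can be wrapped in a function and applied to a whole corpus with `dataset.map`, as in the example above.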

## 5.2-Discrete Speech Representations

To obtain Discrete Speech Representations (HuBERT's k-means centroids) from audio in Catalan using this model, you can follow this example:

```python
# ...
discrete_units = dataset.map(map_to_discrete_units)
```
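
The k-means quantizer that produces the discrete units is not shown here, so the sketch below is illustrative only: it fits a small scikit-learn `KMeans` on the extracted frames and assigns each frame to its nearest centroid. In practice you would load the k-means model that accompanies this checkpoint rather than fitting a toy one; the checkpoint id, the layer index, and the number of clusters are placeholders.

```python
# Illustrative sketch: discretize HuBERT frames with k-means.
# In practice, load the k-means quantizer released with this model
# instead of fitting a toy one as done below.
import torch
from sklearn.cluster import KMeans
from transformers import AutoFeatureExtractor, HubertModel

model_id = "BSC/hubert-base-ca"  # placeholder
feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = HubertModel.from_pretrained(model_id)
model.eval()

waveform = torch.randn(16000)  # dummy audio; replace with real Catalan speech
inputs = feature_extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    # An intermediate layer is commonly used for unit discovery; the layer
    # actually used for this model is not specified here.
    hidden = model(**inputs, output_hidden_states=True).hidden_states[6]

frames = hidden.squeeze(0).numpy()                      # (num_frames, 768)
kmeans = KMeans(n_clusters=10, n_init=10).fit(frames)   # toy quantizer
discrete_units = kmeans.predict(frames)                 # one cluster id per frame
print(discrete_units[:20])
```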

## 5.3-Automatic Speech Recognition

In order to use this model for Automatic Speech Recognition, a tokenizer should be created and the model should be fine-tuned on labeled text data.
Check out [this blog](https://huggingface.co/blog/fine-tune-wav2vec2-english) for a more detailed explanation of how to fine-tune the model for Speech Recognition.
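
As a hedged sketch of the setup described in that blog (adapted from wav2vec 2.0 to HuBERT, with a toy vocabulary and a placeholder checkpoint id), the snippet below builds a character-level CTC tokenizer and loads this checkpoint with a randomly initialized CTC head, ready to be fine-tuned on labeled Catalan speech.

```python
# Sketch: prepare this checkpoint for CTC fine-tuning (ASR).
# The vocabulary and checkpoint id are placeholders.
import json
from transformers import HubertForCTC, Wav2Vec2CTCTokenizer

# Toy character vocabulary; in practice build it from your labeled Catalan text.
vocab = {"<pad>": 0, "<unk>": 1, "|": 2, "a": 3, "e": 4, "i": 5, "o": 6, "u": 7, "s": 8, "t": 9}
with open("vocab.json", "w") as f:
    json.dump(vocab, f)

tokenizer = Wav2Vec2CTCTokenizer(
    "vocab.json", unk_token="<unk>", pad_token="<pad>", word_delimiter_token="|"
)

model = HubertForCTC.from_pretrained(
    "BSC/hubert-base-ca",  # placeholder id
    vocab_size=len(tokenizer),
    pad_token_id=tokenizer.pad_token_id,
    ctc_loss_reduction="mean",
)
model.freeze_feature_encoder()  # commonly frozen when fine-tuning on limited data
# From here, fine-tune on (audio, transcription) pairs, e.g. with the Trainer API.
```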

## 5.4-Audio Classification

For an explanation of how to fine-tune the model for Audio Classification, check out [this tutorial](https://huggingface.co/docs/transformers/main/en/tasks/audio_classification).
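
In the same spirit, a hedged sketch of the classification setup from that tutorial: load the checkpoint with a sequence-classification head sized to your label set (the label names and checkpoint id below are hypothetical) and fine-tune it on labeled Catalan audio, for example for accent classification.

```python
# Sketch: load this checkpoint with an audio-classification head.
# Label names and checkpoint id are hypothetical placeholders.
from transformers import AutoFeatureExtractor, HubertForSequenceClassification

model_id = "BSC/hubert-base-ca"  # placeholder
labels = ["central", "nord-occidental", "valencia", "balear"]  # hypothetical accent labels

feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = HubertForSequenceClassification.from_pretrained(
    model_id,
    num_labels=len(labels),
    label2id={label: i for i, label in enumerate(labels)},
    id2label={i: label for i, label in enumerate(labels)},
)
# Fine-tune on labeled clips (e.g. with the Trainer API); at inference time:
# logits = model(**feature_extractor(audio, sampling_rate=16000, return_tensors="pt")).logits
```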

# 6-Citation

#TODO fix this