Update README.md
README.md
CHANGED

This is a HuBERT Base model pre-trained using 1,778 hours of Catalan speech data.
The model architecture is the same as the [original HuBERT Base model](https://huggingface.co/facebook/hubert-base-ls960), which contains 12 transformer layers.
Pre-training was done by the [Barcelona Supercomputing Center](https://bsc.es/).

# 2-Intended Uses and Limitations

This pre-trained model generates Speech Representations that can be used for any Catalan speech-related task.
This model does not have a tokenizer, as it was pre-trained on audio alone.
In order to use this model for Automatic Speech Recognition, a tokenizer should be created and the model should be fine-tuned on labeled text data.
Check out [this blog](https://huggingface.co/blog/fine-tune-wav2vec2-english) for a more detailed explanation of how to fine-tune the model for Speech Recognition.
For an explanation of how to fine-tune the model for Audio Classification, check out [this tutorial](https://huggingface.co/docs/transformers/main/en/tasks/audio_classification).

# 4-Indirect evaluation results

To assess the quality of the pre-trained Catalan Speech Representations, we evaluated them on two indirect tasks: Catalan Automatic Speech Recognition (ASR) and Catalan Accent Classification.

## 4.1-Catalan Automatic Speech Recognition

TO BE COMPLETED

## 4.2-Catalan Accent Classification

TO BE COMPLETED

# 5-How to use the model

## 5.1-Speech Representations

To obtain Speech Representations (HuBERT outputs) from audio in Catalan using this model, you can follow this example:

```python
# ...
def map_to_speech_representations(batch):
    ...

speech_representations = dataset.map(map_to_speech_representations)
```
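
As a minimal, self-contained variant of that example (a sketch, not the original script), the snippet below extracts frame-level representations for a single waveform with the `transformers` library; the checkpoint identifier is a placeholder for this model's Hub id, and the random waveform stands in for a real Catalan recording.

```python
# Sketch: frame-level HuBERT representations for one waveform.
# "BSC/hubert-base-ca" is a placeholder; substitute this model's actual Hub id.
import torch
from transformers import AutoFeatureExtractor, HubertModel

model_id = "BSC/hubert-base-ca"  # placeholder
feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = HubertModel.from_pretrained(model_id)
model.eval()

waveform = torch.randn(16000)  # 1 s of dummy 16 kHz audio; use real speech here

inputs = feature_extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

speech_representations = outputs.last_hidden_state  # (batch, frames, 768)
print(speech_representations.shape)
```

The same call can be wrapped in a function and applied to a whole corpus with `dataset.map`, as in the example above.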

## 5.2-Discrete Speech Representations

To obtain Discrete Speech Representations (HuBERT's k-means centroids) from audio in Catalan using this model, you can follow this example:

```python
# ...
discrete_units = dataset.map(map_to_discrete_units)
```
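
The k-means quantizer that produces the discrete units is not shown here, so the sketch below is illustrative only: it fits a small scikit-learn `KMeans` on the extracted frames and assigns each frame to its nearest centroid. In practice you would load the k-means model that accompanies this checkpoint rather than fitting a toy one; the checkpoint id, the layer index, and the number of clusters are placeholders.

```python
# Illustrative sketch: discretize HuBERT frames with k-means.
# In practice, load the k-means quantizer released with this model
# instead of fitting a toy one as done below.
import torch
from sklearn.cluster import KMeans
from transformers import AutoFeatureExtractor, HubertModel

model_id = "BSC/hubert-base-ca"  # placeholder
feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = HubertModel.from_pretrained(model_id)
model.eval()

waveform = torch.randn(16000)  # dummy audio; replace with real Catalan speech
inputs = feature_extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    # An intermediate layer is commonly used for unit discovery; the layer
    # actually used for this model is not specified here.
    hidden = model(**inputs, output_hidden_states=True).hidden_states[6]

frames = hidden.squeeze(0).numpy()                      # (num_frames, 768)
kmeans = KMeans(n_clusters=10, n_init=10).fit(frames)   # toy quantizer
discrete_units = kmeans.predict(frames)                 # one cluster id per frame
print(discrete_units[:20])
```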

## 5.3-Automatic Speech Recognition

In order to use this model for Automatic Speech Recognition, a tokenizer should be created and the model should be fine-tuned on labeled text data.
Check out [this blog](https://huggingface.co/blog/fine-tune-wav2vec2-english) for a more detailed explanation of how to fine-tune the model for Speech Recognition.
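
As a hedged sketch of the setup described in that blog (adapted from wav2vec 2.0 to HuBERT, with a toy vocabulary and a placeholder checkpoint id), the snippet below builds a character-level CTC tokenizer and loads this checkpoint with a randomly initialized CTC head, ready to be fine-tuned on labeled Catalan speech.

```python
# Sketch: prepare this checkpoint for CTC fine-tuning (ASR).
# The vocabulary and checkpoint id are placeholders.
import json
from transformers import HubertForCTC, Wav2Vec2CTCTokenizer

# Toy character vocabulary; in practice build it from your labeled Catalan text.
vocab = {"<pad>": 0, "<unk>": 1, "|": 2, "a": 3, "e": 4, "i": 5, "o": 6, "u": 7, "s": 8, "t": 9}
with open("vocab.json", "w") as f:
    json.dump(vocab, f)

tokenizer = Wav2Vec2CTCTokenizer(
    "vocab.json", unk_token="<unk>", pad_token="<pad>", word_delimiter_token="|"
)

model = HubertForCTC.from_pretrained(
    "BSC/hubert-base-ca",  # placeholder id
    vocab_size=len(tokenizer),
    pad_token_id=tokenizer.pad_token_id,
    ctc_loss_reduction="mean",
)
model.freeze_feature_encoder()  # commonly frozen when fine-tuning on limited data
# From here, fine-tune on (audio, transcription) pairs, e.g. with the Trainer API.
```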

## 5.4-Audio Classification

For an explanation of how to fine-tune the model for Audio Classification, check out [this tutorial](https://huggingface.co/docs/transformers/main/en/tasks/audio_classification).
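
In the same spirit, a hedged sketch of the classification setup from that tutorial: load the checkpoint with a sequence-classification head sized to your label set (the label names and checkpoint id below are hypothetical) and fine-tune it on labeled Catalan audio, for example for accent classification.

```python
# Sketch: load this checkpoint with an audio-classification head.
# Label names and checkpoint id are hypothetical placeholders.
from transformers import AutoFeatureExtractor, HubertForSequenceClassification

model_id = "BSC/hubert-base-ca"  # placeholder
labels = ["central", "nord-occidental", "valencia", "balear"]  # hypothetical accent labels

feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = HubertForSequenceClassification.from_pretrained(
    model_id,
    num_labels=len(labels),
    label2id={label: i for i, label in enumerate(labels)},
    id2label={i: label for i, label in enumerate(labels)},
)
# Fine-tune on labeled clips (e.g. with the Trainer API); at inference time:
# logits = model(**feature_extractor(audio, sampling_rate=16000, return_tensors="pt")).logits
```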

# 6-Citation

#TODO fix this