---
language: "en"
inference: false
tags:
- Vocoder
- HiFIGAN
- speech-synthesis
- speechbrain
license: "apache-2.0"
datasets:
- LibriTTS
---

<iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
<br/><br/>

# Vocoder with HiFiGAN Unit trained on LibriTTS

This repository provides all the necessary tools for using a [scalable HiFiGAN Unit](https://arxiv.org/abs/2406.10735) vocoder trained with [LibriTTS](https://www.openslr.org/141/).

The pre-trained model takes discrete self-supervised representations as input and produces a waveform as output. This makes it suitable for a wide range of generative tasks such as speech enhancement, separation, text-to-speech, and voice cloning. Please read [DASB - Discrete Audio and Speech Benchmark](https://arxiv.org/abs/2406.14294) for more information.

To generate the discrete self-supervised representations, we employ a K-means clustering model trained using `facebook/hubert-large-ll60k` hidden layers ([1, 3, 7, 12, 18, 23]), with k=1000.

## Install SpeechBrain

First of all, please install transformers and SpeechBrain with the following command:

```
pip install speechbrain transformers
```

Please note that we encourage you to read our tutorials and learn more about [SpeechBrain](https://speechbrain.github.io).

### Using the Vocoder with DiscreteSSL

```python
import torch
from speechbrain.lobes.models.huggingface_transformers.discrete_ssl import DiscreteSSL
from speechbrain.lobes.models.huggingface_transformers.hubert import HuBERT

inputs = torch.rand([3, 2000])
model_hub = "facebook/hubert-large-ll60k"
save_path = "savedir"
ssl_layer_num = [7, 23]
deduplicate = [False, True]
bpe_tokenizers = [None, None]
vocoder_repo_id = "speechbrain/hifigan-hubert-k1000-LibriTTS"
kmeans_dataset = "LibriSpeech"
num_clusters = 1000

# SSL encoder that exposes all hidden layers so specific ones can be selected.
ssl_model = HuBERT(model_hub, save_path, output_all_hiddens=True)
# Combines the SSL encoder, the K-means quantizer, and the unit vocoder.
model = DiscreteSSL(save_path, ssl_model, vocoder_repo_id=vocoder_repo_id, kmeans_dataset=kmeans_dataset, num_clusters=num_clusters)
# Encode audio into discrete tokens for the selected layers.
tokens, _, _ = model.encode(inputs, SSL_layers=ssl_layer_num, deduplicates=deduplicate, bpe_tokenizers=bpe_tokenizers)
# Decode the tokens back into a waveform.
sig = model.decode(tokens, ssl_layer_num)
```
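
The `inputs` above are random tensors used only for illustration. With real audio, note that HuBERT expects 16 kHz mono input; a minimal sketch using `torchaudio` (the file name is a hypothetical placeholder):

```python
import torchaudio

# Load an utterance and resample it to the 16 kHz rate the SSL encoder expects.
wav, sr = torchaudio.load("example.wav")  # hypothetical input file, shape [channels, time]
wav = torchaudio.functional.resample(wav, sr, 16000)  # a mono [1, time] tensor acts as a batch of one
tokens, _, _ = model.encode(wav, SSL_layers=ssl_layer_num, deduplicates=deduplicate, bpe_tokenizers=bpe_tokenizers)
sig = model.decode(tokens, ssl_layer_num)
```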

### Standalone Vocoder Usage

```python
import torch
from speechbrain.inference.vocoders import UnitHIFIGAN

hifi_gan_unit = UnitHIFIGAN.from_hparams(source="speechbrain/hifigan-hubert-k1000-LibriTTS", savedir="pretrained_models/vocoder")
# Dummy input: 100 frames of random unit IDs drawn from the 1000 K-means clusters.
codes = torch.randint(0, 1000, (100, 1))
waveform = hifi_gan_unit.decode_unit(codes)
```
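
To listen to the result, you can write the generated waveform to disk. A minimal sketch with `torchaudio`, assuming a 16 kHz output sampling rate and an output of at most [batch, channels, time] (check the vocoder's hyperparameters to confirm both):

```python
import torchaudio

wav = waveform.detach().cpu()
if wav.dim() == 3:   # assumed [batch, channels, time]: keep the first item
    wav = wav[0]
if wav.dim() == 1:   # torchaudio.save expects [channels, time]
    wav = wav.unsqueeze(0)
torchaudio.save("generated_speech.wav", wav, 16000)  # 16 kHz is an assumption
```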

### Inference on GPU

To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
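
For example, reloading the standalone vocoder from above on a CUDA device:

```python
import torch
from speechbrain.inference.vocoders import UnitHIFIGAN

# Load the vocoder directly on the GPU (assumes a CUDA device is available).
hifi_gan_unit = UnitHIFIGAN.from_hparams(
    source="speechbrain/hifigan-hubert-k1000-LibriTTS",
    savedir="pretrained_models/vocoder",
    run_opts={"device": "cuda"},
)
codes = torch.randint(0, 1000, (100, 1), device="cuda")  # dummy unit IDs on the same device
waveform = hifi_gan_unit.decode_unit(codes)
```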

### Limitations

The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.

#### Referencing SpeechBrain

```
@misc{SB2021,
    author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua},
    title = {SpeechBrain},
    year = {2021},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\\url{https://github.com/speechbrain/speechbrain}},
}
```

#### About SpeechBrain

SpeechBrain is an open-source, all-in-one speech toolkit. It is designed to be simple, extremely flexible, and user-friendly, and it achieves competitive or state-of-the-art performance in various domains.

Website: https://speechbrain.github.io/

GitHub: https://github.com/speechbrain/speechbrain