xcodec2-25TPS-24k

Improves on https://huggingface.co/HKUSTAudio/xcodec2 by lowering the token rate from 50 TPS (tokens per second) to 25 TPS and raising the decoder output sample rate to 24 kHz.
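Halving the token rate halves the sequence length a downstream model has to handle for the same audio. The sketch below works through the arithmetic, assuming "TPS" means codec tokens per second of audio and a 16 kHz encoder input for both models (as in the How to example). The 16 kHz output rate for the base model comes from the original XCodec2 release. The helper names are illustrative only.

```python
# Token-rate arithmetic, assuming "TPS" means codec tokens per second of audio.
import math

# model: (tokens per second, decoder output sample rate); both take 16 kHz input.
CONFIGS = {
    "HKUSTAudio/xcodec2": (50, 16_000),
    "malaysia-ai/xcodec2-25TPS-24k": (25, 24_000),
}
INPUT_SR = 16_000  # encoder input rate used in the How to example


def samples_per_token(sample_rate: int, tps: int) -> int:
    """Audio samples covered by one token at the given sample rate."""
    if sample_rate % tps:
        raise ValueError("sample rate must be an integer multiple of the token rate")
    return sample_rate // tps


def num_tokens(duration_s: float, tps: int) -> int:
    """Tokens for a clip of duration_s seconds, rounding a partial frame up."""
    return math.ceil(duration_s * tps)


if __name__ == "__main__":
    for name, (tps, out_sr) in CONFIGS.items():
        print(
            f"{name}: {num_tokens(10.0, tps)} tokens per 10 s, "
            f"{samples_per_token(INPUT_SR, tps)} input samples/token, "
            f"{samples_per_token(out_sr, tps)} output samples/token"
        )
```

For this model a 10-second clip becomes 250 tokens instead of 500. Each token covers 640 input samples and 960 output samples, so the decoder emits 1.5 output samples per input sample.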

Training is tracked on WandB at https://wandb.ai/huseinzol05/xcodec2-24k-25tps

Training is still in progress.

Dataset

  1. https://huggingface.co/datasets/malaysia-ai/common_voice_17_0
  2. https://huggingface.co/datasets/mesolitica/Malaysian-STT-Whisper-Stage2
  3. https://huggingface.co/datasets/malaysia-ai/Multilingual-TTS at commit 2421a13e07226d96ac7009d5327d96a84672768c, excluding the cml-tts and libritts_r_filtered subsets
  4. https://huggingface.co/datasets/mesolitica/Malaysian-Emilia-v2, only the sg_podcast and malaysian_podcast subsets
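One way to keep this training mix reproducible is to record the selection rules as data. This is a hypothetical sketch. DatasetSpec and DATA_MIX are illustrative and not taken from the training code. Datasets 1 and 2 keep every subset because the list above does not restrict them. How subset names map to configs or folders inside each repository is not stated here.

```python
# Hypothetical record of the training mix listed above; not the project's training code.
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class DatasetSpec:
    repo: str
    revision: Optional[str] = None             # pinned commit, if any
    include: Optional[frozenset[str]] = None   # None means keep every subset
    exclude: frozenset[str] = frozenset()      # subsets to drop

    def select(self, available: Iterable[str]) -> list[str]:
        """Return the available subset names this spec keeps, in input order."""
        return [
            s for s in available
            if (self.include is None or s in self.include) and s not in self.exclude
        ]


DATA_MIX = [
    DatasetSpec("malaysia-ai/common_voice_17_0"),
    DatasetSpec("mesolitica/Malaysian-STT-Whisper-Stage2"),
    DatasetSpec(
        "malaysia-ai/Multilingual-TTS",
        revision="2421a13e07226d96ac7009d5327d96a84672768c",
        exclude=frozenset({"cml-tts", "libritts_r_filtered"}),
    ),
    DatasetSpec(
        "mesolitica/Malaysian-Emilia-v2",
        include=frozenset({"sg_podcast", "malaysian_podcast"}),
    ),
]
```

The pinned revision could then be passed to whichever loader is used, for example the revision argument of the Hugging Face datasets load_dataset function.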

How to

Load the model:

```python
from modeling_xcodec2 import XCodec2Model

model = XCodec2Model.from_pretrained("malaysia-ai/xcodec2-25TPS-24k")
```

Encode

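```python
import librosa
import torch

# librosa resamples to 16 kHz and downmixes to mono while loading
y, sr = librosa.load('259041.mp3', sr = 16000)
# add a batch dimension: shape (1, num_samples)
wav_tensor = torch.from_numpy(y).float().unsqueeze(0)
codes = model.encode_code(wav_tensor)
```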
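The example above encodes a single clip. For long recordings, one option is to encode in fixed-length windows. The sketch below assumes 640 samples per token (16 kHz input at 25 TPS) and cuts windows whose length is a whole number of token frames, so chunk edges land on token boundaries. chunk_waveform and its 30-second default are hypothetical and are not part of modeling_xcodec2. This sketch does not check whether the encoder produces artifacts at chunk seams.

```python
import numpy as np


def chunk_waveform(
    y: np.ndarray, max_seconds: float = 30.0, sr: int = 16_000, tps: int = 25
) -> list[np.ndarray]:
    """Hypothetical helper: split mono audio into windows of whole token frames."""
    if y.ndim != 1:
        raise ValueError("expected a 1-D mono waveform")
    samples_per_token = sr // tps  # 640 for 16 kHz at 25 TPS (assumed ratio)
    frames_per_chunk = max(1, int(max_seconds * sr) // samples_per_token)
    chunk_len = frames_per_chunk * samples_per_token
    # Every chunk except possibly the last has exactly chunk_len samples.
    return [y[i:i + chunk_len] for i in range(0, len(y), chunk_len)]
```

Each chunk could then go through the same steps as above: torch.from_numpy(chunk).float().unsqueeze(0), then model.encode_code. The per-chunk codes would be concatenated along the time axis. Which axis that is depends on the shape encode_code returns, which is not documented here.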

Decode

```python
import IPython.display as ipd

# first batch item, first channel; the decoded audio is 24 kHz
ipd.Audio(model.decode_code(codes)[0, 0].cpu(), rate = 24000)
```
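Outside a notebook, the decoded waveform can be written to disk instead. The sketch below uses only Python's wave module and NumPy. It assumes the decoder returns float samples roughly in [-1, 1] and clips them before 16-bit conversion. save_wav is a hypothetical helper name.

```python
import wave

import numpy as np


def save_wav(path: str, audio: np.ndarray, sample_rate: int = 24_000) -> None:
    """Write mono float audio (assumed in [-1, 1]) as 16-bit PCM WAV."""
    audio = np.asarray(audio, dtype=np.float32).reshape(-1)  # flatten e.g. (1, N)
    pcm = (np.clip(audio, -1.0, 1.0) * 32767.0).astype("<i2")  # little-endian int16
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes = 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())
```

With this model it would be called as save_wav("decoded.wav", model.decode_code(codes)[0, 0].detach().cpu().numpy()). The detach() call is only a precaution in case the output still tracks gradients.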

Source code

Source code is available at https://github.com/malaysia-ai/X-Codec-2.0-25TPS-24k

Model size: 0.8B parameters (Safetensors, F32 tensors).

Base model: HKUSTAudio/xcodec2 (this model is fine-tuned from it).
Datasets used to train malaysia-ai/xcodec2-25TPS-24k