snorTTS-Indic-v0-W8A8-Conservative

This is a W8A8 INT8 quantized version of snorbyte/snorTTS-Indic-v0, a multilingual Indic text-to-speech (TTS) model that generates speech in nine Indic languages: Hindi, Tamil, Telugu, Marathi, Kannada, Malayalam, Punjabi, Gujarati, and Bengali. Code switching is also supported: any language can be combined with any speaker.

Quantization Details

  • Quantization Method: W8A8 INT8 (8-bit weights, 8-bit activations)
  • Quantization Framework: llmcompressor
  • Quantization Scheme: Dynamic per-token activation quantization with GPTQ
  • Calibration Dataset: snorbyte/indic-tts-sample-snac-encoded (512 samples)
  • Format: Compressed-tensors (compatible with vLLM)
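For reference, an llmcompressor recipe matching the scheme above (GPTQ INT8 weights, dynamic per-token INT8 activations, embeddings and the LM head left unquantized) might look like the following sketch. This is an illustration in llmcompressor's YAML recipe format, not the recipe actually used; the stage name and ignored layer names are assumptions.

```yaml
quant_stage:
  quant_modifiers:
    GPTQModifier:
      # "Conservative": keep embeddings and the output head in full precision
      ignore: ["lm_head", "model.embed_tokens"]
      config_groups:
        group_0:
          targets: ["Linear"]
          weights:
            num_bits: 8
            type: int
            symmetric: true
            strategy: channel
          input_activations:
            num_bits: 8
            type: int
            dynamic: true      # per-token scales computed at runtime
            strategy: token
```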

Model Comparison

| Metric          | Original FP16            | W8A8 Quantized (This Model) |
|-----------------|--------------------------|-----------------------------|
| Model Size      | ~9.5 GB                  | ~4.75 GB                    |
| Memory Usage    | ~10-12 GB GPU            | ~5 GB GPU                   |
| Inference Speed | 1x (baseline)            | 1.5-2x faster               |
| Audio Quality   | 100% (baseline)          | >98% similar                |
| Compatibility   | Standard PyTorch/Unsloth | vLLM, llmcompressor         |
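The size numbers above follow from simple arithmetic: weights dominate model size, so moving from 16-bit to 8-bit storage roughly halves the footprint. A quick back-of-envelope check (the 4B parameter count is taken from this card; the reported on-disk sizes also include metadata and the layers left unquantized):

```python
# Rough weight-storage estimate: bytes per parameter times parameter count.
params = 4e9                 # parameter count reported on this card
fp16_gb = params * 2 / 1e9   # 2 bytes per weight in FP16/BF16
int8_gb = params * 1 / 1e9   # 1 byte per weight in INT8

print(fp16_gb, int8_gb, int8_gb / fp16_gb)  # 8.0 4.0 0.5
```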

Key Features

✅ 50% Model Size Reduction - from ~9.5 GB to ~4.75 GB
✅ 1.5-2x Faster Inference - optimized for production deployment
✅ Minimal Quality Loss - >98% audio quality preservation
✅ vLLM Compatible - ready for high-throughput serving
✅ Conservative Quantization - embedding layers kept in full precision for better TTS quality

Capabilities

  • Text-to-Speech (TTS)
  • Voice Cloning
  • Code Switching
  • Cross-lingual Voice Cloning (Multilingual Voice Transfer)

Supported Languages

Hindi, Gujarati, Marathi, Punjabi, Bengali, Telugu, Kannada, Malayalam, Tamil

How To Run It

docker run \
--runtime nvidia \
--gpus all \
-v ~/.cache/huggingface:/root/.cache/huggingface \
-v ~/.cache/vllm:/root/.cache/vllm \
-v ~/snor-quant:/models \
-p 8002:8002 \
--env "HF_HUB_ENABLE_HF_TRANSFER=1" \
--env "HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN}" \
--env "HF_HUB_OFFLINE=1" \
--ipc=host \
--shm-size 32g \
--log-opt max-size=10m \
--log-opt max-file=3 \
vllm/vllm-openai:latest \
--port 8002 \
--model "/models/snorTTS-Indic-v0-W8A8-Conservative" \
--served-model-name llm \
--host 0.0.0.0 \
--max-model-len 2048 \
--max-num-seqs 5 \
--gpu-memory-utilization 0.25 \
--dtype auto \
--quantization compressed-tensors \
--uvicorn-log-level info
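Once the container is up, the model is served behind vLLM's OpenAI-compatible API on port 8002 under the name `llm`. A minimal client sketch that builds a request body for the `/v1/completions` endpoint is shown below; note that the actual prompt template (language/speaker tags and SNAC audio-token handling) is not covered here and must follow the upstream snorbyte/snorTTS-Indic-v0 card. The sampling settings are illustrative assumptions.

```python
import json

def build_tts_request(prompt: str, model: str = "llm", max_tokens: int = 1024) -> dict:
    """Build a /v1/completions request body for the vLLM server above."""
    return {
        "model": model,          # matches --served-model-name llm
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.4,      # assumed sampling setting, tune as needed
    }

body = build_tts_request("नमस्ते दुनिया")
print(json.dumps(body, ensure_ascii=False))
# POST this body to http://localhost:8002/v1/completions with any HTTP client.
```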
Safetensors · Model size: 4B params · Tensor types: BF16, I8

Model repository: devnagriai/snorTTS-Indic-v0-INT8-W8A8 (base model: snorbyte/snorTTS-Indic-v0)

Dataset used to calibrate devnagriai/snorTTS-Indic-v0-INT8-W8A8: snorbyte/indic-tts-sample-snac-encoded