snorTTS-Indic-v0-W8A8-Conservative

This is a W8A8 INT8 quantized version of snorbyte/snorTTS-Indic-v0, a multilingual Indic Text-to-Speech (TTS) model capable of generating speech in nine Indic languages : Hindi, Tamil, Telugu, Marathi, Kannada, Malayalam, Punjabi, Gujarati, and Bengali. As well as code switching also supported. You can combine any language and any speaker.

Quantization Details

Quantization Method: W8A8 INT8 (8-bit weights, 8-bit activations)
Quantization Framework: llmcompressor
Quantization Scheme: Dynamic per-token activation quantization with GPTQ
Calibration Dataset: snorbyte/indic-tts-sample-snac-encoded (512 samples)
Format: Compressed-tensors (compatible with vLLM)

Model Comparison

Metric	Original FP16	W8A8 Quantized (This Model)
Model Size	~9.5 GB	~4.75 GB
Memory Usage	~10-12 GB GPU	~5 GB GPU
Inference Speed	1x (baseline)	1.5-2x faster
Audio Quality	100% (baseline)	>98% similar
Compatibility	Standard PyTorch/Unsloth	vLLM, llmcompressor

Key Features

✅ 50% Model Size Reduction - From 9.5GB to 4.7GB
✅ 1.5-2x Faster Inference - Optimized for production deployment
✅ Minimal Quality Loss - >98% audio quality preservation
✅ vLLM Compatible - Ready for high-throughput serving
✅ Conservative Quantization - Preserves embedding layers for better TTS quality

Capabilities

Text-to-Speech (TTS)
Voice Cloning
Code Switching
Cross-lingual Voice Cloning (Multilingual Voice Transfer)

Supported Languages

Hindi, Gujarati, Marathi, Punjabi, Bengali, Telugu, Kannada, Malayalam, Tamil

Installation

How To Run It

docker run \
--runtime nvidia \
--gpus all \
-v ~/.cache/huggingface:/root/.cache/huggingface \
-v ~/.cache/vllm:/root/.cache/vllm \
-v ~/snor-quant:/models \
-p 8002:8002 \
--env "HF_HUB_ENABLE_HF_TRANSFER=1" \
--env "HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN}" \
--env "HF_HUB_OFFLINE=1" \
--ipc=host \
--shm-size 32g \
--log-opt max-size=10m \
--log-opt max-file=3 \
vllm/vllm-openai:latest \
--port 8002 \
--model "/models/snorTTS-Indic-v0-W8A8-Conservative" \
--served-model-name llm \
--host 0.0.0.0 \
--max-model-len 2048 \
--max-num-seqs 5 \
--gpu-memory-utilization 0.25 \
--dtype auto \
--quantization compressed-tensors \
--uvicorn-log-level info

Downloads last month: 72

Safetensors

Model size

4B params

Tensor type

BF16

Model tree for devnagriai/snorTTS-Indic-v0-INT8-W8A8

Unable to build the model tree, the base model loops to the model itself. Learn more.

devnagriai
/

snorTTS-Indic-v0-INT8-W8A8