snorTTS-Indic-v0-W8A8-Conservative
This is a W8A8 INT8 quantized version of snorbyte/snorTTS-Indic-v0, a multilingual Indic text-to-speech (TTS) model that generates speech in nine Indic languages: Hindi, Tamil, Telugu, Marathi, Kannada, Malayalam, Punjabi, Gujarati, and Bengali. Code switching is also supported, and any language can be combined with any speaker.
Quantization Details
- Quantization Method: W8A8 INT8 (8-bit weights, 8-bit activations)
- Quantization Framework: llmcompressor
- Quantization Scheme: GPTQ for weights, with dynamic per-token quantization for activations
- Calibration Dataset: snorbyte/indic-tts-sample-snac-encoded (512 samples)
- Format: Compressed-tensors (compatible with vLLM)
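To illustrate what "dynamic per-token activation quantization" means, here is a minimal pure-Python sketch. It assumes symmetric absmax scaling to the range [-127, 127], which is a common choice for INT8 but is not confirmed by this model card. The function names are hypothetical and are not part of llmcompressor; the real kernels run on the GPU inside vLLM.

```python
from typing import List, Tuple

INT8_MAX = 127  # assumption: symmetric signed range [-127, 127]


def quantize_per_token(
    activations: List[List[float]],
) -> Tuple[List[List[int]], List[float]]:
    """Quantize each row (one token's activation vector) with its own scale.

    "Dynamic" means the scale is computed from the values seen at inference
    time, not fixed in advance from calibration data.
    """
    quantized: List[List[int]] = []
    scales: List[float] = []
    for row in activations:
        absmax = max((abs(x) for x in row), default=0.0)
        # Guard against an all-zero row, where any scale works.
        scale = absmax / INT8_MAX if absmax > 0 else 1.0
        q_row = [max(-INT8_MAX, min(INT8_MAX, round(x / scale))) for x in row]
        quantized.append(q_row)
        scales.append(scale)
    return quantized, scales


def dequantize_per_token(
    quantized: List[List[int]], scales: List[float]
) -> List[List[float]]:
    """Map INT8 codes back to floats using the per-token scales."""
    return [[q * s for q in row] for row, s in zip(quantized, scales)]
```

Because every token gets its own scale, a token with unusually large activations does not reduce the resolution available to the other tokens in the batch. The rounding error per element is at most half of that token's scale.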
Model Comparison
| Metric | Original FP16 | W8A8 Quantized (This Model) |
|---|---|---|
| Model Size | ~9.5 GB | ~4.75 GB |
| Memory Usage | ~10-12 GB GPU | ~5 GB GPU |
| Inference Speed | 1x (baseline) | 1.5-2x faster |
| Audio Quality | 100% (baseline) | >98% similar |
| Compatibility | Standard PyTorch/Unsloth | vLLM, llmcompressor |
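The model-size row follows from simple arithmetic: going from 16-bit to 8-bit weights halves the bytes per weight. The sketch below reproduces that estimate. The `kept_fp16_fraction` parameter is a hypothetical knob for weights left unquantized (this card mentions that embedding layers are preserved); its actual value for this model is not stated here, so the default of 0.0 only reproduces the table's round figure.

```python
def estimated_size_gb(
    fp16_size_gb: float,
    bits_per_weight: int = 8,
    kept_fp16_fraction: float = 0.0,
) -> float:
    """Estimate checkpoint size after weight quantization.

    fp16_size_gb       -- size of the original 16-bit checkpoint
    bits_per_weight    -- bit width of the quantized weights
    kept_fp16_fraction -- hypothetical share of weights left in 16-bit
    """
    if not 0.0 <= kept_fp16_fraction <= 1.0:
        raise ValueError("kept_fp16_fraction must be between 0 and 1")
    quantized_share = (1.0 - kept_fp16_fraction) * bits_per_weight / 16
    return fp16_size_gb * (kept_fp16_fraction + quantized_share)
```

With the table's baseline, `estimated_size_gb(9.5)` gives 4.75. Any nonzero `kept_fp16_fraction` pushes the estimate above that, so the on-disk size may be somewhat larger than exactly half.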
Key Features
✅ 50% Model Size Reduction - From ~9.5 GB to ~4.75 GB
✅ 1.5-2x Faster Inference - Optimized for production deployment
✅ Minimal Quality Loss - >98% audio quality preservation
✅ vLLM Compatible - Ready for high-throughput serving
✅ Conservative Quantization - Preserves embedding layers for better TTS quality
Capabilities
- Text-to-Speech (TTS)
- Voice Cloning
- Code Switching
- Cross-lingual Voice Cloning (Multilingual Voice Transfer)
Supported Languages
Hindi, Gujarati, Marathi, Punjabi, Bengali, Telugu, Kannada, Malayalam, Tamil
Installation
How To Run It
docker run \
--runtime nvidia \
--gpus all \
-v ~/.cache/huggingface:/root/.cache/huggingface \
-v ~/.cache/vllm:/root/.cache/vllm \
-v ~/snor-quant:/models \
-p 8002:8002 \
--env "HF_HUB_ENABLE_HF_TRANSFER=1" \
--env "HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN}" \
--env "HF_HUB_OFFLINE=1" \
--ipc=host \
--shm-size 32g \
--log-opt max-size=10m \
--log-opt max-file=3 \
vllm/vllm-openai:latest \
--port 8002 \
--model "/models/snorTTS-Indic-v0-W8A8-Conservative" \
--served-model-name llm \
--host 0.0.0.0 \
--max-model-len 2048 \
--max-num-seqs 5 \
--gpu-memory-utilization 0.25 \
--dtype auto \
--quantization compressed-tensors \
--uvicorn-log-level info
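Once the container is up, the server exposes vLLM's OpenAI-compatible HTTP API on port 8002 under the served model name `llm`. The following is a minimal client sketch using only the Python standard library. The helper names are hypothetical, and the default `max_tokens` of 1024 is an assumption chosen to fit within `--max-model-len 2048`. This card does not document the prompt template or the audio decoding step, so the sketch stops at the raw generated text; the calibration dataset name suggests the output is SNAC-encoded audio tokens that still need decoding.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8002"  # host port published by `-p 8002:8002`
MODEL_NAME = "llm"                  # value passed to --served-model-name


def build_completion_request(
    prompt: str, max_tokens: int = 1024
) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /v1/completions route."""
    payload = {"model": MODEL_NAME, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, max_tokens: int = 1024, timeout: float = 60.0) -> str:
    """Send the request and return the raw generated text.

    The returned text is not audio; turning it into a waveform is outside
    the scope of this sketch.
    """
    request = build_completion_request(prompt, max_tokens)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

Call `complete(...)` with a prompt formatted the way the base model snorbyte/snorTTS-Indic-v0 expects. Keep the prompt length plus `max_tokens` under 2048 tokens, since the server rejects requests that exceed `--max-model-len`.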