# snorTTS-Indic-v0-AWQ-W4A16

This is a quantized version of snorbyte/snorTTS-Indic-v0, produced with AWQ (Activation-aware Weight Quantization) at W4A16 precision (4-bit weights, 16-bit activations).

## Quantization Details

| Parameter | Value |
|---|---|
| Method | AWQ (Activation-aware Weight Quantization) |
| Weight Precision | 4-bit |
| Activation Precision | 16-bit |
| Format | compressed-tensors |
| Quantization Tool | llmcompressor |
| Model Size Reduction | ~75% |
| Calibration Samples | 512 |
| Calibration Dataset | snorbyte/indic-tts-sample-snac-encoded |
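W4A16 means the weights are stored as 4-bit integers with per-group scales and dequantized to 16-bit at matmul time, while activations stay in 16-bit. A toy NumPy sketch of symmetric per-group 4-bit weight quantization, for intuition only — llmcompressor handles this internally, and AWQ additionally rescales channels using activation statistics from the calibration set:

```python
import numpy as np

def quantize_w4(w: np.ndarray, group_size: int = 8):
    """Symmetric 4-bit quantization per group: codes in [-8, 7] + a scale per group."""
    w = w.reshape(-1, group_size)
    scales = np.abs(w).max(axis=1, keepdims=True) / 7.0  # map max |w| in group to 7
    q = np.clip(np.round(w / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize_w4(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Recover 16-bit weights (the 'A16' side computes in float16)."""
    return (q * scales).astype(np.float16)

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)
q, s = quantize_w4(w)
w_hat = dequantize_w4(q, s).reshape(4, 8)
err = np.abs(w - w_hat).max()  # bounded by half a quantization step per group
```

Storing 4-bit codes plus a scale per group is where the roughly 4x weight-memory reduction comes from.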

## Model Overview

  • Architecture: LLaMA-3.2-3B
  • Base Model: canopylabs/3b-hi-pretrain-research_release
  • Audio Codec: SNAC @ 24 kHz, 3 codebooks
  • Languages: Hindi, Gujarati, Marathi, Punjabi, Bengali, Telugu, Kannada, Malayalam, Tamil
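With SNAC at 24 kHz, the three codebooks run at different temporal rates, and a common convention (used by SNAC-based TTS LLMs) flattens each coarse frame into 7 tokens with a 1:2:4 split across codebooks. A sketch of splitting such a flat token stream back into the three codebook streams — note the exact interleaving layout is an assumption about this common convention, not a documented detail of this model:

```python
def deinterleave_snac(tokens):
    """Split a flat stream of 7-token SNAC frames into 3 codebook streams.

    Assumed frame layout (1:2:4 codes per frame across codebooks):
      [c1, c2, c3, c3, c2, c3, c3]
    """
    assert len(tokens) % 7 == 0, "stream must contain whole 7-token frames"
    cb1, cb2, cb3 = [], [], []
    for i in range(0, len(tokens), 7):
        f = tokens[i:i + 7]
        cb1.append(f[0])            # coarsest codebook: 1 code per frame
        cb2.extend([f[1], f[4]])    # middle codebook:   2 codes per frame
        cb3.extend([f[2], f[3], f[5], f[6]])  # finest:  4 codes per frame
    return cb1, cb2, cb3

# Two frames of dummy token IDs 0..13:
# deinterleave_snac(list(range(14))) -> ([0, 7], [1, 4, 8, 11], [2, 3, 5, 6, 9, 10, 12, 13])
```

The recovered per-codebook streams are what a SNAC decoder would take to reconstruct the waveform.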

## Performance Comparison

| Metric | Original Model | This Model (AWQ) |
|---|---|---|
| Model Size | ~8 GB | ~3.5 GB (~60% reduction) |
| Inference Speed | Baseline | Faster (reduced memory bandwidth from 4-bit weights) |
| Memory Usage | High | Low |
| Audio Quality | Reference | Minimal degradation |

## Usage

### With vLLM (Recommended for Production)

To serve the model with the vLLM OpenAI-compatible Docker image:

```shell
docker run \
  --runtime nvidia \
  --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -v ~/.cache/vllm:/root/.cache/vllm \
  -v ~/snor-quant:/models \
  -p 8002:8002 \
  --env "HF_HUB_ENABLE_HF_TRANSFER=1" \
  --env "HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN}" \
  --env "HF_HUB_OFFLINE=1" \
  --ipc=host \
  --shm-size 32g \
  --log-opt max-size=10m \
  --log-opt max-file=3 \
  vllm/vllm-openai:latest \
  --port 8002 \
  --model "/models/snorTTS-Indic-v0-AWQ-W4A16" \
  --served-model-name llm \
  --host 0.0.0.0 \
  --max-model-len 2048 \
  --max-num-seqs 5 \
  --gpu-memory-utilization 0.20 \
  --dtype auto \
  --quantization compressed-tensors \
  --trust-remote-code \
  --uvicorn-log-level info
```
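Once the container is up, the server exposes vLLM's OpenAI-compatible API on port 8002. A minimal client sketch using only the standard library — note that the model name `llm` matches `--served-model-name` above, the completion output is SNAC token IDs that still need decoding to audio, and the exact prompt format (language/speaker tags) is model-specific and not documented here:

```python
import json
import urllib.request

VLLM_URL = "http://localhost:8002/v1/completions"  # port from the docker command

def build_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Payload for vLLM's OpenAI-compatible /v1/completions endpoint."""
    return {
        "model": "llm",          # matches --served-model-name
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.8,
    }

def generate(prompt: str) -> dict:
    """POST the request; requires the vLLM server above to be running."""
    req = urllib.request.Request(
        VLLM_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

`generate("<your prompt here>")` returns the standard OpenAI-style completion JSON, whose generated tokens must then be passed through the SNAC decoder to produce a waveform.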
