# Phi-4 GPTQ (4-bit Quantized)

## Model Description
This is a 4-bit GPTQ-quantized version of the Phi-4 transformer model, reducing memory footprint and inference cost with minimal accuracy loss relative to the full-precision model.
- Base Model: Phi-4
- Quantization: GPTQ (4-bit)
- Format: safetensors
- Tokenizer: uses standard `vocab.json` and `merges.txt`
## Intended Use
- Fast inference with minimal VRAM usage
- Deployment in resource-constrained environments
- Optimized for low-latency text generation
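For low-latency use as described above, the checkpoint can be loaded like any other Hugging Face causal LM. The sketch below is not part of the model card; it assumes `transformers`, `torch`, and a GPTQ backend such as `auto-gptq` or `gptqmodel` are installed, and that a CUDA GPU is available for the actual load.

```python
# Sketch: loading the GPTQ checkpoint with Hugging Face transformers.
# The repo id comes from this model card; everything else is a standard
# transformers workflow, not something specific to this quantization.

MODEL_ID = "fhamborg/phi-4-4bit-gptq"

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    # Imports live inside the function so the heavy, optional dependencies
    # are only required when generation is actually invoked.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        device_map="auto",  # place layers on available GPU(s) automatically
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

`device_map="auto"` lets `accelerate` shard the model if a single GPU is too small; for the 4-bit weights a single consumer GPU is typically sufficient.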
## Model Details
| Attribute | Value |
|---|---|
| Model Name | Phi-4 GPTQ |
| Quantization | 4-bit (GPTQ) |
| File Format | .safetensors |
| Tokenizer | phi-4-tokenizer.json |
| VRAM Usage | ~X GB (depending on batch size) |
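The VRAM figure above can at least be bounded from below by the weight footprint alone. Assuming Phi-4's roughly 14 billion parameters (an assumption, not stated in this card), 4-bit weights cost about half a byte each; activations, the KV cache, and GPTQ scale/zero-point metadata come on top.

```python
# Back-of-envelope estimate of the 4-bit weight footprint.
# Assumption: ~14B parameters. Treat the result as a floor, not the
# total VRAM usage, which also depends on batch size and sequence length.
PARAMS = 14e9
BITS_PER_WEIGHT = 4

weight_bytes = PARAMS * BITS_PER_WEIGHT / 8
weight_gib = weight_bytes / 2**30
print(f"quantized weights: {weight_gib:.2f} GiB")  # ~6.5 GiB
```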
## Model Tree

`fhamborg/phi-4-4bit-gptq` is quantized from the base model `microsoft/phi-4`.