---
language:
- en
license: apache-2.0
tags:
- quantization
- sinq
- int4
- efficient-inference
- text-generation
- qwen
- llm
- compression
base_model: Qwen/Qwen3-1.7B
base_model_relation: quantized
---
<p align="center">
<img src="logo.png" alt="Logo" style="max-width: 80%; height: auto;">
</p>
<p align="center">π <a href="https://github.com/huawei-csl/SINQ">Github</a> | π <a href="http://arxiv.org/abs/2509.22944">Paper</a></p>
# SINQ 4-bit Quantized Qwen3-1.7B Model
This repository contains the official **4-bit quantized** version of the [`Qwen3-1.7B`](https://huggingface.co/Qwen/Qwen3-1.7B) model using the **SINQ (Sinkhorn-Normalized Quantization)** method.
SINQ is a novel, fast, high-quality quantization method designed to make Large Language Models smaller while keeping their accuracy almost intact.
To support the project, please star the official [SINQ](https://github.com/huawei-csl/SINQ) GitHub repository.
## Model Details
- **Model Name:** `Qwen3-1.7B-4bit-SINQ`
- **Base Model:** [`Qwen/Qwen3-1.7B`](https://huggingface.co/Qwen/Qwen3-1.7B)
- **Task:** Text Generation
- **Framework:** PyTorch / Transformers
- **License:** [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0)
- **Quantized By:** *Huawei - Computing Systems Lab*
## Quantization Details
- **Quantization Method:** SINQ (Sinkhorn-Normalized Quantization)
- **Precision:** INT4
- **Group Size:** 64
- **Framework:** PyTorch
- **Quantization Library:** `sinq`
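To put these numbers in perspective, the sketch below gives a back-of-the-envelope weight-memory estimate for group-wise INT4 quantization with group size 64. It is illustrative only and not part of the SINQ algorithm: the per-group metadata layout (one fp16 scale and one fp16 zero point per group) is an assumption, and the exact metadata SINQ stores differs.

```python
# Illustrative only: rough weight-memory estimate for group-wise INT4
# quantization with group size 64. The per-group overhead (one fp16 scale and
# one fp16 zero point, ~4 bytes) is an assumption; SINQ's actual metadata differs.
def int4_weight_bytes(n_params: float, group_size: int = 64) -> float:
    packed = n_params / 2                     # two 4-bit weights per byte
    overhead = (n_params / group_size) * 4    # ~4 bytes of assumed metadata per group
    return packed + overhead

n_params = 1.7e9  # Qwen3-1.7B parameter count (order of magnitude)
fp16_gb = n_params * 2 / 1e9
int4_gb = int4_weight_bytes(n_params) / 1e9
print(f"fp16 weights: ~{fp16_gb:.1f} GB  ->  INT4 (group size 64): ~{int4_gb:.1f} GB")
```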
---
# Usage
## Prerequisite
Before running the examples below, make sure the **SINQ** library is installed.
Installation instructions and setup details are available in the official [SINQ GitHub repository](https://github.com/huawei-csl/SINQ).
## Usage example
You can load and use the model with our wrapper based on the Hugging Face Transformers library:
```python
import torch
from transformers import AutoTokenizer
from sinq.patch_model import AutoSINQHFModel

model_name = "huawei-csl/Qwen3-1.7B-4bit-SINQ"

# Load the tokenizer and the 4-bit SINQ-quantized checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
sinq_model = AutoSINQHFModel.from_quantized_safetensors(
    model_name,
    device="cuda:0",
    compute_dtype=torch.bfloat16
)

# Run a short greedy generation
prompt = "Explain neural network quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
with torch.inference_mode():
    out_ids = sinq_model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(out_ids[0], skip_special_tokens=True))
```
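Qwen3 checkpoints ship with a chat template, so for conversational prompts it is usually better to format the input through the tokenizer rather than passing raw text. Below is a minimal sketch reusing the `sinq_model` and `tokenizer` objects from above; it relies only on the standard Transformers `apply_chat_template` API, and the generation settings are illustrative.

```python
# Chat-style prompting via the tokenizer's chat template (standard
# Transformers API); reuses `sinq_model` and `tokenizer` from above.
messages = [
    {"role": "user", "content": "Give two practical benefits of 4-bit quantization."}
]
chat_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(chat_prompt, return_tensors="pt").to("cuda:0")
with torch.inference_mode():
    out_ids = sinq_model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens
new_tokens = out_ids[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```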
<details>
<summary><span style="font-size:1.1em; font-weight:bold;">Quantization Process</span></summary>
The quantized model was obtained using the **SINQ** quantization library, following the steps below:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sinq.patch_model import AutoSINQHFModel
from sinq.sinqlinear import BaseQuantizeConfig
# Load base model
base_model_name = "Qwen/Qwen3-1.7B"
model = AutoModelForCausalLM.from_pretrained(base_model_name, torch_dtype="float16")
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
# Apply 4-bit SINQ quantization
quant_cfg = BaseQuantizeConfig(
    nbits=4,           # quantization bit-width
    group_size=64,     # group size
    tiling_mode="1D",  # tiling strategy
    method="sinq"      # quantization method ("asinq" for the calibrated version)
)

qmodel = AutoSINQHFModel.quantize_model(
    model,
    tokenizer=tokenizer,
    quant_config=quant_cfg,
    compute_dtype=torch.bfloat16,
    device="cuda:0"
)
```
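After quantization, a quick spot-check helps confirm the model still behaves sensibly. The snippet below is an illustrative single-sentence perplexity probe; it assumes the patched model keeps the standard Transformers causal-LM interface (i.e. accepts `labels` and returns a `loss`). Proper accuracy numbers should come from full benchmarks as reported in the paper.

```python
# Illustrative spot-check: perplexity of the quantized model on one sentence.
# Assumes `qmodel` and `tokenizer` from the snippet above and that the patched
# model still accepts `labels` like a standard Hugging Face causal LM.
text = "Quantization reduces the memory footprint of large language models."
enc = tokenizer(text, return_tensors="pt").to("cuda:0")

with torch.inference_mode():
    loss = qmodel(**enc, labels=enc["input_ids"]).loss

print(f"perplexity: {torch.exp(loss).item():.2f}")
```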
> **Reproducibility Note**: This model was quantized using the SINQ implementation from commit [`14ad847`](https://github.com/huawei-csl/SINQ/commit/14ad847d0ab25f1794b8820506f59b5c9c1fc979) of the [SINQ](https://github.com/huawei-csl/SINQ) repository.
</details>
<br>
---
# How to Cite This Work
If you find **SINQ** useful in your research or applications, please:
- Star the official [SINQ](https://github.com/huawei-csl/SINQ) GitHub repository.
- Cite our <a href="http://arxiv.org/abs/2509.22944" target="_blank"><strong>paper</strong></a>:
```bibtex
@misc{muller2025sinq,
      title={SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights},
      author={Lorenz K. Muller and Philippe Bich and Jiawei Zhuang and Ahmet Celik and Luca Benfenati and Lukas Cavigelli},
      year={2025},
      eprint={2509.22944},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={http://arxiv.org/abs/2509.22944}
}
```