Update README.md
README.md CHANGED
@@ -10,16 +10,12 @@ This model is an int4 model with group_size 128 and symmetric quantization of [g
 
 Please follow the license of the original model.
 
-### Inference on CPU
-
-we found the unquantized layer must run on BF16 or FP32, so cuda inference is not available now.
+### Inference on CPU/XPU/CUDA
 
 Requirements
 
 ```bash
-pip install auto-round
-pip uninstall intel-extension-for-pytorch
-pip install intel-extension-for-transformers
+pip install 'auto-round>=0.5'
 ```
 
 ~~~python
@@ -27,13 +23,12 @@ from transformers import AutoProcessor, Gemma3ForConditionalGeneration
 from PIL import Image
 import requests
 import torch
-from auto_round import AutoRoundConfig
+from auto_round import AutoRoundConfig  ## must import for the AutoRound format, or use transformers>4.51.3
 
 model_id = "OPEA/gemma-3-12b-it-int4-AutoRound"
 
 model = Gemma3ForConditionalGeneration.from_pretrained(
-    model_id, torch_dtype=torch.bfloat16, device_map="auto"
-).eval()
+    model_id, torch_dtype=torch.bfloat16, device_map="auto").eval()
 
 processor = AutoProcessor.from_pretrained(model_id)
 
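The diff's Python snippet cuts off after the processor is loaded. For context, here is a minimal sketch of how generation typically continues with Gemma 3's chat template; the image URL, prompt, and generation settings below are illustrative assumptions, not part of this commit.

~~~python
# Continues from the snippet above (model and processor already loaded).
# The message content, URL, and max_new_tokens are illustrative assumptions,
# not taken from the diff.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

input_len = inputs["input_ids"].shape[-1]

with torch.inference_mode():
    generation = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(processor.decode(generation[0][input_len:], skip_special_tokens=True))
~~~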
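The new heading advertises CPU, XPU, and CUDA, while the snippet itself uses `device_map="auto"`. As a sketch of how one might pin the model to a specific backend instead, assuming an accelerate-style device string (the explicit values are illustrative, not shown in the diff):

~~~python
# Hypothetical backend selection; device_map="auto" (as in the README snippet)
# picks a device automatically. Pinning explicitly is an assumption, not part
# of this commit.
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="cpu",  # or "xpu" for Intel GPUs, "cuda:0" for NVIDIA GPUs
).eval()
~~~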