Overview
This document presents the evaluation results for Llama-3.3-70B-Instruct quantized to 4-bit with GPTQ, evaluated with the Language Model Evaluation Harness on the ARC-Challenge benchmark.
⚙️ Model Configuration
- Model: Llama-3.3-70B-Instruct
- Parameters: 70 billion
- Quantization: 4-bit GPTQ
- Source: Hugging Face (`hf`)
- Precision: `torch.float16`
- Hardware: NVIDIA A100 80GB PCIe
- CUDA Version: 12.4
- PyTorch Version: 2.6.0+cu124
- Batch Size: 1
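As a minimal sketch of how a checkpoint with the configuration above could be loaded (assuming Transformers' GPTQ integration is available, which requires the `optimum` package plus a GPTQ backend such as `auto-gptq` or `gptqmodel`; the repository id is taken from the model tree at the bottom of this card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "empirischtech/Llama-3.3-70B-gptq-4bit"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# GPTQ-quantized checkpoints load through Transformers' GPTQ integration;
# the quantization config is read from the checkpoint itself.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # matches the torch.float16 precision listed above
    device_map="auto",          # places layers on the available GPU (A100 80GB)
)
```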
📌 Interpretation:
- The evaluation was run on a single high-memory GPU (NVIDIA A100 80GB PCIe).
- At 70 billion parameters, the model is significantly larger than the previously evaluated 8B version; GPTQ 4-bit quantization reduces its memory footprint accordingly.
- A batch size of 1 was used, which slows evaluation throughput (see the harness sketch below for how these settings map to a run).
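For reference, a sketch of how the settings above map onto a Language Model Evaluation Harness run, assuming lm-eval 0.4+ and its `simple_evaluate` Python API (the exact invocation used for the original run is not recorded in this card):

```python
import lm_eval

# ARC-Challenge with the hf backend, float16 precision, and batch size 1,
# mirroring the configuration listed above.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=empirischtech/Llama-3.3-70B-gptq-4bit,dtype=float16",
    tasks=["arc_challenge"],
    batch_size=1,
    device="cuda:0",
)

print(results["results"]["arc_challenge"])
```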
📌 Let us know if you need further analysis or model tuning! 🚀
Model tree for empirischtech/Llama-3.3-70B-gptq-4bit
- Base model: meta-llama/Llama-3.1-70B
- Fine-tuned: meta-llama/Llama-3.3-70B-Instruct
- Quantized (this model): empirischtech/Llama-3.3-70B-gptq-4bit