Overview
This document presents the evaluation results for Llama-3.3-70B-Instruct quantized to 4-bit with GPTQ, evaluated with the Language Model Evaluation Harness on the ARC-Challenge benchmark.
⚙️ Model Configuration
- Model: Llama-3.3-70B-Instruct
- Parameters: 70 billion
- Quantization: 4-bit GPTQ
- Source: Hugging Face (`hf`)
- Precision: `torch.float16`
- Hardware: NVIDIA A100 80GB PCIe
- CUDA Version: 12.4
- PyTorch Version: 2.6.0+cu124
- Batch Size: 1
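As a minimal sketch of how a checkpoint with the configuration above could be loaded (assuming Transformers' GPTQ integration is available, which requires the `optimum` package plus a GPTQ backend such as `auto-gptq` or `gptqmodel`; the repository id is taken from the model tree at the bottom of this card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "empirischtech/Llama-3.3-70B-gptq-4bit"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# GPTQ-quantized checkpoints load through Transformers' GPTQ integration;
# the quantization config is read from the checkpoint itself.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # matches the torch.float16 precision listed above
    device_map="auto",          # places layers on the available GPU (A100 80GB)
)
```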
📌 Interpretation:
- The evaluation was run on a single high-memory GPU (NVIDIA A100 80GB PCIe).
- At 70 billion parameters, the model is significantly larger than the previously evaluated 8B version; GPTQ 4-bit quantization reduces its memory footprint accordingly.
- A batch size of 1 was used, which slows evaluation throughput (see the harness sketch below for how these settings map to a run).
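For reference, a sketch of how the settings above map onto a Language Model Evaluation Harness run, assuming lm-eval 0.4+ and its `simple_evaluate` Python API (the exact invocation used for the original run is not recorded in this card):

```python
import lm_eval

# ARC-Challenge with the hf backend, float16 precision, and batch size 1,
# mirroring the configuration listed above.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=empirischtech/Llama-3.3-70B-gptq-4bit,dtype=float16",
    tasks=["arc_challenge"],
    batch_size=1,
    device="cuda:0",
)

print(results["results"]["arc_challenge"])
```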
📌 Let us know if you need further analysis or model tuning! 🚀
Model tree for empirischtech/Llama-3.3-70B-gptq-4bit
- Base model: meta-llama/Llama-3.1-70B
- Fine-tuned: meta-llama/Llama-3.3-70B-Instruct
- Quantized (this model): empirischtech/Llama-3.3-70B-gptq-4bit