---

tags:
- llama3
- instruct
- gguf
- quantized
- llama-cpp
- Sahabat-AI
pipeline_tag: text-generation
---


# Llama 3 8B Sahabat-AI Instruct (GGUF Versions)

This repository contains GGUF-converted and quantized versions of the [Sahabat-AI/llama3-8b-cpt-sahabatai-v1-instruct](https://huggingface.co/Sahabat-AI/llama3-8b-cpt-sahabatai-v1-instruct) model, produced with `llama.cpp`.

This model is an instruction-tuned variant, suitable for chat and instruction-following use.
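
As a Llama 3 derivative, it presumably uses the standard Llama 3 Instruct chat template (worth verifying against the original model card). For raw-prompt use, that format looks like:

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>

Write a story about a dragon.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```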

## Available GGUF Files:

### 1. `llama3-8b-cpt-sahabatai-v1-instruct-f16.gguf`
* **Format:** FP16 (16-bit half precision, unquantized)
* **Size:** ~16.1 GB
* **Description:** This is the unquantized GGUF conversion. It offers the highest fidelity but requires significant VRAM (approx. 16 GB).

### 2. `llama3-8b-cpt-sahabatai-v1-instruct-q4km.gguf`
* **Format:** Q4_K_M (4-bit K-quant)
* **Size:** ~4.58 GB
* **Description:** A 4-bit quantized version suitable for devices with limited VRAM (e.g., an 8 GB GPU). It offers a good balance between model size and inference speed, with minimal quality loss.
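
For reference, a conversion and quantization of this kind can be reproduced with `llama.cpp`'s own tooling. A minimal sketch, assuming a recent `llama.cpp` checkout (script and binary names differ in older versions, as noted in the comments):

```bash
# Convert the original Hugging Face checkpoint to an F16 GGUF
# (older llama.cpp checkouts ship this script as convert.py).
python convert_hf_to_gguf.py ./llama3-8b-cpt-sahabatai-v1-instruct \
  --outtype f16 \
  --outfile llama3-8b-cpt-sahabatai-v1-instruct-f16.gguf

# Quantize the F16 GGUF down to Q4_K_M
# (older checkouts name this binary ./quantize).
./llama-quantize llama3-8b-cpt-sahabatai-v1-instruct-f16.gguf \
  llama3-8b-cpt-sahabatai-v1-instruct-q4km.gguf Q4_K_M
```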

## Original Model:
* [Sahabat-AI/llama3-8b-cpt-sahabatai-v1-instruct](https://huggingface.co/Sahabat-AI/llama3-8b-cpt-sahabatai-v1-instruct)

## How to Use:
Download the desired `.gguf` file and use it with `llama.cpp`, LM Studio, Ollama, or any other GGUF-compatible inference tool.
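
For example, with the `huggingface-cli` tool (here `<this-repo-id>` is a placeholder for this repository's actual id):

```bash
# Download only the Q4_K_M file into the current directory.
huggingface-cli download <this-repo-id> \
  llama3-8b-cpt-sahabatai-v1-instruct-q4km.gguf \
  --local-dir .
```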

For the `llama.cpp` CLI, you might use (newer `llama.cpp` builds rename the `main` binary to `llama-cli`, with the same flags):
```bash
./main -m llama3-8b-cpt-sahabatai-v1-instruct-q4km.gguf -p "Write a story about a dragon." -n 128
```
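
For Ollama, a minimal sketch is to wrap the GGUF in a `Modelfile` and register it (the model name `sahabatai-instruct` is arbitrary):

```bash
# Point a Modelfile at the downloaded GGUF, register it, then run it.
echo 'FROM ./llama3-8b-cpt-sahabatai-v1-instruct-q4km.gguf' > Modelfile
ollama create sahabatai-instruct -f Modelfile
ollama run sahabatai-instruct "Write a story about a dragon."
```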