---
tags:
- llama3
- instruct
- gguf
- quantized
- llama-cpp
- Sahabat-AI
pipeline_tag: text-generation
---
# Llama 3 8B Sahabat-AI Instruct (GGUF Versions)

This repository contains GGUF versions of the [Sahabat-AI/llama3-8b-cpt-sahabatai-v1-instruct](https://huggingface.co/Sahabat-AI/llama3-8b-cpt-sahabatai-v1-instruct) model, converted and quantized with `llama.cpp`.

The model is an instruction-tuned variant, suited to chat and instruction-following tasks.
## Available GGUF Files:
### 1. `llama3-8b-cpt-sahabatai-v1-instruct-f16.gguf`

* **Format:** F16 (16-bit half precision, unquantized)
* **Size:** ~16.1 GB
* **Description:** This is the unquantized GGUF conversion. It offers the highest fidelity of the two files but requires significant memory (approx. 16 GB of VRAM for full GPU offload).
### 2. `llama3-8b-cpt-sahabatai-v1-instruct-q4km.gguf`

* **Format:** Q4_K_M (4-bit quantized)
* **Size:** ~4.58 GB
* **Description:** This is a 4-bit quantized version, suitable for devices with limited VRAM (e.g., GPUs with 8 GB). It offers a good balance of file size, inference speed, and output quality, with only a modest quality loss relative to the F16 file.
## Original Model:

* [Sahabat-AI/llama3-8b-cpt-sahabatai-v1-instruct](https://huggingface.co/Sahabat-AI/llama3-8b-cpt-sahabatai-v1-instruct)
## How to Use:

Download the desired `.gguf` file and use it with `llama.cpp`, LM Studio, Ollama, or any other GGUF-compatible inference tool.
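For example, you can fetch a single file with the `huggingface-cli` tool. The repository ID below is a placeholder; substitute this repository's actual ID:

```bash
# Download only the Q4_K_M file into the current directory.
# <user>/<repo> is a placeholder for this repository's Hugging Face ID.
huggingface-cli download <user>/<repo> \
  llama3-8b-cpt-sahabatai-v1-instruct-q4km.gguf \
  --local-dir .
```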
For the `llama.cpp` CLI, you might use (recent builds name the binary `llama-cli`; older builds call it `./main`):

```bash
llama-cli -m llama3-8b-cpt-sahabatai-v1-instruct-q4km.gguf -p "Write a story about a dragon." -n 128
```
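For Ollama, here is a minimal sketch; the local model name `sahabatai-instruct` is arbitrary and not part of this release:

```bash
# Point a Modelfile at the downloaded GGUF
# (run from the directory that contains the file).
cat > Modelfile <<'EOF'
FROM ./llama3-8b-cpt-sahabatai-v1-instruct-q4km.gguf
EOF

# Register the model under a local name, then chat with it.
ollama create sahabatai-instruct -f Modelfile
ollama run sahabatai-instruct "Write a story about a dragon."
```

If the default chat formatting needs adjusting, a `TEMPLATE` directive can be added to the Modelfile.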