---
tags:
- llama3
- instruct
- gguf
- quantized
- llama-cpp
- Sahabat-AI
pipeline_tag: text-generation
---
# Llama 3 8B Sahabat-AI Instruct (GGUF Versions)

This repository contains GGUF versions of the [Sahabat-AI/llama3-8b-cpt-sahabatai-v1-instruct](https://huggingface.co/Sahabat-AI/llama3-8b-cpt-sahabatai-v1-instruct) model, converted and quantized with `llama.cpp`.

The model is an instruction-tuned variant, suited to chat and instruction-following tasks.
## Available GGUF Files:
### 1. `llama3-8b-cpt-sahabatai-v1-instruct-f16.gguf`

* **Format:** F16 (16-bit half precision, unquantized)
* **Size:** ~16.1 GB
* **Description:** This is the unquantized GGUF conversion. It offers the highest fidelity of the two files but requires significant memory (approx. 16 GB of VRAM for full GPU offload).
### 2. `llama3-8b-cpt-sahabatai-v1-instruct-q4km.gguf`

* **Format:** Q4_K_M (4-bit quantized)
* **Size:** ~4.58 GB
* **Description:** This is a 4-bit quantized version, suitable for devices with limited VRAM (e.g., GPUs with 8 GB). It offers a good balance of file size, inference speed, and output quality, with only a modest quality loss relative to the F16 file.
## Original Model:

* [Sahabat-AI/llama3-8b-cpt-sahabatai-v1-instruct](https://huggingface.co/Sahabat-AI/llama3-8b-cpt-sahabatai-v1-instruct)
## How to Use:

Download the desired `.gguf` file and use it with `llama.cpp`, LM Studio, Ollama, or any other GGUF-compatible inference tool.
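For example, you can fetch a single file with the `huggingface-cli` tool. The repository ID below is a placeholder; substitute this repository's actual ID:

```bash
# Download only the Q4_K_M file into the current directory.
# <user>/<repo> is a placeholder for this repository's Hugging Face ID.
huggingface-cli download <user>/<repo> \
  llama3-8b-cpt-sahabatai-v1-instruct-q4km.gguf \
  --local-dir .
```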
For the `llama.cpp` CLI, you might use (recent builds name the binary `llama-cli`; older builds call it `./main`):

```bash
llama-cli -m llama3-8b-cpt-sahabatai-v1-instruct-q4km.gguf -p "Write a story about a dragon." -n 128
```
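For Ollama, here is a minimal sketch; the local model name `sahabatai-instruct` is arbitrary and not part of this release:

```bash
# Point a Modelfile at the downloaded GGUF
# (run from the directory that contains the file).
cat > Modelfile <<'EOF'
FROM ./llama3-8b-cpt-sahabatai-v1-instruct-q4km.gguf
EOF

# Register the model under a local name, then chat with it.
ollama create sahabatai-instruct -f Modelfile
ollama run sahabatai-instruct "Write a story about a dragon."
```

If the default chat formatting needs adjusting, a `TEMPLATE` directive can be added to the Modelfile.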