---
language:
- en
license: llama3.3
library_name: transformers
tags:
- Llama-3.3
- Instruct
- loyal AI
- GGUF
- finetune
- chat
- gpt4
- synthetic data
- roleplaying
- unhinged
- funny
- opinionated
- assistant
- companion
- friend
base_model: meta-llama/Llama-3.3-70B-Instruct
---

# Dobby-Llama-3.3-70B_GGUF

Dobby-70B is a high-performance GGUF model based on Llama 3.3 with 70 billion parameters. Designed for efficiency, it is provided at **4-bit**, **6-bit**, and **8-bit** quantization levels, offering the flexibility to run on a range of hardware configurations with minimal impact on output quality.
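
The GGUF files can be fetched programmatically with `huggingface_hub`. As a minimal sketch, the repo id and filename below are placeholders; substitute the real names from this repository's file list:

```python
# Sketch: download one quantization level from the Hub.
# repo_id and filename are placeholders -- check the "Files" tab
# of this repository for the actual names.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="your-org/Dobby-Llama-3.3-70B_GGUF",  # placeholder repo id
    filename="dobby-llama-3.3-70b.Q4_K_M.gguf",   # placeholder filename
)
print(f"Downloaded to {path}")
```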

## Compatibility

This model is compatible with:

- **[LMStudio](https://lmstudio.ai/)**: An easy-to-use desktop application for running and fine-tuning large language models locally.
- **[Ollama](https://ollama.com/)**: A versatile tool for deploying, managing, and interacting with large language models (see the Python sketch below).
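
As an illustration, here is a minimal Python sketch that talks to this model through Ollama's official `ollama` client library. The model tag `dobby-70b` is a hypothetical placeholder for whatever name you register the GGUF under locally:

```python
# Minimal sketch: chat with a locally registered GGUF model via Ollama's
# official Python client (pip install ollama). Assumes the Ollama server
# is running and the model was created locally, e.g. from a Modelfile
# whose FROM line points at the downloaded .gguf file.
import ollama

response = ollama.chat(
    model="dobby-70b",  # hypothetical local tag, not an official name
    messages=[
        {"role": "user", "content": "Introduce yourself in one sentence."},
    ],
)
print(response["message"]["content"])
```

LMStudio exposes an OpenAI-compatible local server, so the same conversation can be driven by any OpenAI-style client pointed at the local endpoint.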

## Quantization Levels

| **Quantization** | **Description** | **Use Case** |
|------------------|-----------------|--------------|
| **4-bit** | Highly compressed for minimal memory usage. Some loss in precision and quality, but great for lightweight devices with limited VRAM. | Ideal for testing, quick prototyping, or running on low-end GPUs and CPUs. |
| **6-bit** | Strikes a balance between compression and quality. Offers improved accuracy over 4-bit without requiring significantly more resources. | Recommended for users with mid-range hardware who want a compromise between speed and precision. |
| **8-bit** | The highest-precision quantization offered here, staying close to the quality of the original FP16/FP32 weights while using far less memory. | Perfect for high-performance systems where accuracy and precision are critical. |
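
As a rough rule of thumb, the weights alone of a 70B-parameter model occupy about `70e9 × bits / 8` bytes. Actual GGUF quant schemes (e.g. Q4_K_M) mix bit widths per block and the KV cache adds more on top, so treat these back-of-the-envelope figures as lower bounds:

```python
# Rough weight-only memory estimate for a 70B-parameter model.
PARAMS = 70e9

for bits in (4, 6, 8):
    gib = PARAMS * bits / 8 / 2**30
    print(f"{bits}-bit: ~{gib:.0f} GiB for weights")

# Prints approximately:
#   4-bit: ~33 GiB for weights
#   6-bit: ~49 GiB for weights
#   8-bit: ~65 GiB for weights
```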

## Recommended Usage

Choose a quantization level based on your hardware:

- **4-bit** for ultra-lightweight systems.
- **6-bit** for a balance on mid-tier hardware.
- **8-bit** for maximum quality on powerful GPUs.

This model responds well to prompt-level customization for domain-specific tasks, making it an excellent choice for interactive applications such as chatbots, question answering, and creative writing.
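
For local use outside of LMStudio or Ollama, a minimal `llama-cpp-python` sketch is shown below; the filename and system prompt are illustrative placeholders, not files shipped in this repo:

```python
# Minimal sketch: load a 4-bit GGUF with llama-cpp-python
# (pip install llama-cpp-python) and run one chat turn.
from llama_cpp import Llama

llm = Llama(
    model_path="dobby-llama-3.3-70b.Q4_K_M.gguf",  # hypothetical filename
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if VRAM allows
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a blunt, opinionated companion."},
        {"role": "user", "content": "What do you think of quantized models?"},
    ],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```

Set `n_gpu_layers` to a smaller positive number to split the model between GPU and CPU when VRAM is tight.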