---
language:
- en
license: llama3.3
library_name: transformers
tags:
- Llama-3.3
- Instruct
- loyal AI
- GGUF
- finetune
- chat
- gpt4
- synthetic data
- roleplaying
- unhinged
- funny
- opinionated
- assistant
- companion
- friend
base_model: meta-llama/Llama-3.3-70B-Instruct
---

# Dobby-Llama-3.3-70B_GGUF

Dobby-70B is a high-performance GGUF model based on Llama 3.3 with 70 billion parameters. Designed for efficiency, it is provided at **4-bit**, **6-bit**, and **8-bit** quantization levels, offering the flexibility to run on a range of hardware configurations with minimal impact on output quality.
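
The GGUF files can be fetched programmatically with `huggingface_hub`. As a minimal sketch, the repo id and filename below are placeholders; substitute the real names from this repository's file list:

```python
# Sketch: download one quantization level from the Hub.
# repo_id and filename are placeholders -- check the "Files" tab
# of this repository for the actual names.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="your-org/Dobby-Llama-3.3-70B_GGUF",  # placeholder repo id
    filename="dobby-llama-3.3-70b.Q4_K_M.gguf",   # placeholder filename
)
print(f"Downloaded to {path}")
```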

## Compatibility

This model is compatible with:

- **[LMStudio](https://lmstudio.ai/)**: An easy-to-use desktop application for running and fine-tuning large language models locally.
- **[Ollama](https://ollama.com/)**: A versatile tool for deploying, managing, and interacting with large language models (see the Python sketch below).
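
As an illustration, here is a minimal Python sketch that talks to this model through Ollama's official `ollama` client library. The model tag `dobby-70b` is a hypothetical placeholder for whatever name you register the GGUF under locally:

```python
# Minimal sketch: chat with a locally registered GGUF model via Ollama's
# official Python client (pip install ollama). Assumes the Ollama server
# is running and the model was created locally, e.g. from a Modelfile
# whose FROM line points at the downloaded .gguf file.
import ollama

response = ollama.chat(
    model="dobby-70b",  # hypothetical local tag, not an official name
    messages=[
        {"role": "user", "content": "Introduce yourself in one sentence."},
    ],
)
print(response["message"]["content"])
```

LMStudio exposes an OpenAI-compatible local server, so the same conversation can be driven by any OpenAI-style client pointed at the local endpoint.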

## Quantization Levels

| **Quantization** | **Description** | **Use Case** |
|------------------|-----------------|--------------|
| **4-bit** | Highly compressed for minimal memory usage. Some loss in precision and quality, but great for lightweight devices with limited VRAM. | Ideal for testing, quick prototyping, or running on low-end GPUs and CPUs. |
| **6-bit** | Strikes a balance between compression and quality. Offers improved accuracy over 4-bit without requiring significantly more resources. | Recommended for users with mid-range hardware who want a compromise between speed and precision. |
| **8-bit** | The highest-precision quantization offered here, staying close to the quality of the original FP16/FP32 weights while using far less memory. | Perfect for high-performance systems where accuracy and precision are critical. |
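
As a rough rule of thumb, the weights alone of a 70B-parameter model occupy about `70e9 × bits / 8` bytes. Actual GGUF quant schemes (e.g. Q4_K_M) mix bit widths per block and the KV cache adds more on top, so treat these back-of-the-envelope figures as lower bounds:

```python
# Rough weight-only memory estimate for a 70B-parameter model.
PARAMS = 70e9

for bits in (4, 6, 8):
    gib = PARAMS * bits / 8 / 2**30
    print(f"{bits}-bit: ~{gib:.0f} GiB for weights")

# Prints approximately:
#   4-bit: ~33 GiB for weights
#   6-bit: ~49 GiB for weights
#   8-bit: ~65 GiB for weights
```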

## Recommended Usage

Choose a quantization level based on your hardware:

- **4-bit** for ultra-lightweight systems.
- **6-bit** for a balance on mid-tier hardware.
- **8-bit** for maximum quality on powerful GPUs.

This model responds well to prompt-level customization for domain-specific tasks, making it an excellent choice for interactive applications such as chatbots, question answering, and creative writing.
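
For local use outside of LMStudio or Ollama, a minimal `llama-cpp-python` sketch is shown below; the filename and system prompt are illustrative placeholders, not files shipped in this repo:

```python
# Minimal sketch: load a 4-bit GGUF with llama-cpp-python
# (pip install llama-cpp-python) and run one chat turn.
from llama_cpp import Llama

llm = Llama(
    model_path="dobby-llama-3.3-70b.Q4_K_M.gguf",  # hypothetical filename
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if VRAM allows
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a blunt, opinionated companion."},
        {"role": "user", "content": "What do you think of quantized models?"},
    ],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```

Set `n_gpu_layers` to a smaller positive number to split the model between GPU and CPU when VRAM is tight.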