sanchezalonsodavid17/DeepSeek_Light_V1

sanchezalonsodavid17 committed on Mar 12

Commit b116cfd · verified · 1 Parent(s): c1aac61

Update README.md

Added Model Card with optimizations & benchmarks.

Files changed (1)

README.md +63 -3

@@ -1,3 +1,63 @@
----
-license: mit
----

+---
+license: mit
+base_model:
+- deepseek-ai/deepseek-coder-6.7b-instruct
+---
+# **DeepSeek-Light-V1: Optimized Version of DeepSeek-Coder-6.7B**
+**Based in the Basque Country 🇪🇸**
+DeepSeek-Light-V1 is an **optimized version** of **DeepSeek-Coder-6.7B**, designed to reduce GPU memory consumption and make deployment more feasible. It combines **4-bit quantization** and **pruning**, lowering the parameter count from 6.7B to 3.5B while keeping the model functional.
+## **Key Optimizations 🚀**
+- **4-bit Quantization (NF4, BFloat16 compute):** Reduces VRAM usage with minimal precision loss.
+- **Pruning:** Removes redundant parameters to enhance efficiency.
+- **Optimized for lightweight deployment:** Works on lower-end hardware.
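To make the two techniques named in the list above concrete, here is a minimal, dependency-free sketch. It is not the author's pipeline, which the README does not describe. It assumes magnitude-based pruning and a simple uniform absmax 4-bit scheme, which is a simplification of the NF4 format used in the loading code. All function names are hypothetical.

```python
# Illustrative only: NOT the method used to build DeepSeek-Light-V1.
# Assumptions: magnitude pruning, and uniform absmax 4-bit quantization
# (a simpler stand-in for NF4). All names here are hypothetical.

def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)  # number of weights to zero
    if k == 0:
        return list(weights)
    # Indices of the k smallest-magnitude weights
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(by_magnitude[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_4bit(weights):
    """Map floats to signed integer codes in [-7, 7] plus one shared scale."""
    absmax = max(abs(w) for w in weights) or 1.0  # avoid dividing by zero
    scale = absmax / 7
    codes = [round(w / scale) for w in weights]
    return codes, scale


def dequantize_4bit(codes, scale):
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]


weights = [0.70, -0.05, 0.31, 0.02, -0.56, 0.10]
pruned = magnitude_prune(weights, sparsity=0.5)
codes, scale = quantize_4bit(pruned)
restored = dequantize_4bit(codes, scale)

print("pruned:  ", pruned)
print("codes:   ", codes)
print("restored:", restored)
```

Each code fits in 4 bits, so two weights can be packed per byte. The rounding error per weight is bounded by half the scale, which is where the precision loss comes from.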
+## **Model Comparison 📊**
+| Version | Model Size | GPU VRAM Usage | Parameters | Relative Performance |
+|---------|-----------|---------------|-------------|----------------|
+| **Original (DeepSeek-Coder-6.7B)** | 3.51GB | 7.85GB | **6.7B** | **100%** |
+| **Optimized (DeepSeek-Light-V1)** | 3.51GB | **3.93GB (50% reduction!)** | **3.5B** | **~50%** |
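As a quick sanity check on the table above, the reductions can be recomputed from the reported figures. This sketch only does arithmetic on the table's numbers. It does not measure anything, and the helper name is hypothetical.

```python
# Figures copied from the comparison table; nothing here is measured.
original_vram_gb = 7.85
optimized_vram_gb = 3.93
original_params_b = 6.7
optimized_params_b = 3.5


def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100


vram_reduction = percent_reduction(original_vram_gb, optimized_vram_gb)
param_reduction = percent_reduction(original_params_b, optimized_params_b)

print(f"VRAM reduction:      {vram_reduction:.1f}%")
print(f"Parameter reduction: {param_reduction:.1f}%")
```

The VRAM figure comes out just under the 50% quoted in the table, and the parameter count drops by slightly less than half.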
+## **Why Use This Model? 💡**
+✅ **Runs on more affordable hardware** – No need for high-end GPUs.
+✅ **Reduces operational costs** – More efficient deployment.
+✅ **Enhances security** – Enables local execution before moving to production.
+## **How to Use 🛠️**
+You can load the model using `transformers` with quantization:
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+import torch
+# Load model and tokenizer
+model_name = "sanchezalonsodavid17/DeepSeek_Light_V1"
+tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
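+# 4-bit NF4 quantization; matrix multiplications are computed in bfloat16
+quantization_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_quant_type="nf4",
+    bnb_4bit_compute_dtype=torch.bfloat16,
+    bnb_4bit_use_double_quant=True,  # also quantize the quantization constants
+)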
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    device_map="auto",
+    quantization_config=quantization_config
+)
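+# Generate text
+def generate_text(prompt, max_new_tokens=100):
+    # Move inputs to the model's device instead of hard-coding "cuda"
+    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+    with torch.no_grad():
+        # max_new_tokens limits generated tokens only; max_length would also count the prompt
+        output = model.generate(**inputs, max_new_tokens=max_new_tokens)
+    return tokenizer.decode(output[0], skip_special_tokens=True)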
+# Example usage
+prompt = "Explain how deep learning works in neural networks."
+response = generate_text(prompt)
+print(response)