File size: 2,478 Bytes

b116cfd

---
license: mit
base_model:
- deepseek-ai/deepseek-coder-6.7b-instruct
---

# **DeepSeek-Light-V1: Optimized Version of DeepSeek-Coder-6.7B**  
**Based in the Basque Country 🇪🇸**  

DeepSeek-Light-V1 is a **highly optimized version** of **DeepSeek-Coder-6.7B**, designed to reduce GPU memory consumption and improve deployment feasibility. This optimization combines **4-bit quantization** and **pruning**, significantly lowering the number of parameters while maintaining functional capabilities.  

## **Key Optimizations 🚀**  
- **4-bit Quantization (BFloat16):** Reduces VRAM usage with minimal precision loss.  
- **Pruning:** Removes redundant parameters to enhance efficiency.  
- **Optimized for lightweight deployment:** Works on lower-end hardware.  

## **Model Comparison 📊**  

| Version | Model Size | GPU VRAM Usage | Parameters | Relative Performance |
|---------|-----------|---------------|-------------|----------------|
| **Original (DeepSeek-Coder-6.7B)** | 3.51GB | 7.85GB | **6.7B** | **100%** |
| **Optimized (DeepSeek-Light-V1)** | 3.51GB | **3.93GB (50% reduction!)** | **3.5B** | **~50% performance** |

## **Why Use This Model? 💡**  
✅ **Runs on more affordable hardware** – No need for high-end GPUs.  
✅ **Reduces operational costs** – More efficient deployment.  
✅ **Enhances security** – Enables local execution before moving to production.  

## **How to Use 🛠️**  
You can load the model using `transformers` with quantization:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

# Load model and tokenizer
model_name = "sanchezalonsodavid17/DeepSeek_Light_V1"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    quantization_config=quantization_config
)

# Generate text
def generate_text(prompt, max_length=100):
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    with torch.no_grad():
        output = model.generate(**inputs, max_length=max_length)
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Example usage
prompt = "Explain how deep learning works in neural networks."
response = generate_text(prompt)
print(response)