---
license: mit
base_model:
- deepseek-ai/deepseek-coder-6.7b-instruct
---
# **DeepSeek-Light-V1: Optimized Version of DeepSeek-Coder-6.7B**
**Based in the Basque Country 🇪🇸**
DeepSeek-Light-V1 is a **highly optimized version** of **DeepSeek-Coder-6.7B**, designed to reduce GPU memory consumption and make deployment feasible on more modest hardware. The optimization combines **4-bit quantization**, which cuts VRAM usage, with **pruning**, which lowers the effective parameter count, while preserving functional capabilities.
## **Key Optimizations 🚀**
- **4-bit NF4 quantization (bfloat16 compute):** Reduces VRAM usage with minimal precision loss.
- **Pruning:** Removes redundant parameters to enhance efficiency (see the illustrative sketch after this list).
- **Optimized for lightweight deployment:** Runs on lower-end hardware.
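The exact optimization recipe is not published here; the sketch below shows one plausible way to combine the two steps, assuming `bitsandbytes` NF4 loading for the quantization side and simple magnitude pruning via `torch.nn.utils.prune`. The 30% pruning ratio is an illustrative assumption, not the ratio used for this model.
```python
# Illustrative sketch only: prune the full-precision weights first, save the
# checkpoint, then serve it with 4-bit NF4 loading (as in "How to Use" below).
import torch
import torch.nn.utils.prune as prune
from transformers import AutoModelForCausalLM

base_model = "deepseek-ai/deepseek-coder-6.7b-instruct"

# Load the base model in bfloat16 for pruning (requires enough memory
# for the full bf16 weights, roughly 13-14 GB for a 6.7B model).
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)

# Unstructured magnitude pruning: zero out the smallest-magnitude weights in
# every linear layer. The 30% ratio here is an assumed value for illustration.
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # bake the pruning mask into the weights

# Save the pruned checkpoint; it can then be loaded in 4-bit at serving time.
model.save_pretrained("deepseek-light-pruned")
```
Pruning before quantization keeps the mask applied to full-precision weights; the pruned checkpoint can then be loaded in 4-bit exactly as shown in the usage example below.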
## **Model Comparison 📊**
| Version | Model Size | GPU VRAM Usage | Parameters | Relative Performance |
|---------|-----------|---------------|-------------|----------------|
| **Original (DeepSeek-Coder-6.7B)** | 3.51GB | 7.85GB | **6.7B** | **100%** |
| **Optimized (DeepSeek-Light-V1)** | 3.51GB | **3.93GB (50% reduction!)** | **3.5B** | **~50% performance** |
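The VRAM figures above depend on GPU, driver, and library versions. To reproduce them on your own hardware, you can check allocated GPU memory with `torch.cuda` after loading the model; a minimal sketch:
```python
# Quick check of GPU memory after loading the model (numbers will vary by
# GPU, driver, and library versions).
import torch

def report_vram(tag: str) -> None:
    # memory_allocated: tensors currently held; max_memory_allocated: peak so far.
    allocated = torch.cuda.memory_allocated() / 1024**3
    peak = torch.cuda.max_memory_allocated() / 1024**3
    print(f"{tag}: {allocated:.2f} GiB allocated, {peak:.2f} GiB peak")

# Example: call this right after AutoModelForCausalLM.from_pretrained(...) returns.
# report_vram("after model load")
```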
## **Why Use This Model? 💡**
- **Runs on more affordable hardware** – no need for high-end GPUs.
- **Reduces operational costs** – more efficient deployment.
- **Enhances security** – enables local execution before moving to production.
## **How to Use 🛠️**
You can load the model using `transformers` with quantization:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

# Load the tokenizer
model_name = "sanchezalonsodavid17/DeepSeek_Light_V1"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# 4-bit NF4 quantization with bfloat16 compute and double quantization
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load the quantized model across available devices
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    quantization_config=quantization_config,
)

# Generate text
def generate_text(prompt, max_new_tokens=100):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Example usage
prompt = "Explain how deep learning works in neural networks."
response = generate_text(prompt)
print(response)
```
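For instruction-style prompts, the base `deepseek-coder-6.7b-instruct` tokenizer ships a chat template; assuming this repository's tokenizer inherits it, prompts can be formatted with `apply_chat_template` instead of raw strings (reusing `tokenizer`, `model`, and `torch` from the snippet above):
```python
# Optional: format an instruction prompt with the tokenizer's chat template,
# assuming the template is present (as it is in the base instruct tokenizer).
messages = [
    {"role": "user", "content": "Write a Python function that checks if a number is prime."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=200)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```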