---
license: mit
base_model:
- deepseek-ai/deepseek-coder-6.7b-instruct
---
|
|
|
|
|
# **DeepSeek-Light-V1: Optimized Version of DeepSeek-Coder-6.7B**
|
|
**Developed in the Basque Country 🇪🇸**
|
|
|
|
|
DeepSeek-Light-V1 is a **highly optimized version** of **DeepSeek-Coder-6.7B**, designed to reduce GPU memory consumption and make deployment on modest hardware feasible. The optimization combines **4-bit quantization** and **pruning**, significantly lowering the parameter count while preserving the model's core capabilities.
|
|
|
|
|
## **Key Optimizations 🚀**
|
|
- **4-bit Quantization (NF4 with BFloat16 compute):** Reduces VRAM usage with minimal precision loss.
|
|
- **Pruning:** Removes redundant parameters to enhance efficiency (see the sketch after this list).
|
|
- **Optimized for lightweight deployment:** Works on lower-end hardware.
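
Pruning was applied when producing this checkpoint, so no extra step is needed at load time. Purely as an illustration, the sketch below shows one common way to apply unstructured magnitude pruning to a model's linear layers with `torch.nn.utils.prune`; the exact pruning method and ratio used for DeepSeek-Light-V1 are not documented here, so the 30% amount is an assumed placeholder.

```python
# Minimal sketch of unstructured magnitude (L1) pruning on linear layers.
# NOTE: the actual recipe used for DeepSeek-Light-V1 is not specified here;
# the 30% ratio is an illustrative assumption, not the released configuration.
import torch.nn as nn
import torch.nn.utils.prune as prune

def prune_linear_layers(model: nn.Module, amount: float = 0.3) -> nn.Module:
    for module in model.modules():
        if isinstance(module, nn.Linear):
            # Zero out the fraction `amount` of weights with the smallest |value|.
            prune.l1_unstructured(module, name="weight", amount=amount)
            # Make the pruning permanent (removes the reparametrization hooks).
            prune.remove(module, "weight")
    return model
```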
|
|
|
|
|
## **Model Comparison 📊**
|
|
|
|
|
| Version | Model Size | GPU VRAM Usage | Parameters | Relative Performance |
|---------|------------|----------------|------------|-----------------------|
| **Original (DeepSeek-Coder-6.7B)** | 3.51 GB | 7.85 GB | **6.7B** | **100%** |
| **Optimized (DeepSeek-Light-V1)** | 3.51 GB | **3.93 GB (≈50% reduction)** | **3.5B** | **~50%** |
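
Exact VRAM usage varies with hardware, driver, and sequence length. To sanity-check the parameter count and peak-memory figures on your own machine, here is a minimal sketch using standard PyTorch utilities (the helper name `report_model_footprint` is illustrative, not part of this repository):

```python
import torch

def report_model_footprint(model):
    # Stored parameter elements. Note: bitsandbytes packs two 4-bit weights per
    # byte, so this can undercount the logical parameter count of a quantized model.
    total_params = sum(p.numel() for p in model.parameters())
    print(f"Stored parameter elements: {total_params / 1e9:.2f}B")

    if torch.cuda.is_available():
        # Peak memory allocated by tensors on the current CUDA device, in GB.
        peak_gb = torch.cuda.max_memory_allocated() / 1024**3
        print(f"Peak GPU memory allocated: {peak_gb:.2f} GB")

# Call after loading the model and running a generation, e.g.:
# report_model_footprint(model)
```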
|
|
|
|
|
## **Why Use This Model? 💡**
|
|
✅ **Runs on more affordable hardware** – No need for high-end GPUs.
|
|
✅ **Reduces operational costs** – More efficient deployment.
|
|
✅ **Enhances security** – Enables local execution before moving to production.
|
|
|
|
|
## **How to Use 🛠️**
|
|
You can load the model using `transformers` with quantization:
|
|
|
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

# Load the tokenizer
model_name = "sanchezalonsodavid17/DeepSeek_Light_V1"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# 4-bit NF4 quantization with BFloat16 compute and double quantization
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load the quantized model, letting Accelerate place it on the available device(s)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    quantization_config=quantization_config,
)

# Generate text
def generate_text(prompt, max_new_tokens=100):
    # Move the inputs to the same device as the model
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Example usage
prompt = "Explain how deep learning works in neural networks."
response = generate_text(prompt)
print(response)
```