---
license: mit
base_model:
- deepseek-ai/deepseek-coder-6.7b-instruct
---

# **DeepSeek-Light-V1: Optimized Version of DeepSeek-Coder-6.7B**

**Based in the Basque Country 🇪🇸**

DeepSeek-Light-V1 is a **highly optimized version** of **DeepSeek-Coder-6.7B**, designed to reduce GPU memory consumption and improve deployment feasibility. The optimization combines **4-bit quantization** and **pruning**, significantly lowering the number of parameters while maintaining functional capabilities.

## **Key Optimizations 🚀**

- **4-bit Quantization (NF4 with BFloat16 compute):** Reduces VRAM usage with minimal precision loss.
- **Pruning:** Removes redundant parameters to enhance efficiency.
- **Optimized for lightweight deployment:** Runs on lower-end hardware.

## **Model Comparison 📊**

| Version | Model Size | GPU VRAM Usage | Parameters | Relative Performance |
|---------|------------|----------------|------------|----------------------|
| **Original (DeepSeek-Coder-6.7B)** | 3.51GB | 7.85GB | **6.7B** | **100%** |
| **Optimized (DeepSeek-Light-V1)** | 3.51GB | **3.93GB (50% reduction)** | **3.5B** | **~50% performance** |

## **Why Use This Model? 💡**

✅ **Runs on more affordable hardware** – no need for high-end GPUs.

✅ **Reduces operational costs** – more efficient deployment.

✅ **Enhances security** – enables local execution before moving to production.

## **How to Use 🛠️**

You can load the model using `transformers` with 4-bit quantization:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

# Load tokenizer
model_name = "sanchezalonsodavid17/DeepSeek_Light_V1"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# NF4 4-bit quantization with BFloat16 compute and double quantization
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load the quantized model across available devices
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    quantization_config=quantization_config
)

# Generate text
def generate_text(prompt, max_new_tokens=100):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Example usage
prompt = "Explain how deep learning works in neural networks."
response = generate_text(prompt)
print(response)
```
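To sanity-check the VRAM figures from the comparison table on your own hardware, here is a minimal sketch. It assumes the `model`, `tokenizer`, and `generate_text` objects from the snippet above and a single CUDA GPU; exact numbers will vary with your GPU, drivers, and prompt length.

```python
import torch

# Assumes `model` and `generate_text` from the loading example above.

# Approximate size of the loaded (quantized) weights in GB
print(f"Model footprint: {model.get_memory_footprint() / 1e9:.2f} GB")

# Track peak VRAM used during one generation pass
torch.cuda.reset_peak_memory_stats()
_ = generate_text("Write a Python function that reverses a string.")
peak_gb = torch.cuda.max_memory_allocated() / 1e9
print(f"Peak VRAM during generation: {peak_gb:.2f} GB")
```

Note that `get_memory_footprint()` covers only the weights; peak VRAM also includes activations and the KV cache, so it will normally be somewhat higher.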