sanchezalonsodavid17 committed · Commit b116cfd · verified · 1 Parent(s): c1aac61

Update README.md


Added Model Card with optimizations & benchmarks.

Files changed (1)
  1. README.md +63 -3
README.md CHANGED
@@ -1,3 +1,63 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ base_model:
+ - deepseek-ai/deepseek-coder-6.7b-instruct
+ ---
+
+ # **DeepSeek-Light-V1: Optimized Version of DeepSeek-Coder-6.7B**
+ **Based in the Basque Country 🇪🇸**
+
+ DeepSeek-Light-V1 is a **highly optimized version** of **DeepSeek-Coder-6.7B**, designed to reduce GPU memory consumption and make deployment feasible on more modest hardware. The optimization combines **4-bit quantization** and **pruning**, significantly lowering the effective number of parameters while preserving the model's core capabilities.
+
+ ## **Key Optimizations 🚀**
+ - **4-bit quantization (NF4, bfloat16 compute):** Reduces VRAM usage with minimal precision loss.
+ - **Pruning:** Removes redundant parameters to enhance efficiency (see the sketch after this list).
+ - **Optimized for lightweight deployment:** Works on lower-end hardware.
+
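+ The pruning recipe itself is not shipped in this repository. As a rough illustration of what unstructured magnitude pruning looks like with `torch.nn.utils.prune` (the exact method and sparsity used for DeepSeek-Light-V1 are not documented here):
+
+ ```python
+ import torch.nn as nn
+ import torch.nn.utils.prune as prune
+
+ # Illustrative sketch only: global L1 (magnitude) pruning over all Linear layers.
+ # The actual pruning configuration for DeepSeek-Light-V1 may differ.
+ def prune_linear_layers(model, amount=0.3):
+     parameters_to_prune = [
+         (module, "weight")
+         for module in model.modules()
+         if isinstance(module, nn.Linear)
+     ]
+     prune.global_unstructured(
+         parameters_to_prune,
+         pruning_method=prune.L1Unstructured,
+         amount=amount,  # fraction of weights zeroed out
+     )
+     # Bake the masks into the weights and drop the re-parametrization
+     for module, name in parameters_to_prune:
+         prune.remove(module, name)
+     return model
+ ```
+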
+ ## **Model Comparison 📊**
+
+ | Version | Model Size | GPU VRAM Usage | Parameters | Relative Performance |
+ |---------|------------|----------------|------------|-----------------------|
+ | **Original (DeepSeek-Coder-6.7B)** | 3.51 GB | 7.85 GB | **6.7B** | **100%** |
+ | **Optimized (DeepSeek-Light-V1)** | 3.51 GB | **3.93 GB (50% reduction)** | **3.5B** | **~50% performance** |
+
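+ One way to check VRAM usage on your own hardware is to read PyTorch's peak-memory counter after loading the model and generating. A minimal sketch, assuming a single CUDA device (how the numbers above were originally measured is not documented, and exact values will vary with driver, batch size, and sequence length):
+
+ ```python
+ import torch
+
+ # Reset the peak-memory counter, then load the model and call
+ # generate_text(...) as shown in the "How to Use" section below.
+ torch.cuda.reset_peak_memory_stats()
+
+ # ... load model / run generation here ...
+
+ peak_gb = torch.cuda.max_memory_allocated() / 1024**3
+ print(f"Peak VRAM allocated: {peak_gb:.2f} GB")
+ ```
+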
+ ## **Why Use This Model? 💡**
+ ✅ **Runs on more affordable hardware** – No need for high-end GPUs.
+ ✅ **Reduces operational costs** – More efficient deployment.
+ ✅ **Enhances security** – Enables local execution before moving to production.
+
+ ## **How to Use 🛠️**
+ You can load the model with `transformers` and `bitsandbytes` 4-bit quantization:
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+ import torch
+
+ # Load model and tokenizer
+ model_name = "sanchezalonsodavid17/DeepSeek_Light_V1"
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+
+ # 4-bit NF4 quantization with bfloat16 compute (requires the bitsandbytes package)
+ quantization_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=torch.bfloat16,
+     bnb_4bit_use_double_quant=True,
+ )
+
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     device_map="auto",
+     quantization_config=quantization_config,
+ )
+
+ # Generate text
+ def generate_text(prompt, max_new_tokens=100):
+     inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+     with torch.no_grad():
+         output = model.generate(**inputs, max_new_tokens=max_new_tokens)
+     return tokenizer.decode(output[0], skip_special_tokens=True)
+
+ # Example usage
+ prompt = "Explain how deep learning works in neural networks."
+ response = generate_text(prompt)
+ print(response)
+ ```
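+
+ Alternatively, the already-loaded quantized model can be wrapped in a `transformers` `pipeline`. A minimal sketch (the prompt and sampling parameters below are illustrative, not recommended settings from this repo):
+
+ ```python
+ from transformers import pipeline
+
+ # Reuse the quantized `model` and `tokenizer` objects from the snippet above.
+ generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
+
+ result = generator(
+     "Write a Python function that reverses a string.",
+     max_new_tokens=128,  # illustrative limit
+     do_sample=True,
+     temperature=0.7,     # illustrative sampling temperature
+ )
+ print(result[0]["generated_text"])
+ ```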