---
license: mit
base_model:
- deepseek-ai/deepseek-coder-6.7b-instruct
---
|
|
|
|
|
# **DeepSeek-Light-V1: Optimized Version of DeepSeek-Coder-6.7B**
|
|
**Developed in the Basque Country 🇪🇸**
|
|
|
|
|
DeepSeek-Light-V1 is a **highly optimized version** of **DeepSeek-Coder-6.7B**, designed to reduce GPU memory consumption and make deployment on modest hardware feasible. The optimization combines **4-bit quantization** and **pruning**, significantly lowering the parameter count while preserving the model's core capabilities.
|
|
|
|
|
## **Key Optimizations 🚀**
|
|
- **4-bit Quantization (NF4 with BFloat16 compute):** Reduces VRAM usage with minimal precision loss.
|
|
- **Pruning:** Removes redundant parameters to enhance efficiency (see the sketch after this list).
|
|
- **Optimized for lightweight deployment:** Works on lower-end hardware.
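
Pruning was applied when producing this checkpoint, so no extra step is needed at load time. Purely as an illustration, the sketch below shows one common way to apply unstructured magnitude pruning to a model's linear layers with `torch.nn.utils.prune`; the exact pruning method and ratio used for DeepSeek-Light-V1 are not documented here, so the 30% amount is an assumed placeholder.

```python
# Minimal sketch of unstructured magnitude (L1) pruning on linear layers.
# NOTE: the actual recipe used for DeepSeek-Light-V1 is not specified here;
# the 30% ratio is an illustrative assumption, not the released configuration.
import torch.nn as nn
import torch.nn.utils.prune as prune

def prune_linear_layers(model: nn.Module, amount: float = 0.3) -> nn.Module:
    for module in model.modules():
        if isinstance(module, nn.Linear):
            # Zero out the fraction `amount` of weights with the smallest |value|.
            prune.l1_unstructured(module, name="weight", amount=amount)
            # Make the pruning permanent (removes the reparametrization hooks).
            prune.remove(module, "weight")
    return model
```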
|
|
|
|
|
## **Model Comparison 📊**
|
|
|
|
|
| Version | Model Size | GPU VRAM Usage | Parameters | Relative Performance |
|---------|------------|----------------|------------|-----------------------|
| **Original (DeepSeek-Coder-6.7B)** | 3.51 GB | 7.85 GB | **6.7B** | **100%** |
| **Optimized (DeepSeek-Light-V1)** | 3.51 GB | **3.93 GB (≈50% reduction)** | **3.5B** | **~50%** |
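
Exact VRAM usage varies with hardware, driver, and sequence length. To sanity-check the parameter count and peak-memory figures on your own machine, here is a minimal sketch using standard PyTorch utilities (the helper name `report_model_footprint` is illustrative, not part of this repository):

```python
import torch

def report_model_footprint(model):
    # Stored parameter elements. Note: bitsandbytes packs two 4-bit weights per
    # byte, so this can undercount the logical parameter count of a quantized model.
    total_params = sum(p.numel() for p in model.parameters())
    print(f"Stored parameter elements: {total_params / 1e9:.2f}B")

    if torch.cuda.is_available():
        # Peak memory allocated by tensors on the current CUDA device, in GB.
        peak_gb = torch.cuda.max_memory_allocated() / 1024**3
        print(f"Peak GPU memory allocated: {peak_gb:.2f} GB")

# Call after loading the model and running a generation, e.g.:
# report_model_footprint(model)
```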
|
|
|
|
|
## **Why Use This Model? 💡**
|
|
✅ **Runs on more affordable hardware** – No need for high-end GPUs.
|
|
✅ **Reduces operational costs** – More efficient deployment.
|
|
✅ **Enhances security** – Enables local execution before moving to production.
|
|
|
|
|
## **How to Use 🛠️**
|
|
You can load the model using `transformers` with quantization:
|
|
|
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

# Load the tokenizer
model_name = "sanchezalonsodavid17/DeepSeek_Light_V1"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# 4-bit NF4 quantization with BFloat16 compute and double quantization
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load the quantized model, letting Accelerate place it on the available device(s)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    quantization_config=quantization_config,
)

# Generate text
def generate_text(prompt, max_new_tokens=100):
    # Move the inputs to the same device as the model
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Example usage
prompt = "Explain how deep learning works in neural networks."
response = generate_text(prompt)
print(response)
```