Fine-tuning Zamba2-7B Model: Step-by-Step Guide
This guide will walk you through the process of fine-tuning the Zamba2-7B model. Make sure you have sufficient GPU memory, as this is a 7B-parameter model. I've tested the 1.2B model, which needed roughly 20 GB, so 2x H100 or 4x RTX 4090 would probably suffice for the 7B. Adjust the batch size and gradient accumulation steps based on your available VRAM (see the short sketch after the environment list).
Tested Environment:
- Vast.ai cloud instance (template link)
- CUDA 11.8
- Python 3.10
- PyTorch Image: pytorch/pytorch:2.3.1-cuda11.8-cudnn8-devel (selected for extra stability)
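As a rule of thumb, the effective batch size is the per-device batch size times the gradient accumulation steps times the number of GPUs, so you can trade one against the other to fit your VRAM. A minimal sketch of that relationship (the values below are placeholders, not tested settings):

# Rough relationship between the knobs mentioned above (placeholder values,
# not tested settings): keep the effective batch size roughly constant while
# trading per-device batch size against gradient accumulation to fit VRAM.
num_gpus = 2                      # e.g. 2x H100
per_device_train_batch_size = 1   # lower this if you hit OOM
gradient_accumulation_steps = 16  # raise this to compensate

effective_batch_size = num_gpus * per_device_train_batch_size * gradient_accumulation_steps
print(f"Effective batch size: {effective_batch_size}")  # 32 in this example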
1. Setup Environment
First, clone the repository and set up the environment:
git clone https://github.com/Zyphra/transformers_zamba2.git
cd transformers_zamba2
# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # For Windows use: venv\Scripts\activate
# Install dependencies
pip install -e .
pip install accelerate datasets
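Before moving on, you can optionally verify that the fork installed correctly and that your GPUs are visible (a minimal check, not part of the original instructions):

import torch
import transformers

print(transformers.__version__)   # should come from the transformers_zamba2 fork
print(torch.cuda.is_available())  # should print True
print(torch.cuda.device_count())  # number of visible GPUs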
2. Basic Inference Test
Let's first test if the model loads correctly:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-7B")
model = AutoModelForCausalLM.from_pretrained(
    "Zyphra/Zamba2-7B", 
    device_map="cuda", 
    torch_dtype=torch.bfloat16
)
# Test generation
input_text = "What factors contributed to the fall of the Roman Empire?"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
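If the greedy output looks repetitive, you can optionally enable sampling. Continuing from the snippet above, with generic transformers generation settings (these values are illustrative, not recommendations from Zyphra):

# Optional: sampled generation instead of greedy decoding
outputs = model.generate(
    **input_ids,
    max_new_tokens=100,
    do_sample=True,   # enable sampling
    temperature=0.7,  # lower = closer to greedy
    top_p=0.9,        # nucleus sampling
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))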
3. Fine-tuning Setup
Here's the complete fine-tuning script with detailed configurations:
from transformers import (
    AutoTokenizer, AutoModelForCausalLM, TrainingArguments,
    Trainer, DataCollatorForLanguageModeling
)
import torch
from datasets import Dataset
# Maximum sequence length used for tokenization
CONTEXT_WINDOW = 1024
# Initialize tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-7B")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"  # Better for inference
# Initialize model
model = AutoModelForCausalLM.from_pretrained(
    "Zyphra/Zamba2-7B",
    torch_dtype=torch.bfloat16,
    device_map="auto"  # Handles multi-GPU/CPU mapping
)
model.config.pad_token_id = tokenizer.pad_token_id
# Tokenization function
def tokenize_function(examples):
    return tokenizer(
        examples["text"],
        padding=True,
        truncation=True,
        max_length=CONTEXT_WINDOW,  # defined at the top of the script
        return_tensors=None
    )
# Prepare training data
train_texts = [
    "What factors contributed to the fall of the Roman Empire?",
    # Add your training examples here
]
dataset = Dataset.from_dict({"text": train_texts})
tokenized_dataset = dataset.map(
    tokenize_function,
    batched=True,
    remove_columns=dataset.column_names
)
# Training configuration
training_args = TrainingArguments(
    output_dir="./zamba2-finetuned",
    num_train_epochs=3,
    per_device_train_batch_size=1,
    save_steps=500,
    save_total_limit=2,
    logging_steps=100,
    learning_rate=2e-5,
    weight_decay=0.01,
    fp16=False,
    bf16=True,
    gradient_accumulation_steps=16
)
# Data collator
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False
)
# Custom trainer wrapper for device mapping
class CustomTrainer(Trainer):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.model = model
        
    def _move_model_to_device(self, model, device):
        pass  # Model already mapped to devices
# Initialize trainer
trainer = CustomTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    data_collator=data_collator
)
# Train and save
trainer.train()
model.save_pretrained("./zamba2-finetuned-final")
tokenizer.save_pretrained("./zamba2-finetuned-final")
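After training finishes, you can reload the fine-tuned weights for inference the same way as in step 2 (a minimal sketch, assuming the save paths used above):

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the checkpoint saved at the end of training
tokenizer = AutoTokenizer.from_pretrained("./zamba2-finetuned-final")
model = AutoModelForCausalLM.from_pretrained(
    "./zamba2-finetuned-final",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

inputs = tokenizer(
    "What factors contributed to the fall of the Roman Empire?",
    return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))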
Important Notes:
- Hardware Requirements:
  - Recommended: GPU with at least 24GB VRAM for CONTEXT_WINDOW = 1024 (more than 24GB for 2048; multi-GPU not tested yet)
  - The script uses bfloat16 precision to reduce memory usage
- Training Configuration:
  - Context window: 1024 tokens
  - Learning rate: 2e-5
  - Weight decay: 0.01
  - Gradient accumulation: 16 steps
  - Training epochs: 3
- Customization:
  - Add your training examples to the train_texts list (or load a larger dataset, see the sketch after these notes)
  - Adjust the training_args parameters based on your needs
  - Modify max_length in the tokenization function if needed
- Output:
  - The fine-tuned model will be saved in ./zamba2-finetuned-final
  - Checkpoints during training are saved in ./zamba2-finetuned
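For anything beyond a toy run you will want more than a handful of strings in train_texts. A hedged sketch of swapping in a Hugging Face dataset instead (the dataset name and split are placeholders for whatever corpus you actually use):

from datasets import load_dataset

# Placeholder dataset name -- substitute your own corpus.
# tokenize_function above expects a "text" column; rename yours if needed.
raw_dataset = load_dataset("your-username/your-dataset", split="train")

tokenized_dataset = raw_dataset.map(
    tokenize_function,  # reuse the function from the fine-tuning script
    batched=True,
    remove_columns=raw_dataset.column_names
)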
Feel free to ask if you have any questions about the process.
