LLaVA-1.5 7B - Health & Safety Image Captioning
Model Description
This model is a fine-tuned version of llava-hf/llava-1.5-7b-hf specifically trained for comprehensive health and safety image captioning in construction and workplace environments. The model analyzes workplace images to identify hazards, assess safety conditions, and generate detailed safety recommendations.
Model Performance
Training Metrics (safety keyword-based)
- F1 Score: 0.4997
- Precision: 0.3961
- Recall: 0.6768
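The card does not define how the keyword-based scores are computed. The sketch below assumes one plausible approach: precision and recall are taken over the set of safety keywords found in a generated caption versus its reference caption. `SAFETY_KEYWORDS` is a hypothetical vocabulary, not the list behind the reported numbers. How per-caption scores were aggregated is also not stated.

```python
import re

# Hypothetical safety vocabulary for illustration only; the keyword list
# used for the reported scores is not published in this card.
SAFETY_KEYWORDS = {
    "hazard", "ppe", "helmet", "harness", "scaffold",
    "fall", "slip", "trip", "excavator", "barrier",
}


def extract_keywords(text: str, vocabulary: set[str]) -> set[str]:
    """Return the vocabulary terms that appear as whole lowercase tokens in text."""
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    return tokens & vocabulary


def keyword_prf(prediction: str, reference: str, vocabulary: set[str]) -> tuple[float, float, float]:
    """Precision, recall and F1 over the sets of safety keywords in each caption."""
    pred = extract_keywords(prediction, vocabulary)
    ref = extract_keywords(reference, vocabulary)
    overlap = len(pred & ref)
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Example usage with made-up captions
p, r, f1 = keyword_prf(
    "Worker on scaffold without harness, fall hazard.",
    "Fall hazard: no harness or helmet on the scaffold edge.",
    SAFETY_KEYWORDS,
)
print(p, r, f1)
```

The reported F1 is consistent with the reported precision and recall, since it is their harmonic mean: 2 × 0.3961 × 0.6768 / (0.3961 + 0.6768) ≈ 0.4997.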
Evaluation Metrics
- BLEU Score: 0.0191
- ROUGE-L: 0.1410
- METEOR: 0.4314
- CIDEr: 0.1298
- SPICE: 0.1298
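ROUGE-L scores a caption by the longest common subsequence (LCS) of tokens it shares with the reference. The sketch below illustrates the F1 form of the metric with plain whitespace tokenization and no stemming. It is not the implementation that produced the 0.1410 above, which the card does not name, so its values will differ from library implementations.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction: str, reference: str) -> float:
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        return 0.0
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# "worker ... hard hat" is shared in order: LCS = 3 tokens
score = rouge_l_f1("worker without hard hat", "a worker is not wearing a hard hat")
print(score)
```

In the example, precision is 3/4 and recall is 3/8, so the score is 0.5. Short or heavily paraphrased captions are penalised even when they name the same hazards. This is one reason n-gram metrics such as BLEU and ROUGE-L can look low for free-form safety descriptions.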
Training Details
- Base Model: llava-hf/llava-1.5-7b-hf
- Training Dataset: 620 expert-annotated health and safety images
- Training Samples: 496
- Validation Samples: 124
- Training Epochs: 10
- Training Method: LoRA (Low-Rank Adaptation) with 4-bit quantization
- Batch Size: 2
- Learning Rate: 2e-4
- Hardware: NVIDIA A100 80GB
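The split sizes and training length above can be cross-checked with a little arithmetic. This minimal sketch assumes a seeded random 80/20 split, a batch size of 2, and no gradient accumulation. Neither the split seed nor the accumulation setting is stated in this card; `seed=42` is a hypothetical value.

```python
import math
import random

N_IMAGES = 620
TRAIN_FRACTION = 0.8
BATCH_SIZE = 2
EPOCHS = 10
EVAL_EVERY = 5  # from this card: evaluation every 5 epochs


def split_indices(n: int, train_fraction: float, seed: int = 42) -> tuple[list[int], list[int]]:
    """Shuffle indices with a fixed (hypothetical) seed and cut them into train/validation parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = round(n * train_fraction)
    return indices[:cut], indices[cut:]


def optimizer_steps(n_train: int, batch_size: int, epochs: int, grad_accum: int = 1) -> int:
    """Total optimizer steps; grad_accum=1 is an assumption."""
    batches_per_epoch = math.ceil(n_train / batch_size)
    return math.ceil(batches_per_epoch / grad_accum) * epochs


train_idx, val_idx = split_indices(N_IMAGES, TRAIN_FRACTION)
eval_epochs = [e for e in range(1, EPOCHS + 1) if e % EVAL_EVERY == 0]
print(len(train_idx), len(val_idx))                         # 496 124
print(optimizer_steps(len(train_idx), BATCH_SIZE, EPOCHS))  # 2480
print(eval_epochs)                                          # [5, 10]
```

Under these assumptions, 496 training samples at batch size 2 give 248 optimizer steps per epoch, or 2,480 over 10 epochs. Any gradient accumulation would divide that count accordingly.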
Dataset
The training dataset consists of 620 carefully annotated workplace and construction images covering:
- Construction Sites: Various stages of construction work
- Safety Incidents: Examples of unsafe conditions and practices
- Equipment and Machinery: Construction vehicles, tools, and safety equipment
- Environmental Conditions: Different weather conditions affecting safety
- Hazard Categories: Slip/trip hazards, fall risks, equipment hazards, PPE violations
Each image includes expert-written captions identifying:
- Specific health and safety hazards present
- Construction objects, equipment, and activities
- Work stage and scene context
- Weather and environmental conditions
- Safety incident categorization and recommendations
Intended Use
Primary Use Cases
- Automated Safety Inspections: Generate detailed safety assessments from workplace photos
- Safety Training: Create educational content with detailed hazard identification
- Compliance Monitoring: Assist in identifying potential safety violations
- Risk Assessment: Support safety professionals in comprehensive site evaluations
Limitations
- Designed specifically for construction and workplace safety scenarios
- Generated captions should be reviewed by qualified safety professionals
- Not intended as a replacement for human safety inspections
- Performance may vary with image quality and viewing angles
Usage
```python
from transformers import LlavaForConditionalGeneration, LlavaProcessor
from PIL import Image
import torch

MODEL_ID = "sameenarshad786/llava-1.5-health-safety-captioning"

# Load model and processor (fp16 weights; device_map="auto" requires `accelerate`)
model = LlavaForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)
processor = LlavaProcessor.from_pretrained(MODEL_ID)

# Load your workplace safety image (convert to RGB in case it is RGBA or greyscale)
image = Image.open("construction_site.jpg").convert("RGB")

# Prepare the safety analysis prompt
instruction = """Analyze this workplace/construction image and provide a detailed caption describing:
1. Health and safety hazards or incidents present
2. Construction objects, equipment, and work activities visible
3. Scene description and work stage context
4. Weather and environmental conditions
5. Safety category and incident type if applicable
Focus on safety-critical elements and provide a comprehensive caption for workplace safety training."""
# Wrap the instruction in the base LLaVA-1.5 conversation format with an <image> token;
# the exact template used during fine-tuning is not documented here
prompt = f"USER: <image>\n{instruction} ASSISTANT:"

# Generate safety caption
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device, torch.float16)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=200, do_sample=False)

# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
caption = processor.decode(new_tokens, skip_special_tokens=True)
print(caption.strip())
```
Files Included
This repository contains:
- Model Weights: Fine-tuned LLaVA-1.5 model (final_llava_health_safety_model/)
- Training Results: Comprehensive evaluation metrics and predictions (results/)
- Training Logs: Complete training history and performance tracking (logs/)
- Dataset: Complete dataset with expert annotations (data/)
- Notebooks: Training and evaluation notebooks (notebooks/)
Training Process
- Data Preparation: 620 workplace images with expert safety annotations
- Model Setup: LLaVA-1.5 7B with 4-bit quantization to reduce GPU memory requirements
- Fine-tuning: LoRA adaptation targeting vision-language layers
- Evaluation: Comprehensive metrics including BLEU, ROUGE-L, METEOR, CIDEr, and SPICE
- Validation: 80/20 train/validation split with full dataset evaluation every 5 epochs
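The card states only that LoRA was combined with 4-bit quantization. A QLoRA-style setup with Hugging Face `transformers`, `peft` and `bitsandbytes` might look like the sketch below. The NF4 quantization type, bfloat16 compute dtype, LoRA rank, alpha, dropout and `target_modules` are all assumptions, not the author's published hyperparameters.

```python
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import BitsAndBytesConfig, LlavaForConditionalGeneration

BASE_MODEL = "llava-hf/llava-1.5-7b-hf"

# 4-bit NF4 quantization of the frozen base weights (assumed settings)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Assumed LoRA hyperparameters; none of these values are given in this card.
# By name, q/k/v_proj match attention projections in both the Llama language
# model and the CLIP vision tower; o_proj exists only in the language model.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)


def build_qlora_model():
    """Load the quantized base model and attach trainable LoRA adapters."""
    model = LlavaForConditionalGeneration.from_pretrained(
        BASE_MODEL,
        quantization_config=bnb_config,
        device_map="auto",
    )
    model = prepare_model_for_kbit_training(model)
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()
    return model
```

Only the small LoRA matrices are trained while the quantized base weights stay frozen. This keeps memory use well below what full fine-tuning of a 7B model would need.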
Ethical Considerations
- Safety Enhancement: Designed to improve workplace safety
- Professional Review: AI-generated assessments should be validated by qualified professionals
- Accuracy Limitations: AI analysis supplements but does not replace human safety expertise
Citation
@misc{llava-health-safety-2025,
title={LLaVA-1.5 7B - Health & Safety Image Captioning},
author={Sameen Arshad},
year={2025},
publisher={Hugging Face},
url={https://huggingface.co/sameenarshad786/llava-1.5-health-safety-captioning},
note={Fine-tuned on 620 expert-annotated workplace safety images}
}
Contact
- Author: Sameen Arshad
- Institution: University of the West of England (UWE Bristol)
License
This model is released under the Apache 2.0 License.