LLaVA-1.5 7B - Health & Safety Image Captioning

Model Description

This model is a fine-tuned version of llava-hf/llava-1.5-7b-hf, trained for health and safety image captioning in construction and other workplace environments. Given a workplace image, it generates a caption that identifies hazards, assesses safety conditions, and gives detailed safety recommendations.

Model Performance

Training Metrics

F1 Score: 0.4997 (safety keyword-based)
Precision: 0.3961
Recall: 0.6768
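
The card does not spell out how the keyword-based score is computed. As a rough sketch, assuming it compares the set of safety keywords found in a generated caption with the set found in the reference caption, the calculation could look like this. The keyword list and all function names are hypothetical. The last lines only check that the reported F1 is the harmonic mean of the reported precision and recall.

```python
from __future__ import annotations

import re

# Hypothetical keyword vocabulary; the list actually used for this model is not given in the card.
SAFETY_KEYWORDS = {"helmet", "harness", "scaffold", "ladder", "trip", "fall", "ppe"}


def extract_keywords(caption: str) -> set[str]:
    """Return the safety keywords that occur as whole words in a caption."""
    words = set(re.findall(r"[a-z]+", caption.lower()))
    return words & SAFETY_KEYWORDS


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(predicted: set[str], reference: set[str]) -> tuple[float, float, float]:
    """Set-based precision, recall, and F1 between predicted and reference keywords."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall, f1_from(precision, recall)


# Consistency check on the reported numbers: F1(0.3961, 0.6768) rounds to 0.4997.
reported_f1 = round(f1_from(0.3961, 0.6768), 4)
```

A recall higher than precision, as reported here, would mean under this scheme that the model mentions most of the reference keywords but also adds keywords the reference does not contain.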

Evaluation Metrics

BLEU Score: 0.0191
ROUGE-L: 0.1410
METEOR: 0.4314
CIDEr: 0.1298
SPICE: 0.1298
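
For readers unfamiliar with these metrics, here is a minimal sketch of ROUGE-L, which scores a generated caption by the longest common subsequence (LCS) of words it shares with the reference. It assumes lowercase whitespace tokenization and the F1 form of the score. The card does not say which scoring toolkit or settings produced the numbers above, so this will not necessarily reproduce them.

```python
def lcs_length(a: list, b: list) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for token_a in a:
        curr = [0]
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 between a generated caption and a reference caption."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate words on the LCS
    recall = lcs / len(ref)      # share of reference words on the LCS
    return 2 * precision * recall / (precision + recall)


score = rouge_l_f1(
    "worker without helmet on scaffold",
    "a worker on the scaffold without a helmet",
)
```

Because the LCS must preserve word order, two captions that describe the same hazards in a different order can still score low.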

Training Details

Base Model: llava-hf/llava-1.5-7b-hf
Training Dataset: 620 expert-annotated health and safety images
Training Samples: 496
Validation Samples: 124
Training Epochs: 10
Training Method: LoRA (Low-Rank Adaptation) with 4-bit quantization
Batch Size: 2
Learning Rate: 2e-4
Hardware: NVIDIA A100 80GB
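
The sample counts above follow from the 80/20 train/validation split of the 620 images mentioned in this card. The short sketch below reproduces that arithmetic and estimates the number of optimizer steps. The step count assumes a single GPU, no gradient accumulation, and no dropped final batch, none of which the card states.

```python
import math

TOTAL_IMAGES = 620
TRAIN_FRACTION = 0.8
BATCH_SIZE = 2
EPOCHS = 10

train_samples = round(TOTAL_IMAGES * TRAIN_FRACTION)
val_samples = TOTAL_IMAGES - train_samples

# Assumption: one optimizer step per batch (no gradient accumulation).
steps_per_epoch = math.ceil(train_samples / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS

print(train_samples, val_samples, steps_per_epoch, total_steps)
```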

Dataset

The training dataset consists of 620 carefully annotated workplace and construction images covering:

Construction Sites: Various stages of construction work
Safety Incidents: Examples of unsafe conditions and practices
Equipment and Machinery: Construction vehicles, tools, and safety equipment
Environmental Conditions: Different weather conditions affecting safety
Hazard Categories: Slip/trip hazards, fall risks, equipment hazards, PPE violations

Each image includes expert-written captions identifying:

Specific health and safety hazards present
Construction objects, equipment, and activities
Work stage and scene context
Weather and environmental conditions
Safety incident categorization and recommendations

Intended Use

Primary Use Cases

Automated Safety Inspections: Generate detailed safety assessments from workplace photos
Safety Training: Create educational content with detailed hazard identification
Compliance Monitoring: Assist in identifying potential safety violations
Risk Assessment: Support safety professionals in comprehensive site evaluations

Limitations

Designed specifically for construction and workplace safety scenarios
Generated captions should be reviewed by qualified safety professionals
Not intended as a replacement for human safety inspections
Performance may vary with image quality and viewing angles

Usage

from transformers import LlavaForConditionalGeneration, LlavaProcessor
from PIL import Image
import torch

# Load model and processor
model = LlavaForConditionalGeneration.from_pretrained("sameenarshad786/llava-1.5-health-safety-captioning")
processor = LlavaProcessor.from_pretrained("sameenarshad786/llava-1.5-health-safety-captioning")

# Load your workplace safety image
image = Image.open("construction_site.jpg")

# Prepare the safety analysis prompt.
# Assumption: the fine-tuned model keeps the base model's chat format,
# "USER: <image>\n... ASSISTANT:". The <image> placeholder marks where
# the image features are inserted.
prompt = """USER: <image>
Analyze this workplace/construction image and provide a detailed caption describing:

1. Health and safety hazards or incidents present
2. Construction objects, equipment, and work activities visible
3. Scene description and work stage context
4. Weather and environmental conditions
5. Safety category and incident type if applicable

Focus on safety-critical elements and provide a comprehensive caption for workplace safety training. ASSISTANT:"""

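# Generate safety caption (inputs are moved to the same device as the model)
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=200, do_sample=False)

# Decode only the newly generated tokens so the prompt is not echoed in the caption
new_tokens = outputs[:, inputs["input_ids"].shape[1]:]
caption = processor.batch_decode(new_tokens, skip_special_tokens=True)[0].strip()
print(caption)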

Files Included

This repository contains:

Model Weights: Fine-tuned LLaVA-1.5 model (final_llava_health_safety_model/)
Training Results: Comprehensive evaluation metrics and predictions (results/)
Training Logs: Complete training history and performance tracking (logs/)
Dataset: Complete dataset with expert annotations (data/)
Notebooks: Training and evaluation notebooks (notebooks/)

Training Process

Data Preparation: 620 workplace images with expert safety annotations
Model Setup: LLaVA-1.5 7B with 4-bit quantization for efficiency
Fine-tuning: LoRA adaptation targeting vision-language layers
Evaluation: Comprehensive metrics including BLEU, ROUGE-L, METEOR, CIDEr, and SPICE
Validation: 80/20 train/validation split with full dataset evaluation every 5 epochs
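
To illustrate why the LoRA step keeps fine-tuning cheap: for a frozen weight matrix of shape d x k, LoRA trains two small factors of shapes d x r and r x k, which adds r(d + k) trainable parameters instead of d * k. The sketch below works this out for a single 4096 x 4096 projection, matching the hidden size of the 7B language model in LLaVA-1.5. The rank of 16 is a hypothetical value, since the card does not report the LoRA rank or the exact target modules.

```python
def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Parameters in the LoRA factors B (d x r) and A (r x k) for one d x k weight."""
    return r * (d + k)


HIDDEN_SIZE = 4096  # hidden size of the 7B language model
RANK = 16           # hypothetical; the rank used for this model is not reported

full = HIDDEN_SIZE * HIDDEN_SIZE
lora = lora_trainable_params(HIDDEN_SIZE, HIDDEN_SIZE, RANK)
fraction = lora / full

print(f"full: {full:,}  lora: {lora:,}  fraction: {fraction:.2%}")
```

Under these assumptions the adapter for one projection has under 1% of the parameters of the full matrix, while the base weights stay frozen in their 4-bit quantized form.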

Ethical Considerations

Safety Enhancement: Designed to improve workplace safety
Professional Review: AI-generated assessments should be validated by qualified professionals
Accuracy Limitations: AI analysis supplements but does not replace human safety expertise

Citation

@misc{llava-health-safety-2025,
  title={LLaVA-1.5 7B - Health & Safety Image Captioning},
  author={Sameen Arshad},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/sameenarshad786/llava-1.5-health-safety-captioning},
  note={Fine-tuned on 620 expert-annotated workplace safety images}
}

Contact

Author: Sameen Arshad
Institution: University of the West of England (UWE Bristol)
Email: [email protected]

License

This model is released under the Apache 2.0 License.
