LLaVA-1.5 7B - Health & Safety Image Captioning

Model Description

This model is a fine-tuned version of llava-hf/llava-1.5-7b-hf specifically trained for comprehensive health and safety image captioning in construction and workplace environments. The model analyzes workplace images to identify hazards, assess safety conditions, and generate detailed safety recommendations.

Model Performance

Training Metrics

  • F1 Score: 0.4997 (safety keyword-based)
  • Precision: 0.3961
  • Recall: 0.6768
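
The F1 score is the harmonic mean of precision and recall, so the three figures above can be cross-checked. The sketch below does that and also shows one plausible way a safety keyword-based score could be computed for a single caption. The keyword list and the `keyword_prf` helper are illustrative assumptions, not the evaluation code used for this model.

```python
import re

# Hypothetical keyword vocabulary (assumption); the list behind the
# reported scores is not documented in this card.
SAFETY_KEYWORDS = {"helmet", "harness", "scaffold", "ladder", "trip", "fall", "ppe"}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def keyword_prf(prediction: str, reference: str, keywords=SAFETY_KEYWORDS):
    """Precision, recall and F1 over the safety keywords found in each caption."""
    def found(text):
        return set(re.findall(r"[a-z]+", text.lower())) & keywords

    pred, ref = found(prediction), found(reference)
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall, f1_score(precision, recall)


# Cross-check of the reported training metrics
print(round(f1_score(0.3961, 0.6768), 4))  # 0.4997
```

A score of this kind rewards mentioning the right hazard terms regardless of sentence wording, which may explain why it can sit well above n-gram metrics such as BLEU.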

Evaluation Metrics

  • BLEU Score: 0.0191
  • ROUGE-L: 0.1410
  • METEOR: 0.4314
  • CIDEr: 0.1298
  • SPICE: 0.1298
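
Of the metrics above, ROUGE-L is the simplest to reproduce by hand: it scores the longest common subsequence (LCS) of tokens shared by a generated caption and its reference. A minimal sketch follows, assuming whitespace tokenization and the balanced F-measure. The evaluation library and settings behind the reported figures are not stated in this card, so values from this sketch may differ from them.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(prediction: str, reference: str) -> float:
    """ROUGE-L F-measure with whitespace tokenization (an assumption)."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    if not pred or not ref:
        return 0.0
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Same words in a different order share an LCS of only 3 of 5 tokens
print(rouge_l("worker on ladder without helmet", "worker without helmet on ladder"))
```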

Training Details

  • Base Model: llava-hf/llava-1.5-7b-hf
  • Training Dataset: 620 expert-annotated health and safety images
  • Training Samples: 496
  • Validation Samples: 124
  • Training Epochs: 10
  • Training Method: LoRA (Low-Rank Adaptation) with 4-bit quantization
  • Batch Size: 2
  • Learning Rate: 2e-4
  • Hardware: NVIDIA A100 80GB
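
The settings above correspond to a QLoRA-style setup with the `transformers`, `bitsandbytes` and `peft` libraries. The sketch below is an assumption about how such a run could be configured, not the author's training script: the LoRA rank, alpha, dropout and target modules, and the NF4 quantization type, are placeholder values that this card does not specify. The step count assumes no gradient accumulation.

```python
import math

# Values taken from the Training Details list
BASE_MODEL = "llava-hf/llava-1.5-7b-hf"
TRAIN_SAMPLES, EPOCHS, BATCH_SIZE, LEARNING_RATE = 496, 10, 2, 2e-4


def total_optimizer_steps(samples=TRAIN_SAMPLES, batch_size=BATCH_SIZE, epochs=EPOCHS):
    """Optimizer steps for the whole run, assuming no gradient accumulation."""
    return math.ceil(samples / batch_size) * epochs


def build_configs():
    """Return (quantization config, LoRA config).

    Imports are local so the arithmetic above runs without the
    GPU libraries (torch, transformers, bitsandbytes, peft) installed.
    """
    import torch
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",             # assumed, not stated in the card
        bnb_4bit_compute_dtype=torch.float16,  # assumed
    )
    lora_config = LoraConfig(
        r=16,                                  # assumed rank
        lora_alpha=32,                         # assumed scaling
        lora_dropout=0.05,                     # assumed
        target_modules=["q_proj", "v_proj"],   # assumed attention projections
    )
    return quant_config, lora_config


print(total_optimizer_steps())  # 2480
```

The quantization config would be passed as `quantization_config=` to `LlavaForConditionalGeneration.from_pretrained`, and the LoRA config to `peft.get_peft_model`.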

Dataset

The training dataset consists of 620 carefully annotated workplace and construction images covering:

  • Construction Sites: Various stages of construction work
  • Safety Incidents: Examples of unsafe conditions and practices
  • Equipment and Machinery: Construction vehicles, tools, and safety equipment
  • Environmental Conditions: Different weather conditions affecting safety
  • Hazard Categories: Slip/trip hazards, fall risks, equipment hazards, PPE violations

Each image includes expert-written captions identifying:

  1. Specific health and safety hazards present
  2. Construction objects, equipment, and activities
  3. Work stage and scene context
  4. Weather and environmental conditions
  5. Safety incident categorization and recommendations

Intended Use

Primary Use Cases

  • Automated Safety Inspections: Generate detailed safety assessments from workplace photos
  • Safety Training: Create educational content with detailed hazard identification
  • Compliance Monitoring: Assist in identifying potential safety violations
  • Risk Assessment: Support safety professionals in comprehensive site evaluations

Limitations

  • Designed specifically for construction and workplace safety scenarios
  • Generated captions should be reviewed by qualified safety professionals
  • Not intended as a replacement for human safety inspections
  • Performance may vary with image quality and viewing angles

Usage

from transformers import LlavaForConditionalGeneration, LlavaProcessor
from PIL import Image
import torch

MODEL_ID = "sameenarshad786/llava-1.5-health-safety-captioning"

# Load model and processor in half precision.
# device_map="auto" requires the accelerate package.
model = LlavaForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)
processor = LlavaProcessor.from_pretrained(MODEL_ID)

# Load your workplace safety image
image = Image.open("construction_site.jpg").convert("RGB")

# Prepare the safety analysis prompt
instruction = """Analyze this workplace/construction image and provide a detailed caption describing:

1. Health and safety hazards or incidents present
2. Construction objects, equipment, and work activities visible
3. Scene description and work stage context
4. Weather and environmental conditions
5. Safety category and incident type if applicable

Focus on safety-critical elements and provide a comprehensive caption for workplace safety training."""

# LLaVA-1.5 needs an <image> placeholder in the prompt. The USER/ASSISTANT
# template is the base model's convention and is assumed to apply here too.
prompt = f"USER: <image>\n{instruction} ASSISTANT:"

# Generate safety caption
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device, torch.float16)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=200, do_sample=False)

# Decode only the newly generated tokens, skipping the echoed prompt
new_tokens = outputs[:, inputs["input_ids"].shape[1]:]
caption = processor.batch_decode(new_tokens, skip_special_tokens=True)[0].strip()
print(caption)

Files Included

This repository contains:

  • Model Weights: Fine-tuned LLaVA-1.5 model (final_llava_health_safety_model/)
  • Training Results: Comprehensive evaluation metrics and predictions (results/)
  • Training Logs: Complete training history and performance tracking (logs/)
  • Dataset: Complete dataset with expert annotations (data/)
  • Notebooks: Training and evaluation notebooks (notebooks/)

Training Process

  1. Data Preparation: 620 workplace images with expert safety annotations
  2. Model Setup: LLaVA-1.5 7B with 4-bit quantization for efficiency
  3. Fine-tuning: LoRA adaptation targeting vision-language layers
  4. Evaluation: Comprehensive metrics including BLEU, ROUGE-L, METEOR, CIDEr, and SPICE
  5. Validation: 80/20 train/validation split (496/124 images), with full-dataset evaluation every 5 epochs
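
The 80/20 split in step 5 can be reproduced deterministically with a seeded shuffle. This is a minimal sketch: the seed, the `split_dataset` helper and the placeholder file names are illustrative assumptions, and the author's actual split procedure is not documented here.

```python
import random


def split_dataset(items, train_fraction=0.8, seed=42):
    """Shuffle with a fixed seed, then cut into train and validation lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder names standing in for the 620 annotated images
image_ids = [f"image_{i:03d}.jpg" for i in range(620)]
train, val = split_dataset(image_ids)
print(len(train), len(val))  # 496 124
```

Fixing the seed keeps the validation images the same across runs, so metrics from different training runs stay comparable.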

Ethical Considerations

  • Safety Enhancement: Designed to improve workplace safety
  • Professional Review: AI-generated assessments should be validated by qualified professionals
  • Accuracy Limitations: AI analysis supplements but does not replace human safety expertise

Citation

@misc{llava-health-safety-2025,
  title={LLaVA-1.5 7B - Health & Safety Image Captioning},
  author={Sameen Arshad},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/sameenarshad786/llava-1.5-health-safety-captioning},
  note={Fine-tuned on 620 expert-annotated workplace safety images}
}

Contact

  • Author: Sameen Arshad
  • Institution: University of the West of England (UWE Bristol)
  • Email: [email protected]

License

This model is released under the Apache 2.0 License.
