fisheye8k_SenseTime_deformable-detr

This model is a fine-tuned version of SenseTime/deformable-detr on the Fisheye8K dataset. It was developed as part of the Mcity Data Engine project, described in the paper "Mcity Data Engine: Iterative Model Improvement Through Open-Vocabulary Data Selection".

The code for the Mcity Data Engine project is available on GitHub.

It achieves the following results on the evaluation set:

  • Loss: 1.2335

Model description

This is an object detection model based on the SenseTime/deformable-detr architecture and fine-tuned for fisheye camera imagery. It is a product of the Mcity Data Engine, an open-source system for iterative data selection and model improvement in Intelligent Transportation Systems (ITS). The model can detect objects such as "Bus", "Bike", "Car", "Pedestrian", and "Truck". During development, the Mcity Data Engine's open-vocabulary data selection process was used to focus on rare and novel classes.

Intended uses & limitations

This model is intended for object detection in Intelligent Transportation Systems (ITS) that use fisheye camera data. Potential applications include traffic monitoring, perception for autonomous driving, and smart city infrastructure, with a focus on detecting long-tail classes of interest and vulnerable road users (VRUs).

Limitations:

  • The model's performance is optimized for fisheye camera data and the specific object classes it was trained on.
  • Performance may vary significantly in out-of-distribution scenarios or when applied to data from different camera types or environments.
  • Users should consider potential biases inherited from the underlying Fisheye8K dataset.

Sample Usage

You can use this model directly with the transformers pipeline for object detection:

from transformers import pipeline
from PIL import Image
import requests
from io import BytesIO

# Load the object detection pipeline
detector = pipeline("object-detection", model="mcity-data-engine/fisheye8k_SenseTime_deformable-detr")

# Example image. This is a generic, non-fisheye picture used for demonstration only;
# for meaningful results, replace it with a fisheye image URL or a local path.
image_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/bird_sized.jpg"
try:
    response = requests.get(image_url, timeout=30)
    response.raise_for_status()  # Raise an HTTPError for bad responses (4xx or 5xx)
    image = Image.open(BytesIO(response.content)).convert("RGB")
except requests.exceptions.RequestException as e:
    # Stop with a non-zero exit code if the image cannot be downloaded
    raise SystemExit(f"Could not load example image from URL: {e}. Please provide a local image path.")

# Perform inference
predictions = detector(image)

# Print detected objects
for pred in predictions:
    print(f"Label: {pred['label']}, Score: {pred['score']:.2f}, Box: {pred['box']}")

# For visualization (optional, requires matplotlib):
# from matplotlib import pyplot as plt
# import matplotlib.patches as patches
#
# fig, ax = plt.subplots(1)
# ax.imshow(image)
#
# for p in predictions:
#     box = p['box']
#     rect = patches.Rectangle((box['xmin'], box['ymin']), box['xmax'] - box['xmin'], box['ymax'] - box['ymin'],
#                              linewidth=1, edgecolor='r', facecolor='none')
#     ax.add_patch(rect)
#     plt.text(box['xmin'], box['ymin'] - 5, f"{p['label']}: {p['score']:.2f}", color='red', fontsize=8)
#
# plt.show()
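
The pipeline returns a list of dictionaries with `label`, `score`, and `box` keys, so results can be post-processed in plain Python. The following is a minimal sketch of filtering by confidence and class, for example to keep only vulnerable road users. The helper name `filter_predictions`, the threshold values, and the sample data are illustrative assumptions, and the exact label strings depend on the model's configuration.

```python
def filter_predictions(predictions, min_score=0.5, labels=None):
    """Keep predictions at or above min_score, optionally restricted to given labels.

    Hypothetical helper, not part of transformers or the Mcity Data Engine.
    """
    kept = []
    for pred in predictions:
        if pred["score"] < min_score:
            continue
        if labels is not None and pred["label"] not in labels:
            continue
        kept.append(pred)
    # Return the highest-confidence detections first
    return sorted(kept, key=lambda p: p["score"], reverse=True)


# Hand-written sample in the pipeline's output format (not real model output)
sample = [
    {"label": "Car", "score": 0.91, "box": {"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 90}},
    {"label": "Pedestrian", "score": 0.42, "box": {"xmin": 200, "ymin": 50, "xmax": 230, "ymax": 120}},
    {"label": "Bike", "score": 0.77, "box": {"xmin": 300, "ymin": 60, "xmax": 350, "ymax": 130}},
]

# Keep only pedestrians and bikes with a score of at least 0.4
vru = filter_predictions(sample, min_score=0.4, labels={"Pedestrian", "Bike"})
for pred in vru:
    print(f"{pred['label']}: {pred['score']:.2f}")
```

With real output, pass the `predictions` list from the pipeline call instead of `sample`. A suitable threshold depends on the application and should be chosen on validation data.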

Training and evaluation data

This model was fine-tuned on the Fisheye8K dataset, which comprises images captured by fisheye cameras with annotated instances of common road users such as cars, buses, bikes, trucks, and pedestrians. Training used the Mcity Data Engine, which supports iterative model improvement and open-vocabulary data selection for ITS applications.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 0
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • num_epochs: 36
  • mixed_precision_training: Native AMP
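
To illustrate what a cosine learning-rate schedule does with these settings, here is a minimal sketch of the standard half-cosine decay. It rests on several assumptions not stated in this card: no warmup steps (none are listed), 5288 optimizer steps per epoch (read off the step counts in the training results), and a schedule that spans all 36 configured epochs. The function name `cosine_lr` is hypothetical.

```python
import math

BASE_LR = 5e-05         # learning_rate from the list above
NUM_EPOCHS = 36         # num_epochs from the list above
STEPS_PER_EPOCH = 5288  # assumption: inferred from the reported step counts
TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH


def cosine_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, warmup_steps=0):
    """Learning rate at a given step under linear warmup plus half-cosine decay."""
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr during warmup (unused when warmup_steps=0)
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)  # clamp so steps past the end stay at zero
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Learning rate at the end of a few epochs under these assumptions
for epoch in (0, 9, 18, 36):
    print(f"epoch {epoch:2d}: lr = {cosine_lr(epoch * STEPS_PER_EPOCH):.3e}")
```

Under these assumptions the learning rate starts at 5e-05, reaches half that value at the midpoint of training, and decays to zero at the final step.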

Training results

Training Loss   Epoch   Step    Validation Loss
0.8943          1.0     5288    1.5330
0.7865          2.0     10576   1.4108
0.7238          3.0     15864   1.2660
0.6657          4.0     21152   1.2084
0.646           5.0     26440   1.2666
0.6269          6.0     31728   1.2555
0.6049          7.0     37016   1.2350
0.5894          8.0     42304   1.2940
0.5484          9.0     47592   1.2335
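
The reported results can be inspected with a few lines of Python. This sketch copies the values above into a list and compares the lowest validation loss with the final one; the variable names are illustrative, and nothing here says which checkpoint was actually published.

```python
# (epoch, step, training_loss, validation_loss), copied from the results above
results = [
    (1, 5288, 0.8943, 1.5330),
    (2, 10576, 0.7865, 1.4108),
    (3, 15864, 0.7238, 1.2660),
    (4, 21152, 0.6657, 1.2084),
    (5, 26440, 0.646, 1.2666),
    (6, 31728, 0.6269, 1.2555),
    (7, 37016, 0.6049, 1.2350),
    (8, 42304, 0.5894, 1.2940),
    (9, 47592, 0.5484, 1.2335),
]

# Row with the lowest validation loss, and the last reported row
best = min(results, key=lambda row: row[3])
final = results[-1]

# Steps per epoch, derived from the first row
steps_per_epoch = results[0][1] // results[0][0]

print(f"Lowest validation loss: {best[3]:.4f} at epoch {best[0]}")
print(f"Final validation loss:  {final[3]:.4f} at epoch {final[0]}")
print(f"Steps per epoch: {steps_per_epoch}")
```

The lowest validation loss in the table (1.2084) occurs at epoch 4, while the evaluation loss quoted at the top of this card (1.2335) matches the last reported row at epoch 9. Training loss keeps falling throughout, whereas validation loss fluctuates between roughly 1.21 and 1.29 after epoch 4.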

Framework versions

  • Transformers 4.48.3
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0