# fisheye8k_SenseTime_deformable-detr
This model is a fine-tuned version of SenseTime/deformable-detr on the Fisheye8K dataset. It was developed as part of the Mcity Data Engine project, described in the paper *Mcity Data Engine: Iterative Model Improvement Through Open-Vocabulary Data Selection*.
The code for the Mcity Data Engine project is available on GitHub.
It achieves the following results on the evaluation set:
- Loss: 1.2335
## Model description
This model is a fine-tuned object detection model based on the SenseTime/deformable-detr architecture, specifically trained for object detection on fisheye camera imagery. It is a product of the Mcity Data Engine, an open-source system designed for iterative data selection and model improvement in Intelligent Transportation Systems (ITS). The model can detect objects such as "Bus", "Bike", "Car", "Pedestrian", and "Truck", leveraging an open-vocabulary data selection process during its development to focus on rare and novel classes.
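For reference, the five detection classes can be expressed as the usual `id2label`/`label2id` mappings used by `transformers` object detection models. Note that the index order below is an assumption for illustration only; the authoritative mapping is in the model's `config.json`.

```python
# Hypothetical label mapping for the five Fisheye8K classes.
# NOTE: this index order is an assumption; check the model's
# config.json (id2label) for the authoritative mapping.
CLASSES = ["Bus", "Bike", "Car", "Pedestrian", "Truck"]

id2label = {i: name for i, name in enumerate(CLASSES)}
label2id = {name: i for i, name in id2label.items()}

print(id2label)
```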
## Intended uses & limitations
This model is intended for object detection tasks within Intelligent Transportation Systems (ITS) that utilize fisheye camera data. Potential applications include traffic monitoring, enhancing autonomous driving perception, and smart city infrastructure, with a focus on detecting long-tail classes of interest and vulnerable road users (VRU).
Limitations:
- The model's performance is optimized for fisheye camera data and the specific object classes it was trained on.
- Performance may vary significantly in out-of-distribution scenarios or when applied to data from different camera types or environments.
- Users should consider potential biases inherited from the underlying Fisheye8K dataset.
## Sample Usage

You can use this model directly with the `transformers` pipeline for object detection:

```python
from io import BytesIO

import requests
from PIL import Image
from transformers import pipeline

# Load the object detection pipeline
detector = pipeline("object-detection", model="mcity-data-engine/fisheye8k_SenseTime_deformable-detr")

# Example image (replace with a relevant fisheye image if available, or a local path).
# A generic example image is used here for demonstration; for best results, use a fisheye image.
image_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/bird_sized.jpg"
try:
    response = requests.get(image_url, stream=True)
    response.raise_for_status()  # Raise an HTTPError for bad responses (4xx or 5xx)
    image = Image.open(BytesIO(response.content)).convert("RGB")
except requests.exceptions.RequestException as e:
    print(f"Could not load example image from URL: {e}. Please provide a local image path.")
    raise SystemExit(1)

# Perform inference
predictions = detector(image)

# Print detected objects
for pred in predictions:
    print(f"Label: {pred['label']}, Score: {pred['score']:.2f}, Box: {pred['box']}")

# For visualization (optional, requires matplotlib):
# from matplotlib import pyplot as plt
# import matplotlib.patches as patches
#
# fig, ax = plt.subplots(1)
# ax.imshow(image)
# for p in predictions:
#     box = p["box"]
#     rect = patches.Rectangle(
#         (box["xmin"], box["ymin"]),
#         box["xmax"] - box["xmin"],
#         box["ymax"] - box["ymin"],
#         linewidth=1, edgecolor="r", facecolor="none",
#     )
#     ax.add_patch(rect)
#     ax.text(box["xmin"], box["ymin"] - 5, f"{p['label']}: {p['score']:.2f}", color="red", fontsize=8)
# plt.show()
```
## Training and evaluation data
This model was fine-tuned on the Fisheye8K dataset. The Fisheye8K dataset comprises images captured from fisheye cameras, featuring annotated instances of common road users such as cars, buses, bikes, trucks, and pedestrians. The training process leveraged the capabilities of the Mcity Data Engine, which facilitates iterative model improvement and open-vocabulary data selection, especially for Intelligent Transportation Systems (ITS) applications.
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 0
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- num_epochs: 36
- mixed_precision_training: Native AMP
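With a cosine scheduler, the learning rate decays from 5e-05 toward zero over training. A minimal sketch of that decay, assuming 5288 optimizer steps per epoch (taken from the results table below) over 36 epochs, and omitting warmup for simplicity (the actual `transformers` scheduler may include a warmup phase):

```python
import math

BASE_LR = 5e-5
STEPS_PER_EPOCH = 5288   # from the training results table
NUM_EPOCHS = 36
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS

def cosine_lr(step: int) -> float:
    """Cosine decay from BASE_LR to 0, without warmup (a simplification)."""
    progress = step / TOTAL_STEPS
    return 0.5 * BASE_LR * (1.0 + math.cos(math.pi * progress))

print(cosine_lr(0))                 # BASE_LR at the start
print(cosine_lr(TOTAL_STEPS // 2))  # half the base LR at the midpoint
print(cosine_lr(TOTAL_STEPS))       # 0 at the end
```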
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 0.8943 | 1.0 | 5288 | 1.5330 |
| 0.7865 | 2.0 | 10576 | 1.4108 |
| 0.7238 | 3.0 | 15864 | 1.2660 |
| 0.6657 | 4.0 | 21152 | 1.2084 |
| 0.646 | 5.0 | 26440 | 1.2666 |
| 0.6269 | 6.0 | 31728 | 1.2555 |
| 0.6049 | 7.0 | 37016 | 1.2350 |
| 0.5894 | 8.0 | 42304 | 1.2940 |
| 0.5484 | 9.0 | 47592 | 1.2335 |
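The reported evaluation loss of 1.2335 is the validation loss at the final epoch (9); note that the lowest validation loss in the table actually occurs at epoch 4 (1.2084). A quick check over the table's values:

```python
# (epoch, validation_loss) pairs from the training results table
val_losses = [
    (1, 1.5330), (2, 1.4108), (3, 1.2660), (4, 1.2084), (5, 1.2666),
    (6, 1.2555), (7, 1.2350), (8, 1.2940), (9, 1.2335),
]

best_epoch, best_loss = min(val_losses, key=lambda pair: pair[1])
final_epoch, final_loss = val_losses[-1]

print(f"best:  epoch {best_epoch}, val loss {best_loss}")    # epoch 4, 1.2084
print(f"final: epoch {final_epoch}, val loss {final_loss}")  # epoch 9, 1.2335
```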
### Framework versions
- Transformers 4.48.3
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0
## Base model

- SenseTime/deformable-detr