fisheye8k_hustvl_yolos-base

This model is a fine-tuned version of hustvl/yolos-base for object detection, specifically adapted for intelligent transportation systems. It was developed as part of the Mcity Data Engine project, presented in the paper Mcity Data Engine: Iterative Model Improvement Through Open-Vocabulary Data Selection.

It achieves the following results on the evaluation set:

Loss: 2.6653

Paper

This model was presented in the paper Mcity Data Engine: Iterative Model Improvement Through Open-Vocabulary Data Selection.

Project Page & Code

Project Page: https://mcity.github.io/mcity_data_engine/
GitHub Repository: https://github.com/mcity/mcity_data_engine

Model description

The fisheye8k_hustvl_yolos-base model uses the YOLOS (You Only Look at One Sequence) architecture, a vision transformer for object detection. It was fine-tuned on the Fisheye8K dataset to detect vehicles (Bus, Bike, Car, Truck) and pedestrians in fisheye camera imagery, which is common in intelligent transportation systems. It was developed within the Mcity Data Engine framework, whose open-vocabulary data selection process aims to improve performance on rare and novel classes.

Intended uses & limitations

This model is intended for object detection in contexts related to Intelligent Transportation Systems (ITS), particularly with fisheye camera data. It is optimized for detecting road users such as Bus, Bike, Car, Pedestrian, and Truck. Its application is primarily within the iterative data selection and model training processes facilitated by the Mcity Data Engine to identify long-tail classes of interest.

Limitations: The model was fine-tuned on fisheye imagery, so it may be tuned to fisheye distortion and may not generalize to standard camera views without further fine-tuning. Performance on very rare or highly out-of-distribution classes depends on the continuous data curation and model refinement strategies of the Mcity Data Engine.

How to use

You can use this model directly with the Hugging Face transformers library for object detection tasks.

from transformers import AutoImageProcessor, AutoModelForObjectDetection
from PIL import Image
import requests
import torch

# Load an example image (for illustration, consider using a fisheye image for actual use)
url = "http://images.cocodataset.org/val2017/000000039769.jpg" # Replace with your fisheye image URL or path
image = Image.open(requests.get(url, stream=True).raw)

# Load the image processor and model
image_processor = AutoImageProcessor.from_pretrained("mcity-data-engine/fisheye8k_hustvl_yolos-base")
model = AutoModelForObjectDetection.from_pretrained("mcity-data-engine/fisheye8k_hustvl_yolos-base")

# Preprocess the image
inputs = image_processor(images=image, return_tensors="pt")

# Perform inference
with torch.no_grad():
    outputs = model(**inputs)

# Post-process the outputs (bounding boxes and class logits)
target_sizes = torch.tensor([image.size[::-1]])
results = image_processor.post_process_object_detection(outputs, threshold=0.7, target_sizes=target_sizes)[0]

# Print detected objects
print("Detected objects in the image:")
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(
        f"  - {model.config.id2label[label.item()]} with confidence {round(score.item(), 3)} "
        f"at bounding box coordinates [x_min, y_min, x_max, y_max]: {box.tolist()}"
    )
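
The detections printed above can also be collected into plain Python records for counting or export. The following is a minimal sketch, not part of the original model card: the helper names `to_records` and `count_by_class` are hypothetical, and the `example_id2label` mapping and the sample detections are made-up illustrations. For real use, pass `results["scores"]`, `results["labels"]`, `results["boxes"]`, and `model.config.id2label` from the snippet above.

```python
from collections import Counter


def to_records(scores, labels, boxes, id2label, min_score=0.7):
    """Turn parallel score/label/box sequences into a list of dicts.

    Works with plain Python numbers and with 0-d torch tensors, since
    float() and int() accept both. Detections below min_score are dropped.
    """
    records = []
    for score, label, box in zip(scores, labels, boxes):
        score = float(score)
        if score < min_score:
            continue
        records.append(
            {
                "label": id2label[int(label)],
                "score": round(score, 3),
                # Box order follows the snippet above: x_min, y_min, x_max, y_max
                "box": [round(float(v), 1) for v in box],
            }
        )
    return records


def count_by_class(records):
    """Count how many detections were kept for each class name."""
    return Counter(record["label"] for record in records)


# Hypothetical mapping for illustration only; the real one is model.config.id2label.
example_id2label = {0: "Bus", 1: "Bike", 2: "Car", 3: "Pedestrian", 4: "Truck"}

# Made-up detections standing in for the post-processed model output.
scores = [0.91, 0.42, 0.88]
labels = [2, 3, 2]
boxes = [
    [10.0, 20.0, 110.0, 90.0],
    [5.0, 5.0, 25.0, 60.0],
    [200.0, 40.0, 320.0, 130.0],
]

records = to_records(scores, labels, boxes, example_id2label, min_score=0.7)
class_counts = count_by_class(records)
print(class_counts)
```

Keeping the threshold as a parameter makes it easy to apply a stricter cut-off after inference without re-running the model, as long as the initial `post_process_object_detection` threshold is at least as permissive.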

Training and evaluation data

This model was fine-tuned on the Voxel51/fisheye8k dataset. This dataset is a crucial component of the iterative model improvement process facilitated by the Mcity Data Engine, focusing on ITS-relevant objects. The specific classes detected by this model are: Bus, Bike, Car, Pedestrian, and Truck.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 1
eval_batch_size: 8
seed: 0
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
num_epochs: 36
mixed_precision_training: Native AMP
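
To make the cosine schedule concrete, the sketch below computes the learning rate at a given optimizer step. It is an illustration under stated assumptions rather than the project's training code: it assumes no warmup (none is listed above), a scheduler horizon of the full 36 configured epochs, and 5288 optimizer steps per epoch, the figure reported in the training results for batch size 1. The function name `cosine_lr` is hypothetical.

```python
import math

BASE_LR = 5e-5          # learning_rate from the list above
STEPS_PER_EPOCH = 5288  # step count after epoch 1 in the training results
NUM_EPOCHS = 36         # num_epochs from the list above
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS


def cosine_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, warmup_steps=0):
    """Half-cosine decay from base_lr to 0, with optional linear warmup.

    warmup_steps=0 is an assumption; the model card does not list a warmup.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Learning rate at the end of each logged epoch (1 through 8).
for epoch in range(1, 9):
    step = epoch * STEPS_PER_EPOCH
    print(f"epoch {epoch}: step {step}, lr {cosine_lr(step):.3e}")
```

Under these assumptions the learning rate has decayed only slightly by epoch 8, because the cosine curve is flat near its start and the horizon spans all 36 epochs.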

Training results

Training Loss	Epoch	Step	Validation Loss
1.9357	1.0	5288	2.7182
1.8095	2.0	10576	2.6559
1.6565	3.0	15864	2.5114
1.5912	4.0	21152	2.6875
1.6169	5.0	26440	2.7796
1.5075	6.0	31728	2.6514
1.4073	7.0	37016	2.7649
1.3617	8.0	42304	2.6653
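
The table can be checked programmatically. The short sketch below copies the rows above into a list and finds the epoch with the lowest validation loss. It shows that the reported evaluation loss of 2.6653 matches the last logged epoch (8), while the lowest validation loss in the table is 2.5114 at epoch 3. The variable names are illustrative only.

```python
# (epoch, step, training_loss, validation_loss), copied from the table above
rows = [
    (1, 5288, 1.9357, 2.7182),
    (2, 10576, 1.8095, 2.6559),
    (3, 15864, 1.6565, 2.5114),
    (4, 21152, 1.5912, 2.6875),
    (5, 26440, 1.6169, 2.7796),
    (6, 31728, 1.5075, 2.6514),
    (7, 37016, 1.4073, 2.7649),
    (8, 42304, 1.3617, 2.6653),
]

# Row with the smallest validation loss, and the last logged row
best_row = min(rows, key=lambda row: row[3])
final_row = rows[-1]

print(f"lowest validation loss: {best_row[3]} at epoch {best_row[0]}")
print(f"final validation loss:  {final_row[3]} at epoch {final_row[0]}")
```

Training loss falls fairly steadily while validation loss fluctuates between roughly 2.5 and 2.8, so the choice of checkpoint matters when comparing against the reported figure.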

Framework versions

Transformers 4.48.3
Pytorch 2.5.1+cu124
Datasets 3.2.0
Tokenizers 0.21.0

Citation

If you use the Mcity Data Engine in your research, feel free to cite the project:

@article{bogdoll2025mcitydataengine,
  title={Mcity Data Engine},
  author={Bogdoll, Daniel and Anata, Rajanikant Patnaik and Stevens, Gregory},
  journal={GitHub. Note: https://github.com/mcity/mcity_data_engine},
  year={2025}
}
