YOLO11n Text

A fine-tuned YOLO11n model for detecting text regions in images. It predicts text bounding boxes in documents, screenshots, user interfaces, and natural scene images.

Model Description

This model is based on Ultralytics YOLO11n (nano variant) and has been fine-tuned specifically for text detection tasks. It detects text regions as bounding boxes, which can be used as input for OCR pipelines or UI automation tasks.

Model Architecture

  • Base Model: YOLO11n (nano)
  • Parameters: 2,590,035
  • Layers: 181
  • Input Size: 640x640
  • Classes: 1 (text)

Training Details

Dataset

Training Configuration

| Parameter | Value |
| --- | --- |
| Epochs | 50 |
| Batch Size | 16 |
| Image Size | 640 |
| Optimizer | SGD (auto) |
| Learning Rate | 0.01 → 0.0003 |
| Momentum | 0.937 |
| Weight Decay | 0.0005 |
| Warmup Epochs | 3.0 |
| AMP | Enabled |
| Workers | 8 |

Augmentation

| Augmentation | Value |
| --- | --- |
| HSV Hue | 0.015 |
| HSV Saturation | 0.7 |
| HSV Value | 0.4 |
| Translation | 0.1 |
| Scale | 0.5 |
| Horizontal Flip | 0.5 |
| Mosaic | 1.0 |
| Erasing | 0.4 |
| Auto Augment | randaugment |
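
The configuration and augmentation values above correspond to standard Ultralytics training arguments. The sketch below shows one way such a run might be set up; it is not the author's actual training script. The dataset file `text_dataset.yaml` and the `yolo11n.pt` starting weights are assumptions. With `optimizer="auto"`, Ultralytics may pick the learning rate and momentum itself instead of using the values passed in, and its documentation appears to list `erasing` and `auto_augment` as classification-oriented, so they may have no effect on a detection run.

```python
# Hypothetical reconstruction of the training arguments listed in the tables above.
TRAIN_ARGS = {
    "data": "text_dataset.yaml",  # assumed dataset config path, not from the source
    "epochs": 50,
    "batch": 16,
    "imgsz": 640,
    "optimizer": "auto",
    "lr0": 0.01,
    "momentum": 0.937,
    "weight_decay": 0.0005,
    "warmup_epochs": 3.0,
    "amp": True,
    "workers": 8,
}

AUGMENT_ARGS = {
    "hsv_h": 0.015,
    "hsv_s": 0.7,
    "hsv_v": 0.4,
    "translate": 0.1,
    "scale": 0.5,
    "fliplr": 0.5,
    "mosaic": 1.0,
    "erasing": 0.4,
    "auto_augment": "randaugment",
}


def train(weights: str = "yolo11n.pt"):
    """Fine-tune from pretrained weights using the arguments above."""
    # Imported lazily so the argument dicts can be inspected without Ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(**TRAIN_ARGS, **AUGMENT_ARGS)
```

Calling `train()` starts the run; nothing is executed on import.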

Hardware

  • GPU: NVIDIA GeForce RTX 5070 Ti (16GB VRAM)
  • Training Time: ~1.75 hours (6,267 seconds)
  • Framework: Ultralytics 8.3.240, PyTorch 2.9.1+cu128

Performance Metrics

Final Results (Epoch 50)

| Metric | Value |
| --- | --- |
| Precision | 95.7% |
| Recall | 93.6% |
| mAP@50 | 97.6% |
| mAP@50-95 | 81.8% |
| Box Loss | 0.619 |
| Class Loss | 0.376 |
| DFL Loss | 0.828 |
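
The table does not report an F1 score, but one can be derived from the precision and recall above as their harmonic mean. The snippet below is a small worked calculation; the resulting figure is computed here and is not taken from the training logs.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Final-epoch precision and recall from the table above.
f1 = f1_score(0.957, 0.936)
print(f"F1: {f1:.1%}")  # F1: 94.6%
```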

Training Progress

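| Epoch | mAP@50 | mAP@50-95 | Precision | Recall |
| --- | --- | --- | --- | --- |
| 1 | 89.1% | 64.3% | 86.0% | 82.7% |
| 10 | 95.9% | 76.8% | 93.5% | 90.7% |
| 20 | 96.9% | 79.5% | 94.8% | 92.0% |
| 30 | 97.3% | 80.8% | 95.1% | 93.1% |
| 40 | 97.6% | 81.5% | 95.6% | 93.5% |
| 50 | 97.6% | 81.8% | 95.7% | 93.6% |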

Usage

Installation

pip install ultralytics

Inference

from ultralytics import YOLO

# Load the model
model = YOLO("best.pt")

# Run inference
results = model.predict(
    source="image.jpg",
    conf=0.25,
    iou=0.7,
    imgsz=640
)

# Process results
for result in results:
    boxes = result.boxes
    for box in boxes:
        # Get bounding box coordinates (x1, y1, x2, y2)
        xyxy = box.xyxy[0].tolist()
        confidence = box.conf[0].item()
        print(f"Text box: {xyxy}, confidence: {confidence:.2f}")

Batch Processing

from ultralytics import YOLO

model = YOLO("best.pt")

# Process folder of images
results = model.predict(
    source="path/to/images/",
    conf=0.25,
    save=True,  # Save annotated images
    save_txt=True  # Save YOLO format labels
)
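
With `save_txt=True`, each image gets a label file with one line per detection in the form `class x_center y_center width height`, with coordinates normalized to the image size. The sketch below shows how such a line could be converted back to pixel corner coordinates; the function name is hypothetical, and the image dimensions have to be supplied by the caller.

```python
def yolo_line_to_xyxy(line: str, img_w: int, img_h: int):
    """Convert 'cls xc yc w h' (normalized) to (cls, x1, y1, x2, y2) in pixels."""
    parts = line.split()
    cls = int(parts[0])
    # Only the first four numeric fields are used; a trailing confidence, if present, is ignored.
    xc, yc, w, h = (float(v) for v in parts[1:5])
    x1 = (xc - w / 2) * img_w
    y1 = (yc - h / 2) * img_h
    x2 = (xc + w / 2) * img_w
    y2 = (yc + h / 2) * img_h
    return cls, x1, y1, x2, y2


print(yolo_line_to_xyxy("0 0.5 0.5 0.25 0.125", img_w=800, img_h=600))
# (0, 300.0, 262.5, 500.0, 337.5)
```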

Export to Other Formats

from ultralytics import YOLO

model = YOLO("best.pt")

# Export to ONNX
model.export(format="onnx", imgsz=640, simplify=True)

# Export to TensorRT (for NVIDIA GPUs)
model.export(format="engine", imgsz=640, half=True)

# Export to CoreML (for Apple devices)
model.export(format="coreml", imgsz=640)
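
When an exported model is run outside the Ultralytics runtime, preprocessing and box rescaling have to be done by hand. The sketch below assumes the exported model expects a 640x640 letterboxed input (aspect-preserving resize plus centered padding), in line with how Ultralytics preprocesses images by default. The helper names are hypothetical.

```python
def letterbox_params(img_w, img_h, size=640):
    """Return (scale, pad_x, pad_y) for fitting an image into a size x size canvas."""
    scale = min(size / img_w, size / img_h)
    new_w, new_h = round(img_w * scale), round(img_h * scale)
    pad_x = (size - new_w) / 2
    pad_y = (size - new_h) / 2
    return scale, pad_x, pad_y


def unletterbox_box(box, scale, pad_x, pad_y):
    """Map an (x1, y1, x2, y2) box from model-input pixels back to original-image pixels."""
    x1, y1, x2, y2 = box
    return ((x1 - pad_x) / scale, (y1 - pad_y) / scale,
            (x2 - pad_x) / scale, (y2 - pad_y) / scale)


scale, pad_x, pad_y = letterbox_params(1280, 720)
print(scale, pad_x, pad_y)  # 0.5 0.0 140.0
print(unletterbox_box((100, 200, 300, 260), scale, pad_x, pad_y))
# (200.0, 120.0, 600.0, 240.0)
```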

Model Files

| File | Description |
| --- | --- |
| best.pt | Best checkpoint (highest mAP@50) |
| args.yaml | Training configuration |
| results.csv | Training metrics per epoch |
| results.png | Training curves visualization |
| confusion_matrix.png | Confusion matrix |
| BoxPR_curve.png | Precision-Recall curve |

Recommended Inference Parameters

| Parameter | Recommended | Description |
| --- | --- | --- |
| conf | 0.25 | Confidence threshold |
| iou | 0.7 | NMS IoU threshold |
| imgsz | 640-1024 | Input image size |
| max_det | 300 | Maximum detections per image |
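
The `iou` parameter is the overlap threshold used by non-maximum suppression: when two predicted boxes overlap by more than this value, the lower-confidence one is dropped. As a minimal illustration (not the Ultralytics implementation), the IoU of two axis-aligned boxes can be computed as follows.

```python
def box_iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 100x20 boxes offset by 10 px horizontally.
iou = box_iou((0, 0, 100, 20), (10, 0, 110, 20))
print(f"{iou:.3f}")  # 0.818
print(iou > 0.7)     # True
```

At the recommended threshold of 0.7, these two boxes would count as duplicates and only the higher-confidence one would be kept.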

Use Cases

  • OCR Preprocessing: Detect text regions before applying OCR
  • Document Analysis: Locate text areas in scanned documents
  • UI Automation: Find text elements in application screenshots
  • Scene Text Detection: Detect text in natural images
  • PDF Processing: Extract text region locations
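
For OCR preprocessing, detected boxes are typically padded slightly and put in reading order before cropping. The sketch below is one simple approach, assuming boxes are `(x1, y1, x2, y2)` tuples such as those produced by `box.xyxy[0].tolist()`. The helper names, the padding, and the line-grouping tolerance are illustrative choices, not part of the model.

```python
def pad_box(box, img_w, img_h, pad=4):
    """Expand a box by `pad` pixels on each side, clamped to the image bounds."""
    x1, y1, x2, y2 = box
    return (max(0, x1 - pad), max(0, y1 - pad),
            min(img_w, x2 + pad), min(img_h, y2 + pad))


def reading_order(boxes, line_tol=10):
    """Sort boxes top-to-bottom, then left-to-right within roughly the same line."""
    lines = []  # each entry holds boxes whose top edge is within line_tol of the line's first box
    for box in sorted(boxes, key=lambda b: b[1]):
        if lines and abs(box[1] - lines[-1][0][1]) <= line_tol:
            lines[-1].append(box)
        else:
            lines.append([box])
    return [b for line in lines for b in sorted(line, key=lambda b: b[0])]


boxes = [(200, 12, 300, 30), (10, 10, 150, 30), (10, 50, 120, 70)]
ordered = reading_order(boxes)
print(ordered)
# [(10, 10, 150, 30), (200, 12, 300, 30), (10, 50, 120, 70)]
print([pad_box(b, img_w=320, img_h=80) for b in ordered])
# [(6, 6, 154, 34), (196, 8, 304, 34), (6, 46, 124, 74)]
```

Each padded box can then be passed to an image cropping call, for example Pillow's `Image.crop`, and the crop handed to the OCR engine.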

Limitations

  • Optimized for horizontal text; may have reduced accuracy on rotated text
  • Trained primarily on document and UI images
  • Single class (text): does not distinguish between text types
  • Best performance at 640px input size

Citation

@software{yolo11n_text,
  author = {Ultralytics},
  title = {YOLO11n Text},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/datasets/DonkeySmall/Yolo-Text-Detection}
}

@software{ultralytics_yolo,
  author = {Jocher, Glenn and Chaurasia, Ayush and Qiu, Jing},
  title = {Ultralytics YOLO},
  year = {2023},
  publisher = {GitHub},
  url = {https://github.com/ultralytics/ultralytics}
}

License

This model is released under the Apache 2.0 License.

Acknowledgments
