---
library_name: transformers
tags:
- vision
- image-segmentation
- universal-segmentation
- korean-road
- oneformer
- distillation
- aihub
license: cc-by-4.0
model_name: KoalaSeg-Edge-ViT
---
# KoalaSeg-Edge-ViT 🚨🛣️
**KoalaSeg = _KOrean lAyered assistive Segmentation_**
A **Universal Segmentation** model dedicated to Korean road and pedestrian environments.
It is an Edge-ViT student version of OneFormer, fine-tuned on unified ground truth created by overlaying a 3-layer mask stack (XML polygons ▶ a Korean-road model trained on AIHUB road/pedestrian data ▶ OneFormer-Cityscapes).
---
## Model Details
| Item | Description |
|------|------|
| **Developed by** | Team RoadSight |
| **Model type** | Edge-ViT backbone + OneFormer head (semantic-only task token) |
| **Finetuned from** | `shi-labs/oneformer_cityscapes_swin_large` |
| **Framework** | 🤗 Transformers v4.41 / PyTorch 2.3 |
| **License** | CC BY 4.0 |
---
## Training Data
| Source | Images | Annotation method |
|------|------|-----------|
| **AIHUB Road/Pedestrian Environment** (road lanes, sidewalks, crosswalks) | 5,615 | official pixel-wise GT |
| Self-captured provincial roads | 9,042 | CVAT XML polygons |
| Street View images | 3,712 | OneFormer-Cityscapes pseudo-masks |
| **Total** | **18,369** | 3-layer composite → morphological open/close + median blur (17 px); see the sketch below |
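The 3-layer composite can be reproduced in spirit with a few lines of NumPy/OpenCV. This is a minimal sketch, not the exact training pipeline: it assumes each source has already been rasterized to an H×W `uint8` class-id map with `255` marking unlabeled pixels, that the arrow order in the intro denotes priority (XML polygons highest), and that a 5×5 morphology kernel is used (only the 17 px median blur is stated above).

```python
import cv2
import numpy as np

IGNORE = 255  # unlabeled pixels in every source mask (assumption)

def fuse_layers(xml_mask: np.ndarray,
                kroad_mask: np.ndarray,
                cityscapes_mask: np.ndarray) -> np.ndarray:
    """Stack the three sources; higher-priority layers overwrite lower ones."""
    fused = cityscapes_mask.copy()           # lowest priority: pseudo-masks
    for layer in (kroad_mask, xml_mask):     # then K-Road model, then XML
        labeled = layer != IGNORE
        fused[labeled] = layer[labeled]
    return fused

def clean(fused: np.ndarray) -> np.ndarray:
    """Morphological open/close, then the 17 px median blur from the table."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))  # size assumed
    out = cv2.morphologyEx(fused, cv2.MORPH_OPEN, kernel)
    out = cv2.morphologyEx(out, cv2.MORPH_CLOSE, kernel)
    return cv2.medianBlur(out, 17)           # majority-style label smoothing
```

Because higher-priority layers simply overwrite lower ones, hand-drawn XML polygons always win over model pseudo-labels where both exist.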
---
## Speeds & Sizes *(512 × 512, batch 1)*
| Device | Baseline Cityscapes | Ensemble (3-layer) | Custom (K-Road) | **KoalaSeg (ft)** |
|--------|--------------------|-------------------|---------------|------------------|
| **A100** | 3.58 s ≈ 0.28 FPS | 3.74 s ≈ 0.27 FPS | 0.15 s ≈ 6.67 FPS | **0.14 s ≈ 7.25 FPS** |
| **T4** | 5.61 s ≈ 0.18 FPS | 6.01 s ≈ 0.17 FPS | 0.39 s ≈ 2.60 FPS | **0.31 s ≈ 3.27 FPS** |
| **CPU (i9-12900K)** | 124 s | 150 s | 26.6 s | **18.4 s** |
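Latencies like those above can be approximated with a simple warm-up plus timed-loop harness. This sketch assumes a CUDA device and reuses `model` and `inputs` from the Quick Start below; the iteration counts are illustrative.

```python
import time
import torch

def benchmark(model, inputs, warmup=5, iters=20):
    """Mean per-image latency; assumes `inputs` holds a single 512x512 sample."""
    with torch.no_grad():
        for _ in range(warmup):          # warm-up: kernel selection, caches
            model(**inputs)
        torch.cuda.synchronize()         # flush pending GPU work before timing
        start = time.perf_counter()
        for _ in range(iters):
            model(**inputs)
        torch.cuda.synchronize()         # wait for the last forward pass
    sec = (time.perf_counter() - start) / iters
    print(f"{sec:.2f} s ≈ {1 / sec:.2f} FPS")
```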
---
## Evaluation (Korean test set)
| Metric | Baseline | **KoalaSeg** |
|--------|----------|--------------|
| mIoU (all classes) | 0.55 | **0.81** |
| F1 (road vs. sidewalk) | 0.58 | **0.89** |
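For reference, mIoU over per-pixel class-id maps can be computed from a confusion matrix as below. This is an illustrative sketch, not the official evaluation script; it assumes integer label maps and `255` as the ignore index.

```python
import numpy as np

def miou(pred: np.ndarray, gt: np.ndarray, num_classes: int, ignore: int = 255) -> float:
    """mIoU from a confusion matrix over flattened integer label maps."""
    valid = gt != ignore
    cm = np.bincount(num_classes * gt[valid].astype(int) + pred[valid].astype(int),
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(cm)
    union = cm.sum(0) + cm.sum(1) - inter
    iou = inter / np.maximum(union, 1)   # guard against empty classes
    return float(iou[union > 0].mean())  # average only over classes present
```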
---
## Quick Start
```python
from PIL import Image
import matplotlib.pyplot as plt
import torch
from transformers import AutoProcessor, AutoModelForUniversalSegmentation

model_id = "roadsight/KoalaSeg-Edge-ViT"
proc = AutoProcessor.from_pretrained(model_id)
model = AutoModelForUniversalSegmentation.from_pretrained(model_id).to("cuda").eval()

img = Image.open("korean_road.jpg").convert("RGB")
# OneFormer-style processors take a task token; this checkpoint is semantic-only.
inputs = proc(images=img, task_inputs=["semantic"], return_tensors="pt").to("cuda")

with torch.no_grad():
    out = model(**inputs)

# Resize predictions back to the original (H, W) and take the per-pixel class-id map.
idmap = proc.post_process_semantic_segmentation(out, target_sizes=[img.size[::-1]])[0]
plt.imshow(idmap.cpu()); plt.axis("off"); plt.show()
```
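To eyeball a prediction against the input, one option (not part of the model's API; the palette here is random and illustrative) is to alpha-blend a colorized class map over the image, continuing from the variables above:

```python
import numpy as np

ids = idmap.cpu().numpy().astype(np.uint8)        # per-pixel class ids from above
rng = np.random.default_rng(0)                    # fixed seed for a stable palette
palette = rng.integers(0, 256, size=(int(ids.max()) + 1, 3), dtype=np.uint8)
overlay = (0.6 * np.asarray(img) + 0.4 * palette[ids]).astype(np.uint8)
plt.imshow(overlay); plt.axis("off"); plt.show()
```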