---
library_name: transformers
tags:
- vision
- image-segmentation
- universal-segmentation
- korean-road
- oneformer
- distillation
- aihub
license: cc-by-4.0
model_name: KoalaSeg-Edge-ViT
---

# KoalaSeg-Edge-ViT πŸ¨πŸ›£οΈ

**KoalaSeg = _KOrean lAyered assistive Segmentation_**

A **Universal Segmentation** model dedicated to Korean road and pedestrian environments.
It is a OneFormer Edge-ViT student fine-tuned on unified GT built by stacking a three-layer mask (XML polygons β–Ά a Korea-road model trained on the AIHUB road/pedestrian dataset β–Ά OneFormer-Cityscapes); a sketch of the layering rule follows the Model Details table below.

---

## Model Details

| Item | Details |
|------|---------|
| **Developed by** | Team RoadSight |
| **Model type** | Edge-ViT backbone + OneFormer head (semantic-only task token) |
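The three mask sources are stacked by priority when the unified GT is built. A minimal sketch of that layering rule, assuming each layer is a `numpy` label map and a hypothetical `VOID = 255` id marks unlabeled pixels (the card does not state the actual void value):

```python
import numpy as np

VOID = 255  # hypothetical "unlabeled" id; not stated on this card

def composite_gt(xml_mask, kroad_mask, cityscapes_mask):
    """Stack the three layers by priority: hand-drawn XML polygons win,
    then the K-Road model's prediction, with the OneFormer-Cityscapes
    pseudo-mask as the fallback everywhere else."""
    merged = cityscapes_mask.copy()                         # lowest priority
    merged = np.where(kroad_mask != VOID, kroad_mask, merged)
    merged = np.where(xml_mask != VOID, xml_mask, merged)   # highest priority
    return merged
```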
| **Finetuned from** | `shi-labs/oneformer_cityscapes_swin_large` |
| **Framework** | πŸ€— Transformers v4.41 / PyTorch 2.3 |
| **License** | CC BY 4.0 |

---

## Training Data

| Source | Images | Annotation |
|--------|--------|------------|
| **AIHUB Road & Pedestrian Environment** (road lanes, sidewalks, crosswalks) | 5,615 | official pixel-wise GT |
| Self-captured provincial roads | 9,042 | CVAT XML polygons |
| Street View derived | 3,712 | OneFormer-Cityscapes pseudo-masks |
| **Total** | **18,369** | 3-layer composite β†’ Morph Open/Close + MedianBlur (17 px) |
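The cleanup step named in the last row (morphological open/close followed by a 17 px median blur) could look like the sketch below; the 5 Γ— 5 structuring element is an assumption, since the card only names the operations:

```python
import cv2
import numpy as np

def clean_mask(mask: np.ndarray) -> np.ndarray:
    """Per-class morphological open/close, then a 17 px median blur
    over the composite label map."""
    kernel = np.ones((5, 5), np.uint8)  # assumed size; not stated on this card
    out = np.zeros_like(mask, dtype=np.uint8)
    for cls in np.unique(mask):
        binary = (mask == cls).astype(np.uint8)
        binary = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)   # drop speckles
        binary = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)  # fill pinholes
        out[binary.astype(bool)] = cls
    # Median blur on the label map keeps existing ids while smoothing boundaries
    return cv2.medianBlur(out, 17)
```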
(λ„λ‘œ μ°¨μ„ , 인도, νš‘λ‹¨λ³΄λ„) | 5 615 μž₯ | 곡식 pixel-wise GT | | μžκ°€ 촬영 지방도 | 9 042 μž₯ | CVAT XML 폴리곀 | | Street View νŒŒμƒ | 3 712 μž₯ | OneFormer-Cityscapes pseudo-mask | | **총합** | **18 369 μž₯** | 3-쀑 λ ˆμ΄μ–΄ ν•©μ„± β†’ Morph Open/Close + MedianBlur(17 px) | --- ## Speeds & Sizes *(512 Γ— 512 batch 1)* | Device | Baseline Cityscapes | Ensemble(3-λ ˆμ΄μ–΄) | Custom(K-Road) | **KoalaSeg(ft)** | |--------|--------------------|-------------------|---------------|------------------| | **A100** | 3.58 s β†’ 0.28 FPS | 3.74 s β†’ 0.27 FPS | 0.15 s β†’ 6.67 FPS | **0.14 s β†’ 7.25 FPS** | | **T4** | 5.61 s β†’ 0.18 FPS | 6.01 s β†’ 0.17 FPS | 0.39 s β†’ 2.60 FPS | **0.31 s β†’ 3.27 FPS** | | **CPU (i9-12900K)** | 124 s | 150 s | 26.6 s | **18.4 s** | --- ## Evaluation (κ΅­λ‚΄ ν…ŒμŠ€νŠΈμ…‹) | Metric | Baseline | **KoalaSeg** | |--------|----------|--------------| | mIoU (전체 클래슀) | 0.55 | **0.81** | | F1 – λ„λ‘œ vs 인도 | 0.58 | **0.89** | --- ## Quick Start ```python from transformers import AutoProcessor, AutoModelForUniversalSegmentation import torch, numpy as np, matplotlib.pyplot as plt from PIL import Image model_id = "roadsight/KoalaSeg-Edge-ViT" proc = AutoProcessor.from_pretrained(model_id) model = AutoModelForUniversalSegmentation.from_pretrained(model_id).to("cuda") img = Image.open("korean_road.jpg").convert("RGB") inputs = proc(images=img, task_inputs=["semantic"], return_tensors="pt").to("cuda") with torch.no_grad(): out = model(**inputs) idmap = proc.post_process_semantic_segmentation(out, target_sizes=[img.size[::-1]])[0] plt.imshow(idmap.cpu()); plt.axis("off"); plt.show()