Abstract

To compete with existing mobile architectures, MobileViG introduces Sparse Vision Graph Attention (SVGA), a fast token-mixing operator based on the principles of GNNs. However, MobileViG scales poorly with model size, falling up to 1% behind models with similar latency. This paper introduces Mobile Graph Convolution (MGC), a new vision graph neural network (ViG) module that solves this scaling problem. Our proposed mobile vision architecture, MobileViGv2, uses MGC to demonstrate the effectiveness of our approach. MGC improves on SVGA by increasing graph sparsity and introducing conditional positional encodings to the graph operation. Our smallest model, MobileViGv2-Ti, achieves a 77.7% top-1 accuracy on ImageNet-1K, 2% higher than MobileViG-Ti, with 0.9 ms inference latency on the iPhone 13 Mini NPU. Our largest model, MobileViGv2-B, achieves an 83.4% top-1 accuracy, 0.8% higher than MobileViG-B, with 2.7 ms inference latency. Besides image classification, we show that MobileViGv2 generalizes well to other tasks. For object detection and instance segmentation on MS COCO 2017, MobileViGv2-M outperforms MobileViG-M by 1.2 AP^{box} and 0.7 AP^{mask}, and MobileViGv2-B outperforms MobileViG-B by 1.0 AP^{box} and 0.7 AP^{mask}. For semantic segmentation on ADE20K, MobileViGv2-M achieves 42.9% mIoU and MobileViGv2-B achieves 44.3% mIoU.
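
The idea of mixing tokens over a sparse, static graph can be pictured with a toy example. The sketch below is not the MobileViGv2 implementation and does not reproduce MGC or its positional encodings. It assumes a single-channel feature grid in which each cell is linked to every k-th cell along its own row and column (with wraparound), and it aggregates by taking the largest neighbour-minus-cell difference, floored at zero. The function names `neighbour_count` and `sparse_max_relative`, the wraparound, and the sign convention are all assumptions made for illustration.

```python
def neighbour_count(height, width, k):
    """Number of graph neighbours per cell when linking every k-th cell
    along the cell's column and row (hypothetical helper)."""
    return len(range(k, height, k)) + len(range(k, width, k))


def sparse_max_relative(grid, k):
    """Toy max-relative aggregation over a strided row/column graph.

    grid: list of rows, each a list of numbers (one feature channel).
    Returns a grid of the same shape where each entry is the largest
    (neighbour - cell) difference over the cell's neighbours, or 0 if
    no neighbour exceeds the cell.
    """
    height, width = len(grid), len(grid[0])
    out = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            best = 0.0
            # Neighbours every k-th step down the column, wrapping around.
            for step in range(k, height, k):
                best = max(best, grid[(i + step) % height][j] - grid[i][j])
            # Neighbours every k-th step along the row, wrapping around.
            for step in range(k, width, k):
                best = max(best, grid[i][(j + step) % width] - grid[i][j])
            out[i][j] = best
    return out


if __name__ == "__main__":
    features = [
        [1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16],
    ]
    for k in (1, 2):
        print("k =", k, "neighbours per cell:", neighbour_count(4, 4, k))
    print(sparse_max_relative(features, 2))
```

In this toy setting, a larger stride `k` gives each cell fewer neighbours, which is one simple way to read "increasing graph sparsity"; the actual graph construction used by MGC is defined in the paper and repository.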

Code

Repo: https://github.com/SLDGroup/MobileViGv2

Citation

If our code or models help your work, please cite MobileViG (CVPRW 2023), MobileViGv2 (CVPRW 2024), and GreedyViG (CVPR 2024):

@InProceedings{MobileViGv2_2024,
    author    = {Avery, William and Munir, Mustafa and Marculescu, Radu},
    title     = {Scaling Graph Convolutions for Mobile Vision},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {5857-5865}
}

@InProceedings{mobilevig2023,
    author    = {Munir, Mustafa and Avery, William and Marculescu, Radu},
    title     = {MobileViG: Graph-Based Sparse Attention for Mobile Vision Applications},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2023},
    pages     = {2211-2219}
}

@InProceedings{GreedyViG_2024_CVPR,
    author    = {Munir, Mustafa and Avery, William and Rahman, Md Mostafijur and Marculescu, Radu},
    title     = {GreedyViG: Dynamic Axial Graph Construction for Efficient Vision GNNs},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {6118-6127}
}
