---
language: en
license: mit
tags:
- pose-estimation
- computer-vision
- keypoint-detection
- diffusion-models
- stable-diffusion
- out-of-distribution
- human-pose
- top-down-pose-estimation
- coco
- mmpose
library_name: pytorch
---

# SDPose: Exploiting Diffusion Priors for Out-of-Domain and Robust Pose Estimation (Body - 17 Keypoints)
[![Paper](https://img.shields.io/badge/arXiv-Paper-b31b1b?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2509.24980) [![Project Page](https://img.shields.io/badge/Project-Website-pink?logo=googlechrome&logoColor=white)](https://t-s-liang.github.io/SDPose) [![HuggingFace Demo](https://img.shields.io/badge/🤗%20HuggingFace-Demo-yellow)](https://huggingface.co/spaces/teemosliang/SDPose-Body) [![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
## Model Description

**SDPose** is a state-of-the-art human pose estimation model that leverages the visual priors of a pretrained **Stable Diffusion** model to stay accurate in out-of-distribution (OOD) scenarios. This model variant estimates the **17 COCO body keypoints**: nose, eyes, ears, shoulders, elbows, wrists, hips, knees, and ankles.

### Model Architecture

SDPose uses a **U-Net backbone** initialized from Stable Diffusion v2 weights, combined with a dedicated heatmap head for keypoint prediction. The model operates in a top-down manner:

1. **Person Detection**: Detect human bounding boxes with an object detector (e.g., YOLO11-x).
2. **Pose Estimation**: Crop each detected person and estimate its 17 body keypoints.
3. **Heatmap Generation**: Produce one confidence heatmap per keypoint; the heatmaps are decoded into keypoint coordinates and confidence scores.

**Model Specifications:**

- **Backbone**: Stable Diffusion v2 U-Net (fine-tuned; minimal architectural changes)
- **Head**: Custom heatmap prediction head
- **Input Resolution**: 1024×768 (H×W)
- **Output**: 17 keypoint heatmaps, decoded into coordinates with confidence scores
- **Framework**: MMPose

## Supported Keypoints (COCO Format)

The model predicts 17 body keypoints following the COCO keypoint format:

```
0: nose
1: left_eye
2: right_eye
3: left_ear
4: right_ear
5: left_shoulder
6: right_shoulder
7: left_elbow
8: right_elbow
9: left_wrist
10: right_wrist
11: left_hip
12: right_hip
13: left_knee
14: right_knee
15: left_ankle
16: right_ankle
```

## Intended Use

### Primary Use Cases

- Human pose estimation in natural images
- Pose estimation in artistic and stylized domains (paintings, anime, sketches)
- Animation and video pose tracking
- Cross-domain pose analysis and research
- Applications requiring robust pose estimation under distribution shifts

## How to Use

### Installation

```bash
# Clone the repository
git clone https://github.com/t-s-liang/SDPose-OOD.git
cd SDPose-OOD

# Install dependencies
pip install -r requirements.txt

# Download YOLO11-x for human detection
wget https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11x.pt -P models/

# Launch the Gradio interface
cd gradio_app
bash launch_gradio.sh
```

## Training Data

### Datasets

Trained exclusively on the COCO 2017 `train2017` split (no extra data).

- **COCO (Common Objects in Context)**: large-scale dataset (200K+ labeled images) whose person instances are annotated with 17 body keypoints

### Preprocessing

- Images are cropped around each person and resized to the 1024×768 (H×W) input resolution
- Augmentation: random horizontal flip, half-body and bounding-box transforms, UDP affine transform; Albumentations (Gaussian/median blur, coarse dropout)
- Heatmaps: encoded with the UDP codec (MMPose style)

## Comparison with Baselines

SDPose outperforms strong pose estimation baselines (e.g., Sapiens, ViTPose++) on out-of-distribution benchmarks while remaining competitive on in-domain data. See our [paper](https://arxiv.org/abs/2509.24980) for comprehensive evaluation results.

## Citation

If you use SDPose in your research, please cite our paper:

```bibtex
@misc{liang2025sdposeexploitingdiffusionpriors,
  title={SDPose: Exploiting Diffusion Priors for Out-of-Domain and Robust Pose Estimation},
  author={Shuang Liang and Jing He and Chuanmeizhi Wang and Lejun Liao and Guo Zhang and Yingcong Chen and Yuan Yuan},
  year={2025},
  eprint={2509.24980},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2509.24980},
}
```

## License

This model is released under the [MIT License](https://opensource.org/licenses/MIT).

## Additional Resources

- 🌐 **Project Website**: [https://t-s-liang.github.io/SDPose](https://t-s-liang.github.io/SDPose)
- 📄 **Paper**: [arXiv:2509.24980](https://arxiv.org/abs/2509.24980)
- 💻 **Code Repository**: [GitHub](https://github.com/t-s-liang/SDPose-OOD)
- 🤗 **Demo**: [HuggingFace Space](https://huggingface.co/spaces/teemosliang/SDPose-Body)
- 📧 **Contact**: tsliang2001@gmail.com

---
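The top-down flow described under Model Architecture (detect a person box, crop it to the 1024×768 input, predict 17 heatmaps, decode them back to image coordinates) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the SDPose or MMPose implementation: `bbox_to_crop` and `decode_heatmaps` are hypothetical helper names, the 1.25 box padding is an assumed (commonly used) top-down default, the actual image warping is omitted, and decoding is a plain argmax without the sub-pixel refinement that MMPose's UDP codec performs.

```python
import numpy as np

# COCO-17 keypoint names, in the index order listed in this card.
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

# Model input size from this card: 1024x768 (H x W).
INPUT_H, INPUT_W = 1024, 768


def bbox_to_crop(bbox, padding=1.25):
    """Hypothetical helper: turn a detector box (x1, y1, x2, y2) into a crop
    (cx, cy, w, h) with the same W/H aspect ratio as the model input.
    `padding` is an assumed enlargement factor, not taken from SDPose."""
    x1, y1, x2, y2 = bbox
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    w, h = x2 - x1, y2 - y1
    aspect = INPUT_W / INPUT_H  # 0.75
    # Enlarge one side so that w / h equals the input aspect ratio;
    # the box is never shrunk, so the person stays fully inside the crop.
    if w > aspect * h:
        h = w / aspect
    else:
        w = h * aspect
    return cx, cy, w * padding, h * padding


def decode_heatmaps(heatmaps, crop):
    """Hypothetical helper: argmax-decode (K, Hh, Wh) heatmaps into (K, 2)
    image coordinates plus (K,) scores. Simplified: no sub-pixel refinement."""
    k, hh, wh = heatmaps.shape
    flat = heatmaps.reshape(k, -1)
    scores = flat.max(axis=1)                    # peak value as confidence
    ys, xs = np.unravel_index(flat.argmax(axis=1), (hh, wh))
    cx, cy, w, h = crop
    # Linear map: heatmap index 0 -> left/top crop edge,
    # index (size - 1) -> right/bottom crop edge (an assumption).
    x_img = cx - w / 2.0 + xs / (wh - 1) * w
    y_img = cy - h / 2.0 + ys / (hh - 1) * h
    return np.stack([x_img, y_img], axis=1), scores
```

For example, with a detector box from YOLO11-x and a (17, H', W') heatmap array from the model, `kpts, scores = decode_heatmaps(heatmaps, bbox_to_crop(box))` yields one (x, y) pair per keypoint, and `dict(zip(COCO_KEYPOINTS, kpts.tolist()))` attaches the COCO names. Matching the crop's aspect ratio to the input before resizing avoids distorting body proportions, which is why the box is enlarged rather than stretched.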
**⭐ Star us on GitHub — it motivates us a lot!**