robbyant/lingbot-depth-postrain-dc-vitl14

+---
+license: apache-2.0
+language:
+- en
+tags:
+- depth-estimation
+- depth-completion
+- rgb-d
+- computer-vision
+- robotics
+- 3d-vision
+- pytorch
+- vision-transformer
+datasets:
+- custom
+metrics:
+- rmse
+- mae
+library_name: pytorch
+pipeline_tag: depth-estimation
+---
+# LingBot-Depth-DC (Depth Completion)
+**LingBot-Depth-DC** is a post-trained variant of LingBot-Depth, **optimized for sparse depth completion**. It recovers dense depth maps from highly sparse inputs such as SfM/SLAM point clouds.
+## Model Details
+### Model Description
+This model builds upon the LingBot-Depth pretrained checkpoint with additional post-training focused on sparse depth completion scenarios. It is particularly effective for:
+- Recovering complete depth from sparse SfM/SLAM observations
+- Handling extremely sparse depth inputs (e.g., <5% valid pixels)
+- Scenarios where depth sensors are unavailable and only sparse geometric cues exist
+
+- **Developed by:** Bin Tan, Changjiang Sun, Xiage Qin, Hanat Adai, Zelin Fu, Tianxiang Zhou, Han Zhang, Yinghao Xu, Xing Zhu, Yujun Shen, Nan Xue
+- **Model type:** Vision Transformer for sparse depth completion
+- **License:** Apache 2.0
+- **Finetuned from model:** LingBot-Depth (pretrained)
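To make the "<5% valid pixels" regime concrete, the following is a minimal sketch that simulates a sparse input by randomly dropping pixels from a dense depth map. The helper name `sparsify_depth`, the use of NumPy, and the convention that `0` marks a missing measurement are assumptions made for illustration; the card does not state how the model encodes invalid depth.

```python
import numpy as np

def sparsify_depth(dense_depth, keep_ratio=0.05, seed=0):
    """Keep a random subset of valid pixels and zero out the rest.

    Assumption: 0 means "no measurement". Check the repository for the
    convention the released model actually expects.
    """
    rng = np.random.default_rng(seed)
    valid = dense_depth > 0                           # pixels that had depth
    keep = rng.random(dense_depth.shape) < keep_ratio  # random subset to keep
    return np.where(valid & keep, dense_depth, 0.0).astype(np.float32)

# Synthetic flat scene at 2 m, reduced to roughly 3% valid pixels.
dense = np.full((480, 640), 2.0, dtype=np.float32)
sparse = sparsify_depth(dense, keep_ratio=0.03)
valid_fraction = float((sparse > 0).mean())
print(f"valid pixels: {valid_fraction:.2%}")
```

A map produced this way can serve as a stand-in for SfM/SLAM output when real sparse data is not at hand, although real point clouds tend to cluster on textured regions rather than being uniformly distributed.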
+### Model Sources
+- **Repository:** https://github.com/robbyant/lingbot-depth
+- **Paper:** [Masked Depth Modeling for Spatial Perception](https://arxiv.org/abs/2601.xxxxx)
+- **Project Page:** https://technology.robbyant.com/lingbot-depth
+### Related Models
+| Model | Repository | Description |
+|-------|------------|-------------|
+| LingBot-Depth | [robbyant/lingbot-depth-pretrain-vitl-14](https://huggingface.co/robbyant/lingbot-depth-pretrain-vitl-14) | General-purpose depth refinement |
+| LingBot-Depth-DC | [robbyant/lingbot-depth-postrain-dc-vitl14](https://huggingface.co/robbyant/lingbot-depth-postrain-dc-vitl14) | Optimized for sparse depth completion (this model) |
+## Uses
+### Direct Use
+- **Sparse Depth Completion**: Recovering dense depth from SfM/SLAM sparse point clouds
+- **Extreme Sparsity Handling**: Working with <5% valid depth pixels
+- **RGB-guided Depth Densification**: Using visual context to fill large missing regions
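As an illustration of how an SfM/SLAM point cloud might be turned into a sparse depth map, here is a hedged sketch using a pinhole camera model. The function `project_to_sparse_depth`, the intrinsics matrix `K`, and the zero-for-missing convention are hypothetical and are not taken from the LingBot-Depth repository; points are assumed to be expressed in the camera frame already.

```python
import numpy as np

def project_to_sparse_depth(points_cam, K, height, width):
    """Splat camera-frame 3D points (N, 3) into an (H, W) depth map.

    Pixels hit by no point stay 0 (assumed "missing" marker). When several
    points land on one pixel, the nearest depth is kept.
    """
    points_cam = np.asarray(points_cam, dtype=np.float64)
    K = np.asarray(K, dtype=np.float64)
    z = points_cam[:, 2]
    in_front = z > 0                                   # drop points behind the camera
    x, y, z = points_cam[in_front, 0], points_cam[in_front, 1], z[in_front]
    u = np.round(K[0, 0] * x / z + K[0, 2]).astype(np.int64)  # pixel column
    v = np.round(K[1, 1] * y / z + K[1, 2]).astype(np.int64)  # pixel row
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    depth = np.full((height, width), np.inf, dtype=np.float32)
    # ufunc.at applies the minimum correctly even when pixel indices repeat.
    np.minimum.at(depth, (v[inside], u[inside]), z[inside].astype(np.float32))
    depth[np.isinf(depth)] = 0.0
    return depth

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
points = np.array([[0.0, 0.0, 2.0],    # image centre, 2 m
                   [0.0, 0.0, 1.0],    # same pixel, nearer: this one is kept
                   [0.2, 0.1, 2.0],    # off-centre point
                   [0.0, 0.0, -1.0],   # behind the camera: dropped
                   [10.0, 0.0, 1.0]])  # projects outside the image: dropped
sparse_depth = project_to_sparse_depth(points, K, 480, 640)
print(int((sparse_depth > 0).sum()), "valid pixels")
```

Keeping the nearest depth per pixel is one simple way to handle collisions; it does not reason about occlusion beyond that.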
+### Downstream Use
+- **SLAM Enhancement**: Densifying sparse SLAM outputs for better scene understanding
+- **Novel View Synthesis**: Providing dense geometry for view synthesis pipelines
+- **3D Reconstruction**: Completing sparse depth for mesh reconstruction
+- **Robot Navigation**: Supplying dense depth from sparse sensor observations
+## Technical Specifications
+### Model Architecture
+- **Encoder:** ViT-Large/14 (24 layers) with separate patch embeddings for RGB and depth
+- **Decoder:** ConvStack decoder with hierarchical upsampling
+- **Objective:** Masked depth modeling optimized for sparse inputs
+- **Model size:** ~300M parameters
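The "/14" in ViT-Large/14 is the patch size, so the number of tokens per modality depends on the input resolution. The sketch below works through that arithmetic. Padding up to the next multiple of 14 is an assumption made here for illustration; the released code may resize or crop instead.

```python
import math

PATCH = 14  # patch size of ViT-Large/14, as listed in the encoder description

def patch_grid(height, width, patch=PATCH):
    """Return (rows, cols, padded_height, padded_width).

    Assumption: inputs are padded up to the next multiple of `patch`.
    """
    rows = math.ceil(height / patch)
    cols = math.ceil(width / patch)
    return rows, cols, rows * patch, cols * patch

rows, cols, padded_h, padded_w = patch_grid(480, 640)
print(f"{rows} x {cols} = {rows * cols} patches per modality, "
      f"padded to {padded_h} x {padded_w}")
```

Because RGB and depth use separate patch embeddings, each modality produces its own grid of patches; how the two token sets are combined inside the encoder is not described in this card.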
+### Software Requirements
+- Python >= 3.9
+- PyTorch >= 2.0.0
+- xformers
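A quick way to check an environment against these requirements is sketched below using only the standard library. The helper names `version_tuple` and `check_environment` are illustrative, and the PyPI distribution names `torch` and `xformers` are assumed to match the listed packages.

```python
import re
import sys
from importlib import metadata

def version_tuple(text):
    """Turn a version string such as '2.1.0+cu121' into (2, 1, 0)."""
    nums = []
    for piece in text.split("+")[0].split(".")[:3]:
        match = re.match(r"\d+", piece)
        nums.append(int(match.group()) if match else 0)
    while len(nums) < 3:          # pad '2.1' to (2, 1, 0)
        nums.append(0)
    return tuple(nums)

def check_environment():
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    if sys.version_info < (3, 9):
        problems.append("Python >= 3.9 is required")
    try:
        if version_tuple(metadata.version("torch")) < (2, 0, 0):
            problems.append("PyTorch >= 2.0.0 is required")
    except metadata.PackageNotFoundError:
        problems.append("torch is not installed")
    try:
        metadata.version("xformers")
    except metadata.PackageNotFoundError:
        problems.append("xformers is not installed")
    return problems

for problem in check_environment():
    print("missing requirement:", problem)
```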
+## Citation
+```bibtex
+@article{lingbot-depth2026,
+  title={Masked Depth Modeling for Spatial Perception},
+  author={Tan, Bin and Sun, Changjiang and Qin, Xiage and Adai, Hanat and Fu, Zelin and Zhou, Tianxiang and Zhang, Han and Xu, Yinghao and Zhu, Xing and Shen, Yujun and Xue, Nan},
+  journal={arXiv preprint arXiv:2601.xxxxx},
+  year={2026}
+}
+```
+## Model Card Contact
+- **Email:** tanbin.tan@antgroup.com, xuenan.xue@antgroup.com
+- **Issues:** https://github.com/robbyant/lingbot-depth/issues

model.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3a4512e05445857d1404fc00227285bd8fa4abdd97250dc65f9636aa9cc71325
+size 1284841739
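
The three lines added for `model.pt` are a Git LFS pointer rather than the weights themselves: they record the SHA-256 digest and byte size of the real file. A minimal sketch of checking a downloaded file against such a pointer follows. The helper names are hypothetical, and the demo verifies a small in-memory payload instead of the actual checkpoint.

```python
import hashlib
import io

POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:3a4512e05445857d1404fc00227285bd8fa4abdd97250dc65f9636aa9cc71325
size 1284841739
"""

def parse_lfs_pointer(text):
    """Parse 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}

def stream_matches_pointer(stream, pointer, chunk_size=1 << 20):
    """Hash a binary stream in chunks and compare size and digest to the pointer."""
    hasher = hashlib.new(pointer["algo"])
    total = 0
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        hasher.update(chunk)
        total += len(chunk)
    return total == pointer["size"] and hasher.hexdigest() == pointer["digest"]

pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])

# Demo on an in-memory payload; for the real file, pass open("model.pt", "rb").
payload = b"demo weights"
demo_pointer = {"algo": "sha256",
                "digest": hashlib.sha256(payload).hexdigest(),
                "size": len(payload)}
print(stream_matches_pointer(io.BytesIO(payload), demo_pointer))
```

If the weights are stored as 32-bit floats (an assumption, since the storage dtype is not stated), 1,284,841,739 bytes corresponds to roughly 321 million values, which is in line with the ~300M parameters listed in the model card.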