Ron Zhu
RzZ
VLM
- UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces
  Paper • 2312.15715 • Published • 21
- Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
  Paper • 2505.23747 • Published • 68
- VideoPrism: A Foundational Visual Encoder for Video Understanding
  Paper • 2402.13217 • Published • 38
- Scaling RL to Long Videos
  Paper • 2507.07966 • Published • 158
Robotic
- Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control
  Paper • 2506.01943 • Published • 25
- LoHoVLA: A Unified Vision-Language-Action Model for Long-Horizon Embodied Tasks
  Paper • 2506.00411 • Published • 31
- SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
  Paper • 2506.01844 • Published • 142
models 11
RzZ/Qwen2.5-VL-3B-GGUF • 3B • Updated • 36
RzZ/Qwen2.5-VL-32B-Instruct-GGUF • 0.7B • Updated • 13
RzZ/sd-v1-4-adapter-seg • Updated • 5
RzZ/sd-v1-4-adapter-depth • Updated • 7
RzZ/sd-v1-4-adapter-keypose • Updated • 7
RzZ/sd-v1-4-adapter-color • Updated • 8
RzZ/sd-v1-4-adapter-canny • Updated • 4
RzZ/sd-v1-4-adapter-sketch • Updated • 3
RzZ/sd-v1-4-adapter-openpose • Updated • 8
RzZ/sd-v1-4-adapter-keypose-depth • Updated • 6