Jihan Yang PRO

jihanyang

https://jihanyang.github.io/

AI & ML interests

Computer Vision, Multimodality, Embodied AI

Recent Activity

upvoted a collection 5 days ago

VSI-SUPER

Collection

VSI-SUPER benchmark proposed in Cambrian-S • 3 items • Updated 6 days ago • 3

upvoted 2 papers 6 days ago

Benchmark Designers Should "Train on the Test Set" to Expose Exploitable Non-Visual Shortcuts

Paper • 2511.04655 • Published 6 days ago • 7

Cambrian-S: Towards Spatial Supersensing in Video

Paper • 2511.04670 • Published 6 days ago • 34

upvoted a paper 16 days ago

Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations

Paper • 2510.23607 • Published 16 days ago • 172

upvoted 2 papers 30 days ago

QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs

Paper • 2510.11696 • Published about 1 month ago • 173

Diffusion Transformers with Representation Autoencoders

Paper • 2510.11690 • Published about 1 month ago • 161

upvoted a paper about 1 month ago

LongLive: Real-time Interactive Long Video Generation

Paper • 2509.22622 • Published Sep 26 • 181

upvoted a paper 3 months ago

MetaCLIP 2: A Worldwide Scaling Recipe

Paper • 2507.22062 • Published Jul 29 • 36

upvoted a paper 4 months ago

Scaling RL to Long Videos

Paper • 2507.07966 • Published Jul 10 • 157

upvoted a paper 10 months ago

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 123

upvoted a paper 11 months ago

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

Paper • 2412.14171 • Published Dec 18, 2024 • 24

upvoted 2 papers over 1 year ago

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

Paper • 2406.16860 • Published Jun 24, 2024 • 63

What matters when building vision-language models?

Paper • 2405.02246 • Published May 3, 2024 • 103

upvoted a paper almost 2 years ago

V-IRL: Grounding Virtual Intelligence in Real Life

Paper • 2402.03310 • Published Feb 5, 2024 • 16
