Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025)
Joya Chen
chenjoya
AI & ML interests
Video LLM
Recent Activity
upvoted a paper 3 days ago
Grounding Computer Use Agents on Human Demonstrations
upvoted a paper 8 days ago
Cambrian-S: Towards Spatial Supersensing in Video
upvoted a paper 10 days ago
Revisiting Multimodal Positional Encoding in Vision-Language Models