RynnVLA-002: A Unified Vision-Language-Action and World Model • arXiv:2511.17502 • Nov 2025 • 22 upvotes
Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark • arXiv:2510.13759 • Oct 15, 2025 • 9 upvotes
Simulating the Visual World with Artificial Intelligence: A Roadmap • arXiv:2511.08585 • Nov 2025 • 29 upvotes
The Quest for Generalizable Motion Generation: Data, Model, and Evaluation • arXiv:2510.26794 • Oct 2025 • 26 upvotes
RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning • arXiv:2510.02240 • Oct 2, 2025 • 17 upvotes
Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence • arXiv:2510.20579 • Oct 23, 2025 • 55 upvotes
VideoLucy: Deep Memory Backtracking for Long Video Understanding • arXiv:2510.12422 • Oct 14, 2025 • 1 upvote
VBench: Comprehensive Benchmark Suite for Video Generative Models • arXiv:2311.17982 • Nov 29, 2023 • 9 upvotes
Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models • arXiv:2501.08453 • Jan 14, 2025 • 1 upvote
ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models • arXiv:2506.21356 • Jun 26, 2025 • 22 upvotes
CineScale: Free Lunch in High-Resolution Cinematic Visual Generation • arXiv:2508.15774 • Aug 21, 2025 • 20 upvotes
VChain: Chain-of-Visual-Thought for Reasoning in Video Generation • arXiv:2510.05094 • Oct 6, 2025 • 37 upvotes
MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs • arXiv:2411.15296 • Nov 22, 2024 • 21 upvotes
Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos • arXiv:2501.13826 • Jan 23, 2025 • 25 upvotes
LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training • arXiv:2509.23661 • Sep 28, 2025 • 44 upvotes