

CC Lin

cclin10


AI & ML interests

None yet

Recent Activity

upvoted a paper 27 days ago

SHANKS: Simultaneous Hearing and Thinking for Spoken Language Models

authored a paper 3 months ago

Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation

authored a paper 3 months ago

IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation


Organizations

None yet

authored 11 papers 3 months ago

Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation

Paper • 2406.06890 • Published Jun 11, 2024 • 1

IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation

Paper • 2407.10937 • Published Jul 15, 2024 • 1

MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities

Paper • 2408.00765 • Published Aug 1, 2024 • 14

SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation

Paper • 2410.23277 • Published Oct 30, 2024 • 9

GenXD: Generating Any 3D and 4D Scenes

Paper • 2411.02319 • Published Nov 4, 2024 • 20

AR-Net: Adaptive Frame Resolution for Efficient Action Recognition

Paper • 2007.15796 • Published Jul 31, 2020

AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition

Paper • 2102.05775 • Published Feb 10, 2021

SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement

Paper • 2504.07934 • Published Apr 10, 2025 • 20

Audio-Aware Large Language Models as Judges for Speaking Styles

Paper • 2506.05984 • Published Jun 6, 2025 • 15

ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs

Paper • 2506.10128 • Published Jun 11, 2025 • 22

STITCH: Simultaneous Thinking and Talking with Chunked Reasoning for Spoken Language Models

Paper • 2507.15375 • Published Jul 21, 2025 • 30

authored 8 papers over 1 year ago

DisCo: Disentangled Control for Referring Human Dance Generation in Real World

Paper • 2307.00040 • Published Jun 30, 2023 • 25

Equivariant Similarity for Vision-Language Foundation Models

Paper • 2303.14465 • Published Mar 25, 2023

LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling

Paper • 2206.07160 • Published Jun 14, 2022

SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning

Paper • 2111.13196 • Published Nov 25, 2021

The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)

Paper • 2309.17421 • Published Sep 29, 2023 • 4

Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation

Paper • 2310.08541 • Published Oct 12, 2023 • 18

MM-VID: Advancing Video Understanding with GPT-4V(ision)

Paper • 2310.19773 • Published Oct 30, 2023 • 20

Adaptive Human Matting for Dynamic Videos

Paper • 2304.06018 • Published Apr 12, 2023