Yuhui Zhang

yuhuizhang

https://cs.stanford.edu/~yuhuiz/

AI & ML interests

ML, CV, NLP

Recent Activity

upvoted a paper about 2 months ago

Downscaling Intelligence: Exploring Perception and Reasoning Bottlenecks in Small Multimodal Models

Paper • 2511.17487 • Published Nov 21, 2025 • 10

liked a dataset 2 months ago

mlfoundations/Click-100k

Viewer • Updated Nov 11, 2025 • 101k • 627 • 14

liked a model 2 months ago

mlfoundations/Gelato-30B-A3B

Image-Text-to-Text • 31B • Updated Nov 15, 2025 • 147 • 28

updated a dataset 3 months ago

yuhuizhang/jump-dongxia

Viewer • Updated Oct 18, 2025 • 1.16k • 15

published a dataset 3 months ago

yuhuizhang/jump-dongxia

Viewer • Updated Oct 18, 2025 • 1.16k • 15

updated a dataset 7 months ago

yuhuizhang/NegVQA

Viewer • Updated Jun 9, 2025 • 7.38k • 124

published a dataset 7 months ago

yuhuizhang/NegVQA

Viewer • Updated Jun 9, 2025 • 7.38k • 124

authored a paper 9 months ago

Video Action Differencing

Paper • 2503.07860 • Published Mar 10, 2025 • 33

upvoted 2 papers 9 months ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7, 2025 • 203

CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis

Paper • 2503.23145 • Published Mar 29, 2025 • 35

authored a paper 10 months ago

MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research

Paper • 2503.13399 • Published Mar 17, 2025 • 22

liked a dataset 10 months ago

jmhb/microvqa

Viewer • Updated May 5, 2025 • 1.04k • 430 • 16

upvoted 2 papers 10 months ago

MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research

Paper • 2503.13399 • Published Mar 17, 2025 • 22

Video Action Differencing

Paper • 2503.07860 • Published Mar 10, 2025 • 33

liked a dataset 12 months ago

suyc21/VMCBench

Viewer • Updated Mar 3, 2025 • 9.02k • 292 • 5

upvoted a paper 12 months ago

Temporal Preference Optimization for Long-Form Video Understanding

Paper • 2501.13919 • Published Jan 23, 2025 • 23

authored a paper 12 months ago

Temporal Preference Optimization for Long-Form Video Understanding

Paper • 2501.13919 • Published Jan 23, 2025 • 23

upvoted a paper 12 months ago

BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature

Paper • 2501.07171 • Published Jan 13, 2025 • 55

authored 2 papers 12 months ago

Why are Visually-Grounded Language Models Bad at Image Classification?

Paper • 2405.18415 • Published May 28, 2024

Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation

Paper • 2501.03225 • Published Jan 6, 2025 • 7
