World-in-World: World Models in a Closed-Loop World Paper • 2510.18135 • Published 11 days ago • 85
Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models Paper • 2507.07104 • Published Jul 9 • 45
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning Paper • 2507.05255 • Published Jul 7 • 74
Medical World Model: Generative Simulation of Tumor Evolution for Treatment Planning Paper • 2506.02327 • Published Jun 2 • 20
Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens Paper • 2501.07730 • Published Jan 13 • 18
VideoAuteur: Towards Long Narrative Video Generation Paper • 2501.06173 • Published Jan 10 • 33
Flowing from Words to Pixels: A Framework for Cross-Modality Evolution Paper • 2412.15213 • Published Dec 19, 2024 • 28
3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark Paper • 2412.07825 • Published Dec 10, 2024 • 12
Llama 3.1 Collection This collection hosts the transformers and original repos of the Llama 3.1, Llama Guard 3 and Prompt Guard models • 11 items • Updated Dec 6, 2024 • 694
SpatialTracker: Tracking Any 2D Pixels in 3D Space Paper • 2404.04319 • Published Apr 5, 2024 • 25
An Image is Worth 32 Tokens for Reconstruction and Generation Paper • 2406.07550 • Published Jun 11, 2024 • 59
COCONut Dataset Collection This is a collection of COCONut datasets accepted at CVPR 2024 • 3 items • Updated Apr 29, 2024 • 6
ViTamin: Designing Scalable Vision Models in the Vision-Language Era Paper • 2404.02132 • Published Apr 2, 2024 • 2
ViTamin Family Collection Designing Scalable Vision Models in the Vision-Language Era. The best performing model is 'jienengchen/ViTamin-XL-384px'. • 16 items • Updated Apr 11, 2024 • 8
Foundation AI Papers Collection Curated list of must-reads on LLM reasoning from the Temus AI team • 135 items • Updated Jun 15, 2024 • 35