Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context Learning Paper • 2310.00647 • Published Oct 1, 2023
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics Paper • 2506.01844 • Published Jun 2 • 147
Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards Paper • 2306.04488 • Published Jun 7, 2023 • 2
Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs Paper • 2405.16700 • Published May 26, 2024
Scaling Laws for Native Multimodal Models Paper • 2504.07951 • Published Apr 10 • 30
Multimodal Autoregressive Pre-training of Large Vision Encoders Paper • 2411.14402 • Published Nov 21, 2024 • 47
Look where you look! Saliency-guided Q-networks for generalization in visual Reinforcement Learning Paper • 2209.09203 • Published Sep 16, 2022
Solving robust MDPs as a sequence of static RL problems Paper • 2410.06212 • Published Oct 8, 2024
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22, 2024 • 133
What matters when building vision-language models? Paper • 2405.02246 • Published May 3, 2024 • 103
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset Paper • 2403.09029 • Published Mar 14, 2024 • 56
eP-ALM: Efficient Perceptual Augmentation of Language Models Paper • 2303.11403 • Published Mar 20, 2023 • 3