Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval • arXiv 2507.23284 • Published Jul 31, 2025
ST-VLM: Kinematic Instruction Tuning for Spatio-Temporal Reasoning in Vision-Language Models • arXiv 2503.19355 • Published Mar 25, 2025
LLaMo: Large Language Model-based Molecular Graph Assistant • arXiv 2411.00871 • Published Oct 31, 2024