pangzs's Collections

Papers

updated Aug 21

  • Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models

    Paper • 2505.04921 • Published May 8 • 185

  • On Path to Multimodal Generalist: General-Level and General-Bench

    Paper • 2505.04620 • Published May 7 • 82

  • StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant

    Paper • 2505.05467 • Published May 8 • 14

  • Adapting Vision-Language Models Without Labels: A Comprehensive Survey

    Paper • 2508.05547 • Published Aug 7 • 11

  • VLM4D: Towards Spatiotemporal Awareness in Vision Language Models

    Paper • 2508.02095 • Published Aug 4 • 9

  • Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL

    Paper • 2508.13167 • Published Aug 6 • 127

  • Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations

    Paper • 2508.09789 • Published Aug 13 • 5

  • MedSAMix: A Training-Free Model Merging Approach for Medical Image Segmentation

    Paper • 2508.11032 • Published Aug 14 • 2

  • Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory

    Paper • 2508.09736 • Published Aug 13 • 56

  • Ovis2.5 Technical Report

    Paper • 2508.11737 • Published Aug 15 • 109