ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding Paper • 2212.05171 • Published Dec 10, 2022
APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets Paper • 2406.18518 • Published Jun 26, 2024
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models Paper • 2408.08872 • Published Aug 16, 2024
AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning Paper • 2402.15506 • Published Feb 23, 2024
ULIP-2: Towards Scalable Multimodal Pre-training For 3D Understanding Paper • 2305.08275 • Published May 14, 2023
UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild Paper • 2305.11147 • Published May 18, 2023
REX: Rapid Exploration and eXploitation for AI Agents Paper • 2307.08962 • Published Jul 18, 2023
Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization Paper • 2308.02151 • Published Aug 4, 2023
BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents Paper • 2308.05960 • Published Aug 11, 2023
Deformer: Dynamic Fusion Transformer for Robust Hand Pose Estimation Paper • 2303.04991 • Published Mar 9, 2023
Align and Prompt: Video-and-Language Pre-training with Entity Prompts Paper • 2112.09583 • Published Dec 17, 2021
X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning Paper • 2311.18799 • Published Nov 30, 2023