Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2407.08683

Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models

Paper • 2406.17294 • Published Jun 25, 2024 • 11
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding

Paper • 2406.19389 • Published Jun 27, 2024 • 54
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model

Paper • 2406.20076 • Published Jun 28, 2024 • 10
PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation

Paper • 2407.02869 • Published Jul 3, 2024 • 21

VLM Interleaved Images

LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models

Paper • 2407.07895 • Published Jul 10, 2024 • 42
SEED-Story: Multimodal Long Story Generation with Large Language Model

Paper • 2407.08683 • Published Jul 11, 2024 • 24
ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation

Paper • 2407.06135 • Published Jul 8, 2024 • 23
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Paper • 2407.03320 • Published Jul 3, 2024 • 95

How Do Large Language Models Acquire Factual Knowledge During Pretraining?

Paper • 2406.11813 • Published Jun 17, 2024 • 31
From RAGs to rich parameters: Probing how language models utilize external knowledge over parametric information for factual queries

Paper • 2406.12824 • Published Jun 18, 2024 • 21
Tokenization Falling Short: The Curse of Tokenization

Paper • 2406.11687 • Published Jun 17, 2024 • 16
Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level

Paper • 2406.11817 • Published Jun 17, 2024 • 13

Image-Gen Story

SEED-Story: Multimodal Long Story Generation with Large Language Model

Paper • 2407.08683 • Published Jul 11, 2024 • 24
Story-Adapter: A Training-free Iterative Framework for Long Story Visualization

Paper • 2410.06244 • Published Oct 8, 2024 • 19
Unbounded: A Generative Infinite Game of Character Life Simulation

Paper • 2410.18975 • Published Oct 24, 2024 • 37
Generative AI for Cel-Animation: A Survey

Paper • 2501.06250 • Published Jan 8 • 13

Multi-modality LVM

VoCo-LLaMA: Towards Vision Compression with Large Language Models

Paper • 2406.12275 • Published Jun 18, 2024 • 31
TroL: Traversal of Layers for Large Language and Vision Models

Paper • 2406.12246 • Published Jun 18, 2024 • 35
Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning

Paper • 2406.15334 • Published Jun 21, 2024 • 9
Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning

Paper • 2406.12742 • Published Jun 18, 2024 • 15

Video as the New Language for Real-World Decision Making

Paper • 2402.17139 • Published Feb 27, 2024 • 21
Learning and Leveraging World Models in Visual Representation Learning

Paper • 2403.00504 • Published Mar 1, 2024 • 33
MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies

Paper • 2403.01422 • Published Mar 3, 2024 • 29
VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models

Paper • 2403.05438 • Published Mar 8, 2024 • 21

Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models

Paper • 2406.17294 • Published Jun 25, 2024 • 11
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding

Paper • 2406.19389 • Published Jun 27, 2024 • 54
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model

Paper • 2406.20076 • Published Jun 28, 2024 • 10
PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation

Paper • 2407.02869 • Published Jul 3, 2024 • 21

Image-Gen Story

SEED-Story: Multimodal Long Story Generation with Large Language Model

Paper • 2407.08683 • Published Jul 11, 2024 • 24
Story-Adapter: A Training-free Iterative Framework for Long Story Visualization

Paper • 2410.06244 • Published Oct 8, 2024 • 19
Unbounded: A Generative Infinite Game of Character Life Simulation

Paper • 2410.18975 • Published Oct 24, 2024 • 37
Generative AI for Cel-Animation: A Survey

Paper • 2501.06250 • Published Jan 8 • 13

VLM Interleaved Images

LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models

Paper • 2407.07895 • Published Jul 10, 2024 • 42
SEED-Story: Multimodal Long Story Generation with Large Language Model

Paper • 2407.08683 • Published Jul 11, 2024 • 24
ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation

Paper • 2407.06135 • Published Jul 8, 2024 • 23
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Paper • 2407.03320 • Published Jul 3, 2024 • 95

Multi-modality LVM

VoCo-LLaMA: Towards Vision Compression with Large Language Models

Paper • 2406.12275 • Published Jun 18, 2024 • 31
TroL: Traversal of Layers for Large Language and Vision Models

Paper • 2406.12246 • Published Jun 18, 2024 • 35
Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning

Paper • 2406.15334 • Published Jun 21, 2024 • 9
Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning

Paper • 2406.12742 • Published Jun 18, 2024 • 15

How Do Large Language Models Acquire Factual Knowledge During Pretraining?

Paper • 2406.11813 • Published Jun 17, 2024 • 31
From RAGs to rich parameters: Probing how language models utilize external knowledge over parametric information for factual queries

Paper • 2406.12824 • Published Jun 18, 2024 • 21
Tokenization Falling Short: The Curse of Tokenization

Paper • 2406.11687 • Published Jun 17, 2024 • 16
Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level

Paper • 2406.11817 • Published Jun 17, 2024 • 13

Video as the New Language for Real-World Decision Making

Paper • 2402.17139 • Published Feb 27, 2024 • 21
Learning and Leveraging World Models in Visual Representation Learning

Paper • 2403.00504 • Published Mar 1, 2024 • 33
MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies

Paper • 2403.01422 • Published Mar 3, 2024 • 29
VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models

Paper • 2403.05438 • Published Mar 8, 2024 • 21

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs