Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2509.09676

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 28
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

vrgamedevgirl84/Wan14BT2VFusioniX

Text-to-Video • Updated Jun 21 • 584
TheStageAI/Elastic-mochi-1-preview

Text-to-Video • Updated 21 days ago • 5 • 2
nesaorg/animatediff-base

Text-to-Video • Updated Jun 22 • 48
4Real-Video-V2: Fused View-Time Attention and Feedforward Reconstruction for 4D Scene Generation

Paper • 2506.18839 • Published Jun 18 • 12

Towards Special AI

SpatialVID: A Large-Scale Video Dataset with Spatial Annotations

Paper • 2509.09676 • Published Sep 11 • 31
MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs

Paper • 2503.13111 • Published Mar 17 • 7

MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds

Paper • 2508.14879 • Published Aug 20 • 65
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space

Paper • 2508.19247 • Published Aug 26 • 41
Pixie: Fast and Generalizable Supervised Learning of 3D Physics from Pixels

Paper • 2508.17437 • Published Aug 20 • 36
Multi-View 3D Point Tracking

Paper • 2508.21060 • Published Aug 28 • 22

MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels

Paper • 2405.07526 • Published May 13, 2024 • 21
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach

Paper • 2405.15613 • Published May 24, 2024 • 17
A Touch, Vision, and Language Dataset for Multimodal Alignment

Paper • 2402.13232 • Published Feb 20, 2024 • 16
How Do Large Language Models Acquire Factual Knowledge During Pretraining?

Paper • 2406.11813 • Published Jun 17, 2024 • 31

Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations

Paper • 2508.09789 • Published Aug 13 • 5
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents

Paper • 2508.13186 • Published Aug 14 • 18
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents

Paper • 2508.04038 • Published Aug 6 • 1
Prompt Orchestration Markup Language

Paper • 2508.13948 • Published Aug 19 • 48

SpatialVID: A Large-Scale Video Dataset with Spatial Annotations

Paper • 2509.09676 • Published Sep 11 • 31

Google Gemma-3-270m model variants on different fine-tuning

sweelol/gemma3-270m-pruned-base

Text Generation • 0.3B • Updated Aug 26 • 15
sweelol/finetuned-pruned-gemma3-270m-dolly

Text Generation • 0.4B • Updated Aug 26 • 13
sweelol/lora-gemma3-270m-dolly

Text Generation • Updated Aug 26 • 30
Sweelol-ai/pt-gemma3-270m-dolly

Text Generation • Updated Aug 26 • 31

Data and other things

MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval

Paper • 2412.14475 • Published Dec 19, 2024 • 55
How to Synthesize Text Data without Model Collapse?

Paper • 2412.14689 • Published Dec 19, 2024 • 52
Token-Budget-Aware LLM Reasoning

Paper • 2412.18547 • Published Dec 24, 2024 • 46
WavePulse: Real-time Content Analytics of Radio Livestreams

Paper • 2412.17998 • Published Dec 23, 2024 • 11

MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels

Paper • 2405.07526 • Published May 13, 2024 • 21
nabla^2DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials

Paper • 2406.14347 • Published Jun 20, 2024 • 102
SpatialVID: A Large-Scale Video Dataset with Spatial Annotations

Paper • 2509.09676 • Published Sep 11 • 31

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 28
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations

Paper • 2508.09789 • Published Aug 13 • 5
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents

Paper • 2508.13186 • Published Aug 14 • 18
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents

Paper • 2508.04038 • Published Aug 6 • 1
Prompt Orchestration Markup Language

Paper • 2508.13948 • Published Aug 19 • 48

vrgamedevgirl84/Wan14BT2VFusioniX

Text-to-Video • Updated Jun 21 • 584
TheStageAI/Elastic-mochi-1-preview

Text-to-Video • Updated 21 days ago • 5 • 2
nesaorg/animatediff-base

Text-to-Video • Updated Jun 22 • 48
4Real-Video-V2: Fused View-Time Attention and Feedforward Reconstruction for 4D Scene Generation

Paper • 2506.18839 • Published Jun 18 • 12

SpatialVID: A Large-Scale Video Dataset with Spatial Annotations

Paper • 2509.09676 • Published Sep 11 • 31

Towards Special AI

SpatialVID: A Large-Scale Video Dataset with Spatial Annotations

Paper • 2509.09676 • Published Sep 11 • 31
MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs

Paper • 2503.13111 • Published Mar 17 • 7

Google Gemma-3-270m model variants on different fine-tuning

sweelol/gemma3-270m-pruned-base

Text Generation • 0.3B • Updated Aug 26 • 15
sweelol/finetuned-pruned-gemma3-270m-dolly

Text Generation • 0.4B • Updated Aug 26 • 13
sweelol/lora-gemma3-270m-dolly

Text Generation • Updated Aug 26 • 30
Sweelol-ai/pt-gemma3-270m-dolly

Text Generation • Updated Aug 26 • 31

MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds

Paper • 2508.14879 • Published Aug 20 • 65
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space

Paper • 2508.19247 • Published Aug 26 • 41
Pixie: Fast and Generalizable Supervised Learning of 3D Physics from Pixels

Paper • 2508.17437 • Published Aug 20 • 36
Multi-View 3D Point Tracking

Paper • 2508.21060 • Published Aug 28 • 22

Data and other things

MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval

Paper • 2412.14475 • Published Dec 19, 2024 • 55
How to Synthesize Text Data without Model Collapse?

Paper • 2412.14689 • Published Dec 19, 2024 • 52
Token-Budget-Aware LLM Reasoning

Paper • 2412.18547 • Published Dec 24, 2024 • 46
WavePulse: Real-time Content Analytics of Radio Livestreams

Paper • 2412.17998 • Published Dec 23, 2024 • 11

MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels

Paper • 2405.07526 • Published May 13, 2024 • 21
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach

Paper • 2405.15613 • Published May 24, 2024 • 17
A Touch, Vision, and Language Dataset for Multimodal Alignment

Paper • 2402.13232 • Published Feb 20, 2024 • 16
How Do Large Language Models Acquire Factual Knowledge During Pretraining?

Paper • 2406.11813 • Published Jun 17, 2024 • 31

MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels

Paper • 2405.07526 • Published May 13, 2024 • 21
nabla^2DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials

Paper • 2406.14347 • Published Jun 20, 2024 • 102
SpatialVID: A Large-Scale Video Dataset with Spatial Annotations

Paper • 2509.09676 • Published Sep 11 • 31

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs