Collections including paper arxiv:2405.12250

- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 28
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 14
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23

- Selective Attention Improves Transformer
  Paper • 2410.02703 • Published • 24
- Differential Transformer
  Paper • 2410.05258 • Published • 179
- TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
  Paper • 2410.05076 • Published • 8
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
  Paper • 2410.13276 • Published • 29

- Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
  Paper • 2405.21060 • Published • 67
- Your Transformer is Secretly Linear
  Paper • 2405.12250 • Published • 158
- KAN: Kolmogorov-Arnold Networks
  Paper • 2404.19756 • Published • 115
- Trusted Machine Learning Models Unlock Private Inference for Problems Currently Infeasible with Cryptography
  Paper • 2501.08970 • Published • 6

- Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
  Paper • 2406.06525 • Published • 71
- Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning
  Paper • 2406.06469 • Published • 29
- Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
  Paper • 2406.04271 • Published • 30
- Block Transformer: Global-to-Local Language Modeling for Fast Inference
  Paper • 2406.02657 • Published • 41

- ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
  Paper • 2403.03853 • Published • 66
- SliceGPT: Compress Large Language Models by Deleting Rows and Columns
  Paper • 2401.15024 • Published • 74
- Your Transformer is Secretly Linear
  Paper • 2405.12250 • Published • 158
- Yi: Open Foundation Models by 01.AI
  Paper • 2403.04652 • Published • 65

- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
  Paper • 2312.00752 • Published • 146
- Elucidating the Design Space of Diffusion-Based Generative Models
  Paper • 2206.00364 • Published • 18
- GLU Variants Improve Transformer
  Paper • 2002.05202 • Published • 4
- StarCoder 2 and The Stack v2: The Next Generation
  Paper • 2402.19173 • Published • 148