Collections
Discover the best community collections!
Collections including paper arxiv:2405.14860

-
Latent Reasoning in LLMs as a Vocabulary-Space Superposition
Paper • 2510.15522 • Published • 1
Language Models are Injective and Hence Invertible
Paper • 2510.15511 • Published • 50
Eliciting Secret Knowledge from Language Models
Paper • 2510.01070 • Published • 4
Interpreting Language Models Through Concept Descriptions: A Survey
Paper • 2510.01048 • Published • 2
-
Just How Flexible are Neural Networks in Practice?
Paper • 2406.11463 • Published • 7
Not All Language Model Features Are Linear
Paper • 2405.14860 • Published • 41
KAN: Kolmogorov-Arnold Networks
Paper • 2404.19756 • Published • 115
An Interactive Agent Foundation Model
Paper • 2402.05929 • Published • 30
-
Not All Language Model Features Are Linear
Paper • 2405.14860 • Published • 41
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
Paper • 2410.02707 • Published • 48
RepVideo: Rethinking Cross-Layer Representation for Video Generation
Paper • 2501.08994 • Published • 15
-
Not All Language Model Features Are Linear
Paper • 2405.14860 • Published • 41
TimeGPT-1
Paper • 2310.03589 • Published • 7
A Careful Examination of Large Language Model Performance on Grade School Arithmetic
Paper • 2405.00332 • Published • 32
EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture
Paper • 2405.18991 • Published • 12
-
Selective Attention Improves Transformer
Paper • 2410.02703 • Published • 24
Differential Transformer
Paper • 2410.05258 • Published • 179
TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
Paper • 2410.05076 • Published • 8
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
Paper • 2410.13276 • Published • 29
-
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Paper • 2406.06525 • Published • 71
Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning
Paper • 2406.06469 • Published • 29
Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
Paper • 2406.04271 • Published • 30
Block Transformer: Global-to-Local Language Modeling for Fast Inference
Paper • 2406.02657 • Published • 41
-
AtP*: An efficient and scalable method for localizing LLM behaviour to components
Paper • 2403.00745 • Published • 14
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 625
MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT
Paper • 2402.16840 • Published • 26
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
Paper • 2402.13753 • Published • 116