Collections
Collections including paper arxiv:2404.14469
- TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
  Paper • 2404.11912 • Published • 17
- SnapKV: LLM Knows What You are Looking for Before Generation
  Paper • 2404.14469 • Published • 27
- LLM in a flash: Efficient Large Language Model Inference with Limited Memory
  Paper • 2312.11514 • Published • 260
- An Evolved Universal Transformer Memory
  Paper • 2410.13166 • Published • 6

- Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models
  Paper • 2404.02575 • Published • 50
- Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
  Paper • 2404.12253 • Published • 55
- SnapKV: LLM Knows What You are Looking for Before Generation
  Paper • 2404.14469 • Published • 27
- FlowMind: Automatic Workflow Generation with LLMs
  Paper • 2404.13050 • Published • 34

- SnapKV: LLM Knows What You are Looking for Before Generation
  Paper • 2404.14469 • Published • 27
- Finch: Prompt-guided Key-Value Cache Compression
  Paper • 2408.00167 • Published • 18
- Beyond RAG: Task-Aware KV Cache Compression for Comprehensive Knowledge Reasoning
  Paper • 2503.04973 • Published • 26
- A Simple and Effective L_2 Norm-Based Strategy for KV Cache Compression
  Paper • 2406.11430 • Published • 25

- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 625
- BitNet: Scaling 1-bit Transformers for Large Language Models
  Paper • 2310.11453 • Published • 105
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 107
- TransformerFAM: Feedback attention is working memory
  Paper • 2404.09173 • Published • 43

- JetMoE: Reaching Llama2 Performance with 0.1M Dollars
  Paper • 2404.07413 • Published • 38
- Allowing humans to interactively guide machines where to look does not always improve a human-AI team's classification accuracy
  Paper • 2404.05238 • Published • 3
- Cognitive Architectures for Language Agents
  Paper • 2309.02427 • Published • 8
- Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings
  Paper • 2305.13571 • Published • 2

- Advancing LLM Reasoning Generalists with Preference Trees
  Paper • 2404.02078 • Published • 46
- PointInfinity: Resolution-Invariant Point Diffusion Models
  Paper • 2404.03566 • Published • 16
- MonoPatchNeRF: Improving Neural Radiance Fields with Patch-based Monocular Guidance
  Paper • 2404.08252 • Published • 6
- SnapKV: LLM Knows What You are Looking for Before Generation
  Paper • 2404.14469 • Published • 27