Collections
Collections including paper arxiv:2404.14469
- TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
  Paper • 2404.11912 • Published • 17
- SnapKV: LLM Knows What You are Looking for Before Generation
  Paper • 2404.14469 • Published • 27
- LLM in a flash: Efficient Large Language Model Inference with Limited Memory
  Paper • 2312.11514 • Published • 260
- An Evolved Universal Transformer Memory
  Paper • 2410.13166 • Published • 6

- Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models
  Paper • 2404.02575 • Published • 50
- Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
  Paper • 2404.12253 • Published • 55
- SnapKV: LLM Knows What You are Looking for Before Generation
  Paper • 2404.14469 • Published • 27
- FlowMind: Automatic Workflow Generation with LLMs
  Paper • 2404.13050 • Published • 34

- SnapKV: LLM Knows What You are Looking for Before Generation
  Paper • 2404.14469 • Published • 27
- Finch: Prompt-guided Key-Value Cache Compression
  Paper • 2408.00167 • Published • 18
- Beyond RAG: Task-Aware KV Cache Compression for Comprehensive Knowledge Reasoning
  Paper • 2503.04973 • Published • 26
- A Simple and Effective L_2 Norm-Based Strategy for KV Cache Compression
  Paper • 2406.11430 • Published • 25

- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 625
- BitNet: Scaling 1-bit Transformers for Large Language Models
  Paper • 2310.11453 • Published • 105
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 107
- TransformerFAM: Feedback attention is working memory
  Paper • 2404.09173 • Published • 43

- JetMoE: Reaching Llama2 Performance with 0.1M Dollars
  Paper • 2404.07413 • Published • 38
- Allowing humans to interactively guide machines where to look does not always improve a human-AI team's classification accuracy
  Paper • 2404.05238 • Published • 3
- Cognitive Architectures for Language Agents
  Paper • 2309.02427 • Published • 8
- Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings
  Paper • 2305.13571 • Published • 2

- Advancing LLM Reasoning Generalists with Preference Trees
  Paper • 2404.02078 • Published • 46
- PointInfinity: Resolution-Invariant Point Diffusion Models
  Paper • 2404.03566 • Published • 16
- MonoPatchNeRF: Improving Neural Radiance Fields with Patch-based Monocular Guidance
  Paper • 2404.08252 • Published • 6
- SnapKV: LLM Knows What You are Looking for Before Generation
  Paper • 2404.14469 • Published • 27