Collections including paper arxiv:2307.03172

- LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
  Paper • 2402.13753 • Published • 116
- Data Engineering for Scaling Language Models to 128K Context
  Paper • 2402.10171 • Published • 25
- LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration
  Paper • 2402.11550 • Published • 18
- The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey
  Paper • 2401.07872 • Published • 2

- Simple linear attention language models balance the recall-throughput tradeoff
  Paper • 2402.18668 • Published • 20
- Linear Transformers with Learnable Kernel Functions are Better In-Context Models
  Paper • 2402.10644 • Published • 81
- Repeat After Me: Transformers are Better than State Space Models at Copying
  Paper • 2402.01032 • Published • 24
- Zoology: Measuring and Improving Recall in Efficient Language Models
  Paper • 2312.04927 • Published • 2

- Attention Is All You Need
  Paper • 1706.03762 • Published • 104
- You Only Look Once: Unified, Real-Time Object Detection
  Paper • 1506.02640 • Published • 3
- HEp-2 Cell Image Classification with Deep Convolutional Neural Networks
  Paper • 1504.02531 • Published
- Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
  Paper • 2401.05566 • Published • 30

- Lost in the Middle: How Language Models Use Long Contexts
  Paper • 2307.03172 • Published • 43
- Efficient Estimation of Word Representations in Vector Space
  Paper • 1301.3781 • Published • 8
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 24
- Attention Is All You Need
  Paper • 1706.03762 • Published • 104

- Attention Is All You Need
  Paper • 1706.03762 • Published • 104
- LoRA: Low-Rank Adaptation of Large Language Models
  Paper • 2106.09685 • Published • 54
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 63
- Lost in the Middle: How Language Models Use Long Contexts
  Paper • 2307.03172 • Published • 43

- Understanding LLMs: A Comprehensive Overview from Training to Inference
  Paper • 2401.02038 • Published • 65
- The Impact of Reasoning Step Length on Large Language Models
  Paper • 2401.04925 • Published • 18
- Lost in the Middle: How Language Models Use Long Contexts
  Paper • 2307.03172 • Published • 43
- Attention Is All You Need
  Paper • 1706.03762 • Published • 104

- LoRA: Low-Rank Adaptation of Large Language Models
  Paper • 2106.09685 • Published • 54
- Attention Is All You Need
  Paper • 1706.03762 • Published • 104
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 63
- Lost in the Middle: How Language Models Use Long Contexts
  Paper • 2307.03172 • Published • 43

- A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
  Paper • 2312.08578 • Published • 20
- ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
  Paper • 2312.08583 • Published • 11
- Vision-Language Models as a Source of Rewards
  Paper • 2312.09187 • Published • 14
- StemGen: A music generation model that listens
  Paper • 2312.08723 • Published • 49

- TRAMS: Training-free Memory Selection for Long-range Language Modeling
  Paper • 2310.15494 • Published • 2
- A Long Way to Go: Investigating Length Correlations in RLHF
  Paper • 2310.03716 • Published • 10
- YaRN: Efficient Context Window Extension of Large Language Models
  Paper • 2309.00071 • Published • 77
- Giraffe: Adventures in Expanding Context Lengths in LLMs
  Paper • 2308.10882 • Published • 1