- Attention Is All You Need
  Paper • 1706.03762 • Published • 91
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
  Paper • 2307.08691 • Published • 9
- Mixtral of Experts
  Paper • 2401.04088 • Published • 159
- Mistral 7B
  Paper • 2310.06825 • Published • 55
Collections including paper arxiv:2311.11045
- Monolingual or Multilingual Instruction Tuning: Which Makes a Better Alpaca
  Paper • 2309.08958 • Published • 2
- Generative Data Augmentation using LLMs improves Distributional Robustness in Question Answering
  Paper • 2309.06358 • Published • 1
- Tuna: Instruction Tuning using Feedback from Large Language Models
  Paper • 2310.13385 • Published • 10
- Retrieval-Generation Synergy Augmented Large Language Models
  Paper • 2310.05149 • Published • 1

- DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
  Paper • 2309.03883 • Published • 35
- LoRA: Low-Rank Adaptation of Large Language Models
  Paper • 2106.09685 • Published • 52
- Agents: An Open-source Framework for Autonomous Language Agents
  Paper • 2309.07870 • Published • 42
- RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
  Paper • 2309.00267 • Published • 51

- Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs
  Paper • 2310.13961 • Published • 5
- Tuna: Instruction Tuning using Feedback from Large Language Models
  Paper • 2310.13385 • Published • 10
- Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models
  Paper • 2310.13127 • Published • 12
- From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning
  Paper • 2310.00492 • Published • 2

- BitNet: Scaling 1-bit Transformers for Large Language Models
  Paper • 2310.11453 • Published • 105
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
  Paper • 2310.11511 • Published • 78
- In-Context Learning Creates Task Vectors
  Paper • 2310.15916 • Published • 43
- Matryoshka Diffusion Models
  Paper • 2310.15111 • Published • 43