Collections
Discover the best community collections!
Collections including paper arxiv:1301.3781

- Adapt-Pruner: Adaptive Structural Pruning for Efficient Small Language Model Training
  Paper • 2502.03460 • Published
- LLM-Pruner: On the Structural Pruning of Large Language Models
  Paper • 2305.11627 • Published • 3
- Pruning as a Domain-specific LLM Extractor
  Paper • 2405.06275 • Published • 1
- KnowTuning: Knowledge-aware Fine-tuning for Large Language Models
  Paper • 2402.11176 • Published • 2
- Lost in the Middle: How Language Models Use Long Contexts
  Paper • 2307.03172 • Published • 43
- Efficient Estimation of Word Representations in Vector Space
  Paper • 1301.3781 • Published • 8
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 23
- Attention Is All You Need
  Paper • 1706.03762 • Published • 102
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
  Paper • 2312.00752 • Published • 148
- Elucidating the Design Space of Diffusion-Based Generative Models
  Paper • 2206.00364 • Published • 18
- GLU Variants Improve Transformer
  Paper • 2002.05202 • Published • 4
- StarCoder 2 and The Stack v2: The Next Generation
  Paper • 2402.19173 • Published • 151