Collections
Discover the best community collections!
Collections including paper arxiv:1810.04805

- Distributed Representations of Sentences and Documents
  Paper • 1405.4053 • Published
- Sequence to Sequence Learning with Neural Networks
  Paper • 1409.3215 • Published • 3
- PaLM: Scaling Language Modeling with Pathways
  Paper • 2204.02311 • Published • 3
- Recent Trends in Deep Learning Based Natural Language Processing
  Paper • 1708.02709 • Published

- Attention Is All You Need
  Paper • 1706.03762 • Published • 102
- Self-Attention with Relative Position Representations
  Paper • 1803.02155 • Published • 1
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 23
- Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding
  Paper • 2401.12954 • Published • 33

- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 23
- Transformers Can Achieve Length Generalization But Not Robustly
  Paper • 2402.09371 • Published • 15
- Triple-Encoders: Representations That Fire Together, Wire Together
  Paper • 2402.12332 • Published • 2
- BERTs are Generative In-Context Learners
  Paper • 2406.04823 • Published • 1

- Nemotron-4 15B Technical Report
  Paper • 2402.16819 • Published • 46
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 56
- RWKV: Reinventing RNNs for the Transformer Era
  Paper • 2305.13048 • Published • 20
- Reformer: The Efficient Transformer
  Paper • 2001.04451 • Published

- Attention Is All You Need
  Paper • 1706.03762 • Published • 102
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 23
- Universal Language Model Fine-tuning for Text Classification
  Paper • 1801.06146 • Published • 7
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 17

- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 23
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Paper • 1907.11692 • Published • 9
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 17
- OPT: Open Pre-trained Transformer Language Models
  Paper • 2205.01068 • Published • 2

- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 23
- Transformers Can Achieve Length Generalization But Not Robustly
  Paper • 2402.09371 • Published • 15
- A Thorough Examination of Decoding Methods in the Era of LLMs
  Paper • 2402.06925 • Published • 1
- Byte Latent Transformer: Patches Scale Better Than Tokens
  Paper • 2412.09871 • Published • 108

- Lost in the Middle: How Language Models Use Long Contexts
  Paper • 2307.03172 • Published • 43
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 23
- Attention Is All You Need
  Paper • 1706.03762 • Published • 102
- Llama 2: Open Foundation and Fine-Tuned Chat Models
  Paper • 2307.09288 • Published • 247

- Word Alignment by Fine-tuning Embeddings on Parallel Corpora
  Paper • 2101.08231 • Published • 1
- Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation
  Paper • 2009.09359 • Published • 1
- Unsupervised Multilingual Alignment using Wasserstein Barycenter
  Paper • 2002.00743 • Published
- Sinhala-English Word Embedding Alignment: Introducing Datasets and Benchmark for a Low Resource Language
  Paper • 2311.10436 • Published

- Attention Is All You Need
  Paper • 1706.03762 • Published • 102
- MetaGPT: Meta Programming for Multi-Agent Collaborative Framework
  Paper • 2308.00352 • Published • 2
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 23
- XLNet: Generalized Autoregressive Pretraining for Language Understanding
  Paper • 1906.08237 • Published • 1