NormFormer: Improved Transformer Pretraining with Extra Normalization Paper • 2110.09456 • Published Oct 18, 2021
RoBERTa: A Robustly Optimized BERT Pretraining Approach Paper • 1907.11692 • Published Jul 26, 2019