- Attention Is All You Need
  Paper • 1706.03762 • Published • 108
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 25
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Paper • 1907.11692 • Published • 9
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Paper • 1910.01108 • Published • 21
Taufiq Dwi Purnomo (taufiqdp)
AI & ML interests: SLM, VLM
Recent Activity
- upvoted a paper 5 days ago: mHC: Manifold-Constrained Hyper-Connections
- liked a model 5 days ago: MiniMaxAI/MiniMax-M2.1
- upvoted a paper 9 days ago: Qwen3-VL Technical Report