The University of Edinburgh's Submission to the WMT22 Code-Mixing Shared Task (MixMT) Paper • 2210.11309 • Published Oct 20, 2022
An Expanded Massive Multilingual Dataset for High-Performance Language Technologies Paper • 2503.10267 • Published Mar 13, 2025
Tokenizer Choice For LLM Training: Negligible or Crucial? Paper • 2310.08754 • Published Oct 12, 2023
Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings Paper • 2202.06671 • Published Feb 14, 2022
Specialized Document Embeddings for Aspect-based Similarity of Research Papers Paper • 2203.14541 • Published Mar 28, 2022
Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning Paper • 2301.09626 • Published Jan 23, 2023