Creative Commons Attribution-NonCommercial 4.0 International License

Copyright (c) 2025

This work is licensed under the Creative Commons Attribution-NonCommercial 4.0
International License. To view a copy of this license, visit
http://creativecommons.org/licenses/by-nc/4.0/ or send a letter to
Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.

You are free to:

- Share – copy and redistribute the material in any medium or format
- Adapt – remix, transform, and build upon the material

Under the following terms:

- Attribution – You must give appropriate credit, provide a link to the license,
  and indicate if changes were made. You may do so in any reasonable manner, but
  not in any way that suggests the licensor endorses you or your use.
- NonCommercial – You may not use the material for commercial purposes.
- No additional restrictions – You may not apply legal terms or technological
  measures that legally restrict others from doing anything the license permits.

Acknowledgements

This model builds upon several foundational works and contributions:

- **XLM-RoBERTa**: This model uses XLM-RoBERTa as its base architecture
  - Original paper: "Unsupervised Cross-lingual Representation Learning at Scale"
  - Authors: Conneau et al.
  - License: MIT License

We are grateful to the Beijing Academy of Artificial Intelligence (BAAI) for their
contributions to embedding research:

- **RetroMAE**: Self-supervised pre-training methodology
  - Paper: "RetroMAE: Pre-Training Retrieval-oriented Language Models Via Masked Auto-Encoder"
  - Authors: BAAI
  - arXiv: https://arxiv.org/abs/2205.12035

- **BGE-M3**: Multi-lingual embedding research
  - Paper: "BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings"
  - Authors: BAAI
  - arXiv: https://arxiv.org/abs/2402.03216

- **Matryoshka Representation Learning**: Nested embeddings that remain usable when truncated to lower dimensions
  - Paper: "Matryoshka Representation Learning"
  - Authors: Kusupati et al.
  - Year: 2022
  - arXiv: https://arxiv.org/abs/2205.13147

- **Sentence Transformers**: https://www.sbert.net
  - License: Apache 2.0

Users are encouraged to cite this model and the foundational works when using
it in research or applications.
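
For convenience, a minimal BibTeX sketch for the foundational papers listed above follows. The entry keys are illustrative, author lists are abbreviated to the first author, and the XLM-RoBERTa arXiv identifier is supplied here rather than taken from this notice; please verify full author lists, venues, and years against the arXiv pages, and add a citation entry for this model itself (not specified in this notice).

```bibtex
% Illustrative entries only; verify authors, venues, and years on arXiv.
@article{conneau2019xlmr,
  title   = {Unsupervised Cross-lingual Representation Learning at Scale},
  author  = {Conneau, Alexis and others},
  journal = {arXiv preprint arXiv:1911.02116},
  year    = {2019}
}

@article{xiao2022retromae,
  title   = {RetroMAE: Pre-Training Retrieval-oriented Language Models Via Masked Auto-Encoder},
  author  = {Xiao, Shitao and others},
  journal = {arXiv preprint arXiv:2205.12035},
  year    = {2022}
}

@article{chen2024bgem3,
  title   = {BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings},
  author  = {Chen, Jianlv and others},
  journal = {arXiv preprint arXiv:2402.03216},
  year    = {2024}
}

@article{kusupati2022matryoshka,
  title   = {Matryoshka Representation Learning},
  author  = {Kusupati, Aditya and others},
  journal = {arXiv preprint arXiv:2205.13147},
  year    = {2022}
}
```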