Creative Commons Attribution-NonCommercial 4.0 International License
Copyright (c) 2025
This work is licensed under the Creative Commons Attribution-NonCommercial 4.0
International License. To view a copy of this license, visit
http://creativecommons.org/licenses/by-nc/4.0/ or send a letter to
Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.
You are free to:
- Share – copy and redistribute the material in any medium or format
- Adapt – remix, transform, and build upon the material
Under the following terms:
- Attribution – You must give appropriate credit, provide a link to the license,
  and indicate if changes were made. You may do so in any reasonable manner, but
  not in any way that suggests the licensor endorses you or your use.
- NonCommercial – You may not use the material for commercial purposes.
- No additional restrictions – You may not apply legal terms or technological
  measures that legally restrict others from doing anything the license permits.
---
## Acknowledgments
This model builds upon several foundational works and contributions:
### Base Architecture
- **XLM-RoBERTa**: This model uses XLM-RoBERTa as its base architecture (a loading sketch follows this list)
  - Original paper: "Unsupervised Cross-lingual Representation Learning at Scale"
  - Authors: Conneau et al.
  - arXiv: https://arxiv.org/abs/1911.02116
  - License: MIT License
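For orientation, the base encoder can be inspected directly with Hugging Face `transformers`. A minimal sketch, assuming the public `xlm-roberta-base` checkpoint (this model may fine-tune a different variant):

```python
from transformers import AutoModel, AutoTokenizer

# Assumption: the public base checkpoint, not this model's fine-tuned weights.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
encoder = AutoModel.from_pretrained("xlm-roberta-base")

inputs = tokenizer("Merhaba dünya", return_tensors="pt")
outputs = encoder(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, seq_len, 768) for the base model
```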
### Training Methodology
We are grateful to the Beijing Academy of Artificial Intelligence (BAAI) for its
contributions to embedding research:
- **RetroMAE**: Self-supervised pre-training methodology
  - Paper: "RetroMAE: Pre-Training Retrieval-oriented Language Models Via Masked Auto-Encoder"
  - Authors: Xiao et al. (BAAI)
  - arXiv: https://arxiv.org/abs/2205.12035
- **BGE-M3**: Multi-lingual embedding research (its "multi-functionality" is illustrated in the sketch after this list)
  - Paper: "BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation"
  - Authors: Chen et al. (BAAI)
  - arXiv: https://arxiv.org/abs/2402.03216
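To make the cited "multi-functionality" concrete, here is a minimal sketch using BAAI's FlagEmbedding package and the published `BAAI/bge-m3` checkpoint; it illustrates the cited work, not this model's own interface:

```python
from FlagEmbedding import BGEM3FlagModel

# The published BGE-M3 checkpoint from the cited paper.
model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)

output = model.encode(
    ["What is BGE M3?"],
    return_dense=True,         # one dense vector per text
    return_sparse=True,        # lexical (sparse) token weights
    return_colbert_vecs=True,  # per-token multi-vector representation
)
print(output["dense_vecs"].shape)       # (1, 1024)
print(output["lexical_weights"][0])     # {token_id: weight, ...}
print(output["colbert_vecs"][0].shape)  # (num_tokens, 1024)
```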
### Matryoshka Representation Learning
- Paper: "Matryoshka Representation Learning"
- Authors: Kusupati et al.
- Year: 2022
- arXiv: https://arxiv.org/abs/2205.13147
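The practical payoff of Matryoshka training is that the leading dimensions of an embedding already form a usable smaller embedding. A minimal sketch, assuming a hypothetical Matryoshka-trained checkpoint (the model id is a placeholder):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Placeholder model id; substitute the actual checkpoint.
model = SentenceTransformer("your-org/your-matryoshka-model")

embeddings = model.encode(["an example sentence"])  # shape: (1, full_dim)

# Matryoshka property: keep only the first k dimensions and re-normalize,
# trading a little accuracy for much cheaper storage and search.
k = 256
small = embeddings[:, :k]
small = small / np.linalg.norm(small, axis=1, keepdims=True)
```

Recent Sentence Transformers releases also expose this truncation at load time via the `truncate_dim` argument of `SentenceTransformer`.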
### Training Framework
- **Sentence Transformers**: https://www.sbert.net (a training sketch follows below)
  - License: Apache 2.0
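To show how the framework and the Matryoshka objective fit together, here is a minimal, hypothetical training sketch with Sentence Transformers' `MatryoshkaLoss`; the base checkpoint, toy data, and dimension list are placeholders, not this model's actual recipe:

```python
from torch.utils.data import DataLoader
from sentence_transformers import InputExample, SentenceTransformer, losses

# Placeholder base model and toy data, for illustration only.
model = SentenceTransformer("xlm-roberta-base")
train_examples = [
    InputExample(texts=["a query", "its matching passage"]),
    InputExample(texts=["another query", "another matching passage"]),
]
loader = DataLoader(train_examples, shuffle=True, batch_size=2)

# Wrap a standard contrastive loss so it is applied at several nested
# dimensionalities, per Matryoshka Representation Learning.
inner_loss = losses.MultipleNegativesRankingLoss(model)
train_loss = losses.MatryoshkaLoss(
    model, inner_loss, matryoshka_dims=[768, 512, 256, 128, 64]
)

model.fit(train_objectives=[(loader, train_loss)], epochs=1, warmup_steps=0)
```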
Users are encouraged to cite this model, along with the foundational works above,
when using it in research or applications.