# Enhanced-BGE-M3-with-CLP-and-MoE ([paper](https://arxiv.org/abs/2412.17364), [code](https://github.com/CreaLabs/Enhanced-BGE-M3-with-CLP-and-MoE))

## Contrastive Learning Penalty (CLP)

CLP is a novel loss function designed to address the limitations of existing contrastive learning methods and to improve performance in information retrieval tasks. It incorporates a penalty term that encourages the model to learn more discriminative representations by considering the similarity between negative samples and their corresponding queries.

The CLP loss is defined in terms of the following quantities:

* h<sub>i</sub>: the embedding of the query for the i-th instance.
* h<sub>i</sub><sup>+</sup>: the embedding of the positive sample for the i-th instance.
* H': the set of negative samples for the i-th instance.
* h': the embedding of a negative sample's query.
* H*: the set of positive queries for the documents corresponding to the negative samples.
* sim(a, b): the cosine similarity between embeddings a and b.
* τ: the temperature parameter.
* λ: the balancing parameter between the contrastive loss and the penalty term.

The exact formulation, and the difference between the standard contrastive learning loss and the contrastive learning penalty loss, are given in the [paper](https://arxiv.org/abs/2412.17364); an illustrative sketch of the objective is included in the "Illustrative loss sketch" section below.

## Specs

- Model

| Model Name | Introduction |
|---|---|
| [bge-m3-ko-CLPL-interMoE](https://huggingface.co/CreaLabs/bge-m3-ko-CLPL-interMoE) | This model applies CLPL and MoE and was trained on the MIRACL Korean training dataset. MoE is applied to the intermediate layer, and only the MoE layers were trained during fine-tuning. |
| [bge-m3-fa-CLPL-interMoE](https://huggingface.co/CreaLabs/bge-m3-fa-CLPL-interMoE) | This model applies CLPL and MoE and was trained on the MIRACL Persian training dataset. MoE is applied to the intermediate layer, and only the MoE layers were trained during fine-tuning. |
| [bge-m3-hi-CLPL-interMoE](https://huggingface.co/CreaLabs/bge-m3-hi-CLPL-interMoE) | This model applies CLPL and MoE and was trained on the MIRACL Hindi training dataset. MoE is applied to the intermediate layer, and only the MoE layers were trained during fine-tuning. |

- Data

Negative sampling was performed using the ANCE methodology, and the positive queries for the negative samples (required for CLPL) were generated with the Gemini 1.5 Pro model.

| Dataset | Introduction |
|---|---|
| [ko_CLPL_train_data](https://github.com/Dream-Forge-Studios/Enhanced-BGE-M3-with-CLPL-and-MoE/blob/main/data/ko_CLPL_train_data.jsonl) | MIRACL Korean CLPL training dataset |
| [fa_CLPL_train_data](https://github.com/Dream-Forge-Studios/Enhanced-BGE-M3-with-CLPL-and-MoE/blob/main/data/fa_CLPL_train_data.jsonl) | MIRACL Persian CLPL training dataset |
| [hi_CLPL_train_data](https://github.com/Dream-Forge-Studios/Enhanced-BGE-M3-with-CLPL-and-MoE/blob/main/data/hi_CLPL_train_data.jsonl) | MIRACL Hindi CLPL training dataset |

## Evaluation

Evaluation results are reported in the [paper](https://arxiv.org/abs/2412.17364).

## Citation

```bibtex
@misc{yu2024efficientfinetuningmethodologytext,
      title={Efficient fine-tuning methodology of text embedding models for information retrieval: contrastive learning penalty (clp)},
      author={Jeongsu Yu},
      year={2024},
      eprint={2412.17364},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2412.17364},
}
```
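
## Illustrative loss sketch

This is a minimal, illustrative sketch only, not the paper's exact equation: it assumes an InfoNCE-style contrastive term over the symbols defined in the CLP section, and the penalty shown here (keeping each negative document close to its own positive query) is one plausible reading of the description above. Here $h' \in H'$ denotes a negative sample and $h'^{*} \in H^{*}$ its corresponding positive query; see the paper for the authoritative formulation.

$$
\mathcal{L} = -\log \frac{e^{\operatorname{sim}(h_i,\, h_i^{+})/\tau}}{e^{\operatorname{sim}(h_i,\, h_i^{+})/\tau} + \sum_{h' \in H'} e^{\operatorname{sim}(h_i,\, h')/\tau}} \;+\; \frac{\lambda}{|H'|} \sum_{h' \in H'} \left(1 - \operatorname{sim}(h',\, h'^{*})\right)
$$

The first term is the standard contrastive loss over the query, its positive, and the negatives; the λ-weighted term penalizes negative documents that drift away from their own queries, which is the role of the penalty in CLP. The paper's actual penalty may be normalized differently (for example, with a softmax over similarities and the temperature τ).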
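
## Usage sketch

A minimal usage sketch, assuming the released checkpoints can be loaded through FlagEmbedding's standard `BGEM3FlagModel` interface in the same way as the base `BAAI/bge-m3` model; if the added MoE layers require this repository's custom model code, use the repository's scripts instead. The example sentences are placeholders.

```python
from FlagEmbedding import BGEM3FlagModel

# Assumption: the released checkpoint loads through the standard BGE-M3 interface.
# If the MoE layers require custom model code, load the model via this repository's scripts instead.
model = BGEM3FlagModel("CreaLabs/bge-m3-ko-CLPL-interMoE", use_fp16=True)

queries = ["한국의 수도는 어디인가요?"]  # placeholder query
passages = [
    "대한민국의 수도는 서울이다.",                  # placeholder relevant passage
    "BGE-M3 is a multilingual embedding model.",  # placeholder distractor
]

# Dense embeddings; encode() returns a dict containing 'dense_vecs'.
q_emb = model.encode(queries)["dense_vecs"]
p_emb = model.encode(passages)["dense_vecs"]

# Relevance scores via inner product of the normalized dense vectors (higher = more relevant).
scores = q_emb @ p_emb.T
print(scores)
```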