hamtaai/e5-large-instruct-hadith
This is a fine-tuned version of intfloat/multilingual-e5-large-instruct specifically optimized for Persian and Arabic text processing and question-answering tasks.
Model Description
This model has been fine-tuned on a dataset of Persian and Arabic religious texts, including Hadith collections.
The model is particularly effective for:
- Semantic search in Persian and Arabic texts
- Question-answering tasks
- Information retrieval
- Cross-lingual understanding between Persian and Arabic
Training Configuration
- Base Model: intfloat/multilingual-e5-large-instruct
- Epochs: 5
- Batch Size: 72
- Learning Rate: 2e-05
- Warmup Steps Ratio: 0.1
- Evaluation Steps Ratio: 0.5
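The two ratios above can be turned into concrete step counts once the training set size is known. The sketch below is only an illustration: the card does not state the dataset size, so `num_examples` is a hypothetical input, and it assumes the warmup ratio is a fraction of all training steps and the evaluation ratio is a fraction of one epoch. The helper name `schedule_steps` is also hypothetical.

```python
import math


def schedule_steps(num_examples: int,
                   batch_size: int = 72,
                   epochs: int = 5,
                   warmup_ratio: float = 0.1,
                   eval_ratio: float = 0.5) -> dict:
    """Convert ratio-style settings into step counts.

    Assumptions (not confirmed by the model card):
    - warmup_ratio is a fraction of the total number of training steps
    - eval_ratio is a fraction of the steps in one epoch
    """
    steps_per_epoch = math.ceil(num_examples / batch_size)
    total_steps = steps_per_epoch * epochs
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": round(total_steps * warmup_ratio),
        "eval_every": max(1, round(steps_per_epoch * eval_ratio)),
    }


# Hypothetical dataset of 72,000 training pairs
steps = schedule_steps(num_examples=72_000)
```

Under these assumptions, a larger or smaller dataset changes the absolute step counts but not the proportions listed in the configuration.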
Usage
Using Sentence-Transformers
from sentence_transformers import SentenceTransformer
# Load the model
model = SentenceTransformer('hamtaai/e5-large-instruct-hadith')
# Prefix queries with "query: " and passages with "passage: "
query = "query: سوال شما اینجا"  # placeholder: "your question here"
passage = "passage: متن پاسخ اینجا"  # placeholder: "answer text here"
# Encode texts
query_embedding = model.encode(query)
passage_embedding = model.encode(passage)
# Calculate similarity
from sentence_transformers.util import cos_sim
similarity = cos_sim(query_embedding, passage_embedding)
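For semantic search, the same steps extend from one query and one passage to ranking several passages. The following is a minimal sketch, not the author's retrieval pipeline: `rank_passages` and `demo` are hypothetical helpers, and the default prefixes simply mirror the `query: ` and `passage: ` convention shown above.

```python
import math
from typing import Callable, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return float(dot / (norm_a * norm_b)) if norm_a and norm_b else 0.0


def rank_passages(query: str,
                  passages: List[str],
                  encode: Callable[[List[str]], Sequence[Sequence[float]]],
                  query_prefix: str = "query: ",
                  passage_prefix: str = "passage: ") -> List[Tuple[str, float]]:
    """Return (passage, score) pairs sorted from most to least similar."""
    query_vec = encode([query_prefix + query])[0]
    passage_vecs = encode([passage_prefix + p for p in passages])
    scored = [(p, cosine(query_vec, v)) for p, v in zip(passages, passage_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def demo() -> None:
    """Run the ranking with the real model (downloads the weights)."""
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("hamtaai/e5-large-instruct-hadith")
    passages = ["متن پاسخ اینجا", "متن دیگر"]  # placeholders: replace with real passages
    for passage, score in rank_passages("سوال شما اینجا", passages, model.encode):
        print(f"{score:.3f}  {passage}")
```

Calling `demo()` loads the model and prints the passages in ranked order. The prefixes are parameters on purpose: the base model's own card documents an `Instruct: {task}\nQuery: {query}` format for queries and no prefix for passages, so it may be worth comparing both conventions on your data.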
Using Hugging Face Transformers
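from transformers import AutoTokenizer, AutoModel
import torch
import torch.nn.functional as F
tokenizer = AutoTokenizer.from_pretrained('hamtaai/e5-large-instruct-hadith')
model = AutoModel.from_pretrained('hamtaai/e5-large-instruct-hadith')
# Tokenize a batch of texts ("متن شما" is a placeholder meaning "your text");
# add the same "query: " / "passage: " prefixes as in the example above
inputs = tokenizer(["متن شما"], padding=True, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
# Mean-pool over real tokens only, so padding does not skew the average
mask = inputs["attention_mask"].unsqueeze(-1).to(outputs.last_hidden_state.dtype)
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
# L2-normalize so that dot products equal cosine similarities
embeddings = F.normalize(embeddings, p=2, dim=1)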
Performance
This model has been optimized for Persian and Arabic text processing and shows improved performance on:
- Semantic similarity tasks
- Question-answering accuracy
- Cross-lingual retrieval
- Religious text understanding
Training Data
The model was trained on a curated dataset of Persian and Arabic religious texts, including:
- Hadith collections
- Quranic commentaries (Tafsir)
- Religious question-answer pairs
- Contextual information for better understanding
Limitations
- Primarily optimized for Persian and Arabic texts
- Performance may vary on other languages
- Best results achieved with proper text normalization
- Requires the query and passage prefixes shown in the Usage section
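The card does not say which text normalization is meant. As a starting point, the sketch below applies a few steps that are common for mixed Persian and Arabic corpora: unifying Arabic yeh and kaf with their Persian counterparts, removing tatweel, optionally stripping short-vowel diacritics, and collapsing whitespace. The function name `normalize_text` and the choice of steps are assumptions, not the preprocessing used in training.

```python
import re

# Assumed convention: map Arabic letter variants to the Persian code points.
_CHAR_MAP = str.maketrans({
    "\u064a": "\u06cc",  # Arabic yeh -> Persian yeh
    "\u0643": "\u06a9",  # Arabic kaf -> Persian keheh
    "\u0640": None,      # tatweel (kashida) is removed
})
# Harakat (fathatan through sukun) and the superscript alef.
_DIACRITICS = re.compile("[\u064b-\u0652\u0670]")
_WHITESPACE = re.compile(r"\s+")


def normalize_text(text: str, strip_diacritics: bool = True) -> str:
    """Apply a light, illustrative normalization to Persian/Arabic text."""
    text = text.translate(_CHAR_MAP)
    if strip_diacritics:
        text = _DIACRITICS.sub("", text)
    # The zero-width non-joiner (U+200C) is not whitespace, so it is preserved.
    return _WHITESPACE.sub(" ", text).strip()
```

Apply the same function to queries and passages before adding prefixes and encoding. Stripping diacritics can discard meaningful vocalization in Hadith or Quranic text, so `strip_diacritics=False` may be the safer choice when the vowel marks matter.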
Citation
If you use this model, please cite the original base model and mention this fine-tuned version:
@misc{hamtaai/e5_large_instruct_hadith,
title={hamtaai/e5-large-instruct-hadith: Fine-tuned Multilingual E5 Model for Persian and Arabic Text Processing},
author={Your Name},
year={2025},
publisher={Hugging Face},
howpublished={\url{https://huggingface.co/hamtaai/e5-large-instruct-hadith}}
}
License
This model is released under the Apache 2.0 License.