hamtaai/e5-large-instruct-hadith

This is a fine-tuned version of intfloat/multilingual-e5-large-instruct specifically optimized for Persian and Arabic text processing and question-answering tasks.

Model Description

This model has been fine-tuned on a dataset of Persian and Arabic religious texts, including Hadith collections.

The model is particularly effective for:

  • Semantic search in Persian and Arabic texts
  • Question-answering tasks
  • Information retrieval
  • Cross-lingual understanding between Persian and Arabic
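As a rough illustration of the semantic search use case, the sketch below ranks candidate passages against a query by cosine similarity. The helper names (`cosine`, `rank_passages`, `search`) are hypothetical and not part of this model or of any library; the `query: ` and `passage: ` prefixes are taken from the Usage section of this card.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_passages(
    query_vec: Sequence[float],
    passage_vecs: Sequence[Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (passage_index, score) pairs, best match first."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def search(query: str, passages: List[str], top_k: int = 3) -> List[Tuple[str, float]]:
    """Embed the texts with the model and rank them (downloads the model on first call)."""
    from sentence_transformers import SentenceTransformer  # heavy import, kept local

    model = SentenceTransformer("hamtaai/e5-large-instruct-hadith")
    # Prefixes as given in the Usage section of this card.
    query_vec = model.encode("query: " + query)
    passage_vecs = model.encode(["passage: " + p for p in passages])
    ranked = rank_passages(query_vec, passage_vecs, top_k)
    return [(passages[i], score) for i, score in ranked]
```

For a large corpus, a vector index would normally replace the linear scan in `rank_passages`; the loop here only shows the scoring logic.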

Training Configuration

  • Base Model: intfloat/multilingual-e5-large-instruct
  • Epochs: 5
  • Batch Size: 72
  • Learning Rate: 2e-05
  • Warmup Steps Ratio: 0.1
  • Evaluation Steps Ratio: 0.5
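To make the two ratios concrete, here is a small sketch of how they could translate into step counts. It assumes the warmup ratio is a fraction of all training steps and the evaluation ratio is a fraction of one epoch; the card does not say which interpretation was used. The function name `schedule` and the dataset size in the usage line are hypothetical.

```python
import math


def schedule(
    num_examples: int,
    batch_size: int = 72,
    epochs: int = 5,
    warmup_ratio: float = 0.1,
    eval_ratio: float = 0.5,
) -> dict:
    """Derive step counts from the hyperparameters listed above."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    total_steps = steps_per_epoch * epochs
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        # Assumption: warmup covers a fraction of all training steps.
        "warmup_steps": round(total_steps * warmup_ratio),
        # Assumption: evaluation runs every `eval_ratio` of an epoch.
        "eval_every": max(1, round(steps_per_epoch * eval_ratio)),
    }


# Hypothetical dataset size, for illustration only.
print(schedule(num_examples=36_000))
```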

Usage

Using Sentence-Transformers

from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# Load the model
model = SentenceTransformer('hamtaai/e5-large-instruct-hadith')

# Prefix each input with its role ("query: " or "passage: ")
# The Persian placeholder below means "your question here"
query = "query: سوال شما اینجا"
# The Persian placeholder below means "answer text here"
passage = "passage: متن پاسخ اینجا"

# Encode texts
query_embedding = model.encode(query)
passage_embedding = model.encode(passage)

# Calculate cosine similarity (returned as a 1x1 tensor)
similarity = cos_sim(query_embedding, passage_embedding)
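The snippet above uses `query: ` and `passage: ` prefixes. The base model, intfloat/multilingual-e5-large-instruct, documents a different convention: queries are wrapped in an instruction and passages are encoded without any prefix. This card does not state which format was used during fine-tuning, so the sketch below is only an alternative worth comparing on your own data, not a recommendation. The task description string is an illustrative assumption.

```python
def get_detailed_instruct(task_description: str, query: str) -> str:
    """Format a query the way the base model's card describes."""
    return f"Instruct: {task_description}\nQuery: {query}"


# Illustrative task description (an assumption; adapt it to your use case).
TASK = "Given a web search query, retrieve relevant passages that answer the query"

# Persian placeholder meaning "your question here".
query_text = get_detailed_instruct(TASK, "سوال شما اینجا")

# Base-model convention: passages carry no prefix.
# Persian placeholder meaning "answer text here".
passage_text = "متن پاسخ اینجا"
```

Both strings can then be passed to `model.encode` exactly as in the snippet above.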

Using Hugging Face Transformers

from transformers import AutoTokenizer, AutoModel
import torch

tokenizer = AutoTokenizer.from_pretrained('hamtaai/e5-large-instruct-hadith')
model = AutoModel.from_pretrained('hamtaai/e5-large-instruct-hadith')

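# Tokenize; padding and truncation also make this work for a batch of texts
# The Persian placeholder below means "your text"
inputs = tokenizer(["متن شما"], padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool over real tokens only, so padding does not dilute the average
mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)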

Performance

This model has been optimized for Persian and Arabic text processing and shows improved performance on:

  • Semantic similarity tasks
  • Question-answering accuracy
  • Cross-lingual retrieval
  • Religious text understanding

Training Data

The model was trained on a curated dataset of Persian and Arabic religious texts, including:

  • Hadith collections
  • Quranic commentaries (Tafsir)
  • Religious question-answer pairs
  • Contextual information for better understanding

Limitations

  • Primarily optimized for Persian and Arabic texts
  • Performance may vary on other languages
  • Best results are achieved with properly normalized input text
  • Inputs must carry the appropriate prefixes (see Usage)
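The card does not specify which normalization is meant. As a minimal sketch, one common choice for mixed Persian and Arabic text is to strip diacritics and tatweel, fold Arabic Yeh and Kaf into their Persian code points, and collapse whitespace. The function name `normalize_text` and the exact character set are assumptions; whether folding letters helps depends on how the training data was prepared, which is not stated here.

```python
import re
import unicodedata

# Arabic diacritics (tashkeel), superscript alef, and tatweel (kashida).
_DIACRITICS = re.compile("[\u064B-\u0652\u0670\u0640]")

# Fold Arabic Yeh (U+064A) and Kaf (U+0643) into the Persian code points.
_CHAR_MAP = str.maketrans({"\u064A": "\u06CC", "\u0643": "\u06A9"})


def normalize_text(text: str) -> str:
    """Apply a simple, assumed normalization before encoding."""
    # NFKC folds presentation forms into their base letters.
    text = unicodedata.normalize("NFKC", text)
    text = _DIACRITICS.sub("", text)
    text = text.translate(_CHAR_MAP)
    # Collapse runs of whitespace; the zero-width non-joiner is left intact.
    return re.sub(r"\s+", " ", text).strip()
```

Apply the same function to both queries and passages so that the two sides of a comparison are normalized consistently.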

Citation

If you use this model, please cite the original base model and mention this fine-tuned version:

@misc{hamtaai/e5_large_instruct_hadith,
  title={hamtaai/e5-large-instruct-hadith: Fine-tuned Multilingual E5 Model for Persian and Arabic Text Processing},
  author={Your Name},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/hamtaai/e5-large-instruct-hadith}}
}

License

This model is released under the Apache 2.0 License.

Model size: 0.6B params (Safetensors, tensor type F32)