hamtaai/e5-large-instruct-hadith

This is a fine-tuned version of intfloat/multilingual-e5-large-instruct specifically optimized for Persian and Arabic text processing and question-answering tasks.

Model Description

This model has been fine-tuned on a dataset of Persian and Arabic religious texts, including Hadith collections.

The model is particularly effective for:

Semantic search in Persian and Arabic texts
Question-answering tasks
Information retrieval
Cross-lingual understanding between Persian and Arabic
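
The semantic search use case above amounts to embedding a query and a set of passages, then ranking the passages by cosine similarity. The following is a minimal sketch of that ranking step, not part of the model's own tooling. The helper names `cosine`, `top_k`, and `search` are hypothetical, and `search` assumes a model object with an `encode` method (such as a loaded `SentenceTransformer`) together with the `query: ` and `passage: ` prefixes used in the Usage section.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # define similarity with a zero vector as 0
    return dot / (norm_a * norm_b)


def top_k(query_vec, passage_vecs, k: int = 3) -> List[Tuple[int, float]]:
    """Return (passage_index, score) pairs for the k most similar passages."""
    scored = [(i, cosine(query_vec, p)) for i, p in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def search(model, query: str, passages: List[str], k: int = 3):
    """Embed query and passages with `model.encode`, then rank the passages.

    Assumption: `model` behaves like a SentenceTransformer, and inputs take
    the 'query: ' / 'passage: ' prefixes shown in this card's Usage section.
    """
    query_vec = model.encode("query: " + query)
    passage_vecs = model.encode(["passage: " + p for p in passages])
    return [(passages[i], score) for i, score in top_k(query_vec, passage_vecs, k)]
```

Because the model is multilingual, the same `search` call can take a Persian query against Arabic passages, which is one way to try the cross-lingual use case listed above. For large collections a vector index would replace the linear scan in `top_k`.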

Training Configuration

Base Model: intfloat/multilingual-e5-large-instruct
Epochs: 5
Batch Size: 72
Learning Rate: 2e-05
Warmup Steps Ratio: 0.1
Evaluation Steps Ratio: 0.5
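
The two ratios above only become concrete step counts once the dataset size is known, and this card does not state it. The sketch below shows the arithmetic under two assumptions that the card does not confirm: the warmup ratio is a fraction of the total number of training steps, and the evaluation ratio is a fraction of the steps in one epoch. The function name `training_schedule` and the example dataset size are hypothetical.

```python
import math


def training_schedule(num_examples: int, batch_size: int = 72, epochs: int = 5,
                      warmup_ratio: float = 0.1, eval_ratio: float = 0.5) -> dict:
    """Turn the ratios from the training configuration into step counts.

    Assumptions (not confirmed by the model card): warmup_ratio applies to
    the total step count, eval_ratio applies to the steps of a single epoch.
    """
    steps_per_epoch = math.ceil(num_examples / batch_size)
    total_steps = steps_per_epoch * epochs
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": int(total_steps * warmup_ratio),
        "eval_every": max(1, int(steps_per_epoch * eval_ratio)),
    }


if __name__ == "__main__":
    # 72,000 examples is a made-up size chosen to give round numbers.
    print(training_schedule(72_000))
```

With the made-up size of 72,000 examples, this gives 1,000 steps per epoch, 5,000 steps in total, 500 warmup steps, and an evaluation every 500 steps.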

Usage

Using Sentence-Transformers

from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# Load the model
model = SentenceTransformer('hamtaai/e5-large-instruct-hadith')

# Prefix queries with "query: " and passages with "passage: "
query = "query: سوال شما اینجا"      # placeholder: "your question here"
passage = "passage: متن پاسخ اینجا"  # placeholder: "answer text here"

# Encode texts
query_embedding = model.encode(query)
passage_embedding = model.encode(passage)

# Calculate cosine similarity between the query and the passage
similarity = cos_sim(query_embedding, passage_embedding)
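
The `query: ` and `passage: ` prefixes above are what this card prescribes. The base model's own documentation describes a different query format, in which a one-sentence task instruction precedes the query and passages are left unprefixed. This card does not say which format was used during fine-tuning, so the sketch below is an alternative to try and compare, not a confirmed requirement. The task description string is only an example, and `build_inputs` is a hypothetical helper.

```python
from typing import List, Tuple


def get_detailed_instruct(task_description: str, query: str) -> str:
    """Format a query in the instruction style described for the base model."""
    return f"Instruct: {task_description}\nQuery: {query}"


def build_inputs(task: str, queries: List[str], passages: List[str]) -> Tuple[List[str], List[str]]:
    """Return (formatted_queries, passages); passages get no prefix in this style."""
    return [get_detailed_instruct(task, q) for q in queries], list(passages)


if __name__ == "__main__":
    # Example task description; adjust it to the retrieval task at hand.
    task = "Given a web search query, retrieve relevant passages that answer the query"
    formatted, docs = build_inputs(task, ["سوال شما اینجا"], ["متن پاسخ اینجا"])
    print(formatted[0])
```

Encoding the same query and passage pair with both formats and comparing the similarity scores on a few known-relevant examples is a simple way to see which convention this fine-tune responds to better.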

Using Hugging Face Transformers

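from transformers import AutoTokenizer, AutoModel
import torch
import torch.nn.functional as F

tokenizer = AutoTokenizer.from_pretrained('hamtaai/e5-large-instruct-hadith')
model = AutoModel.from_pretrained('hamtaai/e5-large-instruct-hadith')

# Tokenize (placeholder text: "your text"); padding and truncation allow batches of mixed lengths
inputs = tokenizer(["متن شما"], padding=True, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool over real tokens only, using the attention mask to ignore padding
mask = inputs["attention_mask"].unsqueeze(-1).to(outputs.last_hidden_state.dtype)
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
# L2-normalize so that dot products equal cosine similarities
embeddings = F.normalize(embeddings, p=2, dim=1)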

Performance

This model has been optimized for Persian and Arabic text processing and shows improved performance on:

Semantic similarity tasks
Question-answering accuracy
Cross-lingual retrieval
Religious text understanding
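
This card does not report benchmark figures or name the metrics behind these claims. For readers who want to check retrieval quality on their own data, the sketch below computes two common retrieval metrics, hit rate at k and mean reciprocal rank, from ranked result lists. The function names and the data layout (dictionaries keyed by a query id) are assumptions made for illustration.

```python
from typing import Dict, List, Set


def hit_rate_at_k(ranked: Dict[str, List[str]], relevant: Dict[str, Set[str]], k: int) -> float:
    """Fraction of queries with at least one relevant passage in the top k."""
    if not ranked:
        return 0.0
    hits = sum(1 for q, docs in ranked.items() if relevant.get(q, set()) & set(docs[:k]))
    return hits / len(ranked)


def mean_reciprocal_rank(ranked: Dict[str, List[str]], relevant: Dict[str, Set[str]]) -> float:
    """Average of 1/rank of the first relevant passage (0 if none is retrieved)."""
    if not ranked:
        return 0.0
    total = 0.0
    for q, docs in ranked.items():
        for rank, doc in enumerate(docs, start=1):
            if doc in relevant.get(q, set()):
                total += 1.0 / rank
                break
    return total / len(ranked)
```

Running both functions on rankings from this model and from the base model, over the same set of queries, gives a like-for-like comparison.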

Training Data

The model was trained on a curated dataset of Persian and Arabic religious texts, including:

Hadith collections
Quranic commentaries (Tafsir)
Religious question-answer pairs
Contextual information for better understanding

Limitations

Primarily optimized for Persian and Arabic texts
Performance may vary on other languages
Best results achieved with proper text normalization
Inputs need the prefixes shown in the usage example (query: for queries, passage: for passages)
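
The card does not specify which text normalization it has in mind. The sketch below shows a few steps that are commonly applied to mixed Persian and Arabic text: Unicode NFKC normalization, unifying the Arabic and Persian forms of Yeh and Kaf, removing tatweel, optionally stripping short-vowel diacritics, and collapsing whitespace. These choices are assumptions; ideally the normalization applied at query time matches whatever was applied to the training data, and `normalize_text` is a hypothetical helper name.

```python
import re
import unicodedata

# Arabic code points mapped to the forms commonly used in Persian text.
_CHAR_MAP = str.maketrans({
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
    "\u0640": None,      # ARABIC TATWEEL (elongation mark) is removed
})
# Fathatan through sukun, plus superscript alef.
_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")
_WHITESPACE = re.compile(r"\s+")


def normalize_text(text: str, strip_diacritics: bool = True) -> str:
    """Apply a small, Persian-oriented normalization to Persian or Arabic text."""
    text = unicodedata.normalize("NFKC", text)  # fold presentation forms
    text = text.translate(_CHAR_MAP)
    if strip_diacritics:
        text = _DIACRITICS.sub("", text)
    return _WHITESPACE.sub(" ", text).strip()
```

Stripping diacritics discards information that can matter in vocalized Hadith or Quranic text, which is why it sits behind a flag here. Mapping Yeh and Kaf to their Persian forms likewise alters Arabic spelling, so Arabic-only pipelines may prefer to skip that step.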

Citation

If you use this model, please cite the original base model and mention this fine-tuned version:

@misc{hamtaai/e5_large_instruct_hadith,
  title={hamtaai/e5-large-instruct-hadith: Fine-tuned Multilingual E5 Model for Persian and Arabic Text Processing},
  author={Your Name},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/hamtaai/e5-large-instruct-hadith}}
}

License

This model is released under the Apache 2.0 License.

Model size: 0.6B params (Safetensors)
Tensor type: F32
