---
license: mit
datasets:
- newmindai/RAGTruth-TR
language:
- tr
- en
metrics:
- precision
- recall
- f1
- roc_auc
base_model:
- EuroBERT/EuroBERT-210m
pipeline_tag: token-classification
---

# lettucedect-210m-eurobert-tr-v1

## Model Description

**lettucedect-210m-eurobert-tr-v1** is a multilingual hallucination detection model based on the EuroBERT architecture and fine-tuned to detect hallucinations in Turkish text. It is part of the Turk-LettuceDetect suite and shows strong cross-lingual generalization when detecting hallucinations in Turkish Retrieval-Augmented Generation (RAG) applications.

## Model Details

- **Model Type:** Token-level binary classifier for hallucination detection
- **Base Architecture:** EuroBERT-210m
- **Language:** Turkish (tr) with multilingual capabilities
- **Training Dataset:** Machine-translated RAGTruth dataset (17,790 training instances)
- **Context Length:** Up to 8,192 tokens
- **Model Size:** ~210M parameters

## Intended Use

### Primary Use Cases
- Hallucination detection in Turkish RAG systems
- Cross-lingual hallucination detection applications
- Data-to-text generation verification (strongest performance area)
- Multilingual NLP pipelines requiring Turkish support

### Supported Tasks
- Question Answering (QA) hallucination detection
- Data-to-text generation verification (**strongest performance**)
- Text summarization fact-checking

## Performance

### Overall Performance (F1-Score)
- **Whole Dataset:** 0.7777
- **Question Answering:** 0.7317
- **Data-to-text Generation:** 0.8030 (**best in suite**)
- **Summarization:** 0.6057

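
For readers less familiar with the metrics listed for this model, the sketch below shows how precision, recall, and F1 are commonly defined for the positive (hallucinated) class in a binary token labeling. It is not the evaluation script used for the scores above; the `token_prf` helper and the toy labels are made up for illustration.

```python
def token_prf(gold, pred, positive=1):
    """Precision, recall and F1 for the positive (hallucinated) class."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels (hypothetical): 0 = supported, 1 = hallucinated
gold = [0, 0, 1, 1, 0, 1]
pred = [0, 1, 1, 1, 0, 0]
precision, recall, f1 = token_prf(gold, pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this toy case two hallucinated tokens are found, one is missed, and one supported token is flagged by mistake, so all three metrics equal 2/3.
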
### Key Strengths
- **Best performance in data-to-text generation**
- Robust multilingual transfer learning capabilities

## Training Details

### Training Data
- **Dataset:** Machine-translated RAGTruth benchmark
- **Size:** 17,790 training instances, 2,700 test instances
- **Tasks:** Question answering (MS MARCO), data-to-text (Yelp), summarization (CNN/Daily Mail)
- **Translation Model:** Google Gemma-3-27b-it

### Training Configuration
- **Epochs:** 6
- **Learning Rate:** 1e-5
- **Batch Size:** 4
- **Hardware:** NVIDIA A100 40GB GPU
- **Training Time:** ~2 hours
- **Optimization:** Cross-entropy loss with token masking

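
To make "cross-entropy loss with token masking" concrete, here is a minimal pure-Python sketch. It assumes the common convention that tokens excluded from the loss (for example context and question tokens) carry the label `-100`. This card does not state the exact masking scheme, so treat `IGNORE_INDEX` and the toy logits as assumptions rather than the actual training code.

```python
import math

IGNORE_INDEX = -100  # assumed masking convention, not confirmed by this card


def masked_cross_entropy(logits, labels, ignore_index=IGNORE_INDEX):
    """Mean cross-entropy over tokens whose label is not ignore_index."""
    total, count = 0.0, 0
    for token_logits, label in zip(logits, labels):
        if label == ignore_index:
            continue  # masked tokens do not contribute to the loss
        # Numerically stable log-sum-exp over the class logits
        m = max(token_logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in token_logits))
        total += log_z - token_logits[label]
        count += 1
    return total / count if count else 0.0


# Four tokens with two classes each (0 = supported, 1 = hallucinated);
# the first and last tokens are masked out.
logits = [[2.0, 0.0], [0.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
labels = [IGNORE_INDEX, 0, 1, IGNORE_INDEX]
loss = masked_cross_entropy(logits, labels)
print(f"masked loss: {loss:.4f}")
```

Only the two unmasked tokens enter the average, so the loss reflects the answer tokens alone.
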
### Multilingual Foundation
- Built on EuroBERT architecture supporting multiple European languages
- Demonstrates effective multilingual transfer learning
- No full in-language retraining required due to strong cross-lingual capabilities

## Technical Specifications

### Architecture Features
- **Base Model:** EuroBERT multilingual encoder
- **Maximum Sequence Length:** 8,192 tokens
- **Classification Head:** Binary token-level classifier
- **Multilingual Support:** European languages with strong Turkish adaptation
- **Parameter Count:** 210M parameters

### Input Format
```
Input: [CONTEXT] [QUESTION] [GENERATED_ANSWER]
Output: Token-level binary labels (0=supported, 1=hallucinated)
```

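
Token-level labels are usually merged into character spans before being shown to users. The sketch below illustrates one way to do that, assuming each answer token comes with a `(start, end)` character offset. The `labels_to_spans` helper, the offsets, and the labels are hypothetical and are not part of the LettuceDetect API.

```python
def labels_to_spans(answer, offsets, labels):
    """Merge consecutive hallucinated tokens (label 1) into character spans."""
    spans = []
    current = None  # [start, end] of the span being built, if any
    for (start, end), label in zip(offsets, labels):
        if label == 1:
            if current is None:
                current = [start, end]
            else:
                current[1] = end  # extend the open span
        elif current is not None:
            spans.append(current)
            current = None
    if current is not None:
        spans.append(current)
    return [{"start": s, "end": e, "text": answer[s:e]} for s, e in spans]


# Hand-written offsets and labels for illustration only
answer = "The population is 16 million."
offsets = [(0, 3), (4, 14), (15, 17), (18, 20), (21, 28), (28, 29)]
labels = [0, 0, 0, 1, 1, 0]
spans = labels_to_spans(answer, offsets, labels)
print(spans)
```

Here the two adjacent tokens labeled `1` collapse into a single span covering "16 million".
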
## Limitations and Biases

### Known Limitations
- Reduced effectiveness in summarization compared to structured tasks
- Performance dependent on translation quality of training data
- Optimized primarily for European language patterns

### Potential Biases
- Translation artifacts from machine-translated training data
- Multilingual transfer bias favoring European linguistic patterns
- May perform differently on Turkish dialects or informal text

## Usage

### Installation
```bash
pip install lettucedetect
```

### Basic Usage
```python
from lettucedetect.models.inference import HallucinationDetector

# Initialize the hallucination detector with this model.
# The Hub ID below is assumed from this card's title; adjust it if the repository name differs.
detector = HallucinationDetector(
    method="transformer",
    model_path="newmindai/lettucedect-210m-eurobert-tr-v1"
)

# Turkish context, question, and answer.
# The context is passed as a list of passages.
context = ["İstanbul Türkiye'nin en büyük şehridir. Şehir 15 milyonluk nüfusla Avrupa'nın en kalabalık şehridir."]
question = "İstanbul'un nüfusu nedir? İstanbul Avrupa'nın en kalabalık şehri midir?"
answer = "İstanbul'un nüfusu yaklaşık 16 milyondur ve Avrupa'nın en kalabalık şehridir."

# Get span-level predictions (start/end indices, confidence scores)
predictions = detector.predict(
    context=context,
    question=question,
    answer=answer,
    output_format="spans"
)

print("Detected hallucinations:", predictions)
# Example output (illustrative; offsets and confidence depend on the model):
# [{'start': 34, 'end': 57, 'confidence': 0.92, 'text': 'yaklaşık 16 milyondur'}]
```

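
Span predictions in the shape shown above (`start`, `end`, `confidence`, `text`) can be used to mark up the answer for review. The following sketch wraps flagged spans in brackets and drops low-confidence ones. The `highlight` helper, the `threshold` value, and the sample span are assumptions for illustration, and the helper expects non-overlapping spans.

```python
def highlight(answer, spans, threshold=0.5, open_tag="[[", close_tag="]]"):
    """Wrap spans at or above the confidence threshold in marker tags."""
    kept = sorted(
        (s for s in spans if s["confidence"] >= threshold),
        key=lambda s: s["start"],
    )
    parts, cursor = [], 0
    for span in kept:
        parts.append(answer[cursor:span["start"]])  # text before the span
        parts.append(open_tag + answer[span["start"]:span["end"]] + close_tag)
        cursor = span["end"]
    parts.append(answer[cursor:])  # remaining tail
    return "".join(parts)


# Hand-written prediction, not real model output
answer = "Istanbul has about 16 million residents."
spans = [{"start": 13, "end": 29, "confidence": 0.92, "text": "about 16 million"}]
marked = highlight(answer, spans)
print(marked)
```

Raising `threshold` trades recall for precision: fewer spans are flagged, but those that remain are the ones the detector is most confident about.
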
## Evaluation

### Benchmark Results
The model was evaluated on the machine-translated Turkish RAGTruth test set. The results indicate that multilingual transfer learning is effective for Turkish hallucination detection, with the strongest scores on data-to-text generation.

**Example-level Results**

<img
  src="https://cdn-uploads.huggingface.co/production/uploads/683d4880e639f8d647355997/RejTWu3JNjH8t0teV1Txf.png"
  width="1000"
  style="object-fit: contain; margin: auto; display: block;"
/>

**Token-level Results**

<img
  src="https://cdn-uploads.huggingface.co/production/uploads/683d4880e639f8d647355997/ECyrfN5Jv8fZSM0svxLXq.png"
  width="500"
  style="object-fit: contain; margin: auto; display: block;"
/>

## Citation

```bibtex
@inproceedings{turklettucedetect2025,
  title={Turk-LettuceDetect: A Hallucination Detection Models for Turkish RAG Applications},
  author={NewMind AI Team},
  booktitle={9th International Artificial Intelligence and Data Processing Symposium (IDAP'25)},
  year={2025},
  address={Malatya, Turkey}
}
```

## Original LettuceDetect Framework

This model extends the LettuceDetect methodology:
```bibtex
@article{lettucedetect2025,
  title={LettuceDetect: a hallucination detection framework for RAG applications},
  author={Kovács, Á. and Ács, B. and Kovács, D. and Szendi, S. and Kadlecik, Z. and Dávid, S.},
  journal={arXiv preprint arXiv:2502.17125},
  year={2025}
}
```

## License

This model is released under the MIT license to support research and development in Turkish and multilingual NLP applications.

## Contact

For questions about this model or other Turkish hallucination detection models, please refer to the original paper or contact the authors.

---