DeAR-Reranking Collection
DeAR (Deep Agent Rank): Dual-Stage Document Reranking with Reasoning Agents. Accepted at EMNLP Findings 2025.
DeAR-8B-Reranker-CE-LoRA-v1 is a LoRA (Low-Rank Adaptation) adapter for neural reranking, trained with a binary cross-entropy (CE) loss. This lightweight adapter requires only ~100MB of storage and can be applied to LLaMA-3.1-8B to achieve near full-model performance with minimal overhead.
- **Ultra Lightweight:** only ~100MB of storage
- **Efficient:** 3x faster training than full fine-tuning
- **High Performance:** 98% of full-model accuracy
- **Easy Integration:** simple adapter loading via PEFT
- **Classification-based:** binary relevance prediction
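Quick start: load the adapter, merge it into LLaMA-3.1-8B, and score a single query-document pair.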
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel, PeftConfig

# Resolve the base model from the adapter config
adapter_path = "abdoelsayed/dear-8b-reranker-ce-lora-v1"
config = PeftConfig.from_pretrained(adapter_path)

# Load tokenizer; LLaMA has no pad token, so reuse EOS
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load base model with a single-logit classification head
base_model = AutoModelForSequenceClassification.from_pretrained(
    config.base_model_name_or_path,
    num_labels=1,
    torch_dtype=torch.bfloat16,
)
# The classification head needs the pad token id to locate the
# last real token in each padded sequence
base_model.config.pad_token_id = tokenizer.pad_token_id

# Load and merge the LoRA weights into the base model
model = PeftModel.from_pretrained(base_model, adapter_path)
model = model.merge_and_unload()
model.eval().cuda()

# Score a query-document pair
query = "What is machine learning?"
document = "Machine learning is a subset of artificial intelligence..."
inputs = tokenizer(
    f"query: {query}",
    f"document: {document}",
    return_tensors="pt",
    truncation=True,
    max_length=228,
    padding="max_length",
)
inputs = {k: v.cuda() for k, v in inputs.items()}
with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()
print(f"Relevance score: {score}")
```
```python
@torch.inference_mode()
def rerank(tokenizer, model, query: str, documents, batch_size=64):
    scores = []
    device = next(model.parameters()).device
    for i in range(0, len(documents), batch_size):
        batch = documents[i:i + batch_size]
        queries = [f"query: {query}"] * len(batch)
        docs = [f"document: {title} {text}" for title, text in batch]
        inputs = tokenizer(queries, docs, return_tensors="pt",
                           truncation=True, max_length=228, padding=True)
        inputs = {k: v.to(device) for k, v in inputs.items()}
        logits = model(**inputs).logits.squeeze(-1)
        scores.extend(logits.cpu().tolist())
    return sorted(enumerate(scores), key=lambda x: x[1], reverse=True)
```
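A minimal usage sketch (the candidate titles and passages below are made-up placeholders):

```python
docs = [
    ("Machine Learning", "Machine learning is a subset of artificial intelligence..."),
    ("Sourdough", "Bread is leavened with a natural starter culture..."),
]
# Print candidates from most to least relevant
for idx, score in rerank(tokenizer, model, "What is machine learning?", docs):
    print(f"{score:+.3f}  {docs[idx][0]}")
```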
The adapter was trained with the following LoRA configuration:

```json
{
  "r": 16,
  "lora_alpha": 32,
  "target_modules": ["q_proj", "v_proj", "k_proj", "o_proj",
                     "gate_proj", "up_proj", "down_proj"],
  "lora_dropout": 0.05,
  "bias": "none",
  "task_type": "SEQ_CLS"
}
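```

For reference, the same settings expressed as a `peft` `LoraConfig` (a sketch; only the values shown above come from the released adapter):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                  # low-rank dimension
    lora_alpha=32,         # scaling factor (alpha / r = 2.0)
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="SEQ_CLS",   # sequence classification head
)
```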
| Feature | LoRA Adapter | Full Model |
|---|---|---|
| Storage | ~100 MB | ~16 GB |
| Training time | 12 h | 34 h |
| Performance | 98% | 100% |
| Memory | 28 GB | 38 GB |
If you use this model, please cite:

```bibtex
@article{abdallah2025dear,
  title={DeAR: Dual-Stage Document Reranking with Reasoning Agents via LLM Distillation},
  author={Abdallah, Abdelrahman and Mozafari, Jamshid and Piryani, Bhawna and Jatowt, Adam},
  journal={arXiv preprint arXiv:2508.16998},
  year={2025}
}
```
License: MIT
Base model: meta-llama/Llama-3.1-8B