---
library_name: onnx
tags:
- text-reranking
- jina
- onnx
- fp16
pipeline_tag: sentence-similarity
base_model:
- jinaai/jina-reranker-m0
---

# Jina Reranker M0 - ONNX FP16 Version

This repository contains the [jinaai/jina-reranker-m0](https://huggingface.co/jinaai/jina-reranker-m0) model converted to the ONNX format with FP16 precision.

## Model Description

Jina Reranker is designed to rerank search results or document passages by their relevance to a given query. It takes a query and a list of documents as input and outputs a relevance score for each query-document pair.
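Conceptually, reranking reduces to scoring every (query, document) pair and sorting by that score. A minimal sketch, with a hypothetical `score` function standing in for the model call:

```python
def rerank(query: str, docs: list[str], score) -> list[str]:
    # `score(query, doc)` is a stand-in for a model call returning a float;
    # higher means more relevant.
    return sorted(docs, key=lambda doc: score(query, doc), reverse=True)
```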

This version is specifically exported for use with ONNX Runtime.

**Original Model Card:** [jinaai/jina-reranker-m0](https://huggingface.co/jinaai/jina-reranker-m0)

## Technical Details

*   **Format:** ONNX
*   **Opset:** 14
*   **Precision:** FP16 (exported using `.half()`)
*   **External Data:** Uses the ONNX external data format because of the model's size, so all files in this repository are required. `snapshot_download` from `huggingface_hub` fetches them in one call, as shown in the usage script below.
*   **Export Source:** Exported from the Hugging Face `transformers` model using `torch.onnx.export`; a rough sketch of the export follows this list.
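
The export script itself is not included in this repository; based on the details above, it presumably looked roughly like the following. This is a sketch only: the exact model class, dummy input shapes, and dynamic axes are assumptions.

```python
import torch
from transformers import AutoModelForSequenceClassification  # assumption: the actual class may differ

model = AutoModelForSequenceClassification.from_pretrained(
    "jinaai/jina-reranker-m0", trust_remote_code=True
)
model = model.half().eval()  # FP16 precision via .half(), as noted above

# Dummy inputs are only used to trace the graph; real shapes stay dynamic.
dummy_ids = torch.ones(1, 16, dtype=torch.long)
dummy_mask = torch.ones(1, 16, dtype=torch.long)

torch.onnx.export(
    model,
    (dummy_ids, dummy_mask),
    "jina-reranker-m0.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "logits": {0: "batch"},
    },
    opset_version=14,  # matches the opset listed above
)
```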

## Usage

You can use this model with `onnxruntime` for inference. You will also need the `transformers` library to load the appropriate processor for input preparation and `huggingface_hub` to download the model files.

**1. Installation:**

```bash
pip install onnxruntime huggingface_hub transformers torch sentencepiece
```

**2. Inference Script:**

```python
import os

import onnxruntime as ort
from huggingface_hub import snapshot_download
from transformers import AutoProcessor
import numpy as np
import torch  # For processor output handling

# --- Configuration ---
# Replace with your repository ID if different
repo_id = "jian-mo/jina-reranker-m0-onnx"
onnx_filename = "jina-reranker-m0.onnx" # Main ONNX file name
# Use the original model ID to load the correct processor
original_model_id = "jinaai/jina-reranker-m0"
# --- End Configuration ---

# 1. Download the model files from the Hub
# snapshot_download fetches every file in the repo; this matters because the
# ONNX external data files must sit in the same directory as the .onnx file
# (hf_hub_download would only fetch the single named file).
print(f"Downloading ONNX model from {repo_id}...")
local_dir = snapshot_download(repo_id=repo_id)
local_onnx_path = os.path.join(local_dir, onnx_filename)
print(f"ONNX model downloaded to: {local_onnx_path}")

# 2. Load ONNX Runtime session
print("Loading ONNX Inference Session...")
# You can choose execution providers, e.g. ['CUDAExecutionProvider', 'CPUExecutionProvider'],
# if you have a GPU and the onnxruntime-gpu package installed.
session_options = ort.SessionOptions()
# session_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED
providers = ['CPUExecutionProvider'] # Default to CPU
session = ort.InferenceSession(local_onnx_path, sess_options=session_options, providers=providers)
print(f"ONNX session loaded with provider: {session.get_providers()}")

# 3. Load the Processor
print(f"Loading processor from {original_model_id}...")
processor = AutoProcessor.from_pretrained(original_model_id, trust_remote_code=True)
print("Processor loaded.")

# 4. Prepare Input Data
query = "What is deep learning?"
document = "Deep learning is a subset of machine learning based on artificial neural networks with representation learning."
# Example with multiple documents (batch processing)
# documents = [
#     "Deep learning is a subset of machine learning based on artificial neural networks with representation learning.",
#     "Artificial intelligence refers to the simulation of human intelligence in machines.",
#     "A transformer is a deep learning model used primarily in the field of natural language processing."
# ]
# A batch reranking sketch is shown after this script.

print("Preparing input data...")
# Process query and document together as expected by the reranker model
inputs = processor(
    text=f"{query} {document}",
    images=None, # Assuming text-only reranking
    return_tensors="pt", # Get PyTorch tensors first
    padding=True,
    truncation=True,
    max_length=512 # Use a reasonable max_length
)

# Convert to NumPy for ONNX Runtime
inputs_np = {
    "input_ids": inputs["input_ids"].numpy(),
    "attention_mask": inputs["attention_mask"].numpy()
}
print("Input data prepared.")
# print("Input shapes:", {k: v.shape for k, v in inputs_np.items()})

# 5. Run Inference
print("Running inference...")
output_names = [output.name for output in session.get_outputs()]
outputs = session.run(output_names, inputs_np)
print("Inference complete.")

# 6. Process Output
# The exact interpretation depends on the model's output structure.
# For Jina Reranker, the output is typically a logit score.
# Higher values usually indicate higher relevance. Check the original model card.
print(f"Number of outputs: {len(outputs)}")
if len(outputs) > 0:
    logits = outputs[0]
    print(f"Output logits shape: {logits.shape}")
    # Often, the relevance score is a single logit per (query, document) pair;
    # higher usually means more relevant. Verify against the original model card.
    score = float(logits.squeeze())  # assumes the model emits one score per pair
    print(f"Relevance score (logit): {score}")
```
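
To rerank several documents, run the session once per (query, document) pair and sort by score. A sketch reusing the `session`, `processor`, `query`, and `output_names` from the script above (it likewise assumes the model emits one score per pair):

```python
documents = [
    "Deep learning is a subset of machine learning based on artificial neural networks.",
    "Artificial intelligence refers to the simulation of human intelligence in machines.",
    "A transformer is a deep learning model used primarily in natural language processing.",
]

scores = []
for doc in documents:
    enc = processor(
        text=f"{query} {doc}", images=None, return_tensors="pt",
        padding=True, truncation=True, max_length=512,
    )
    feed = {
        "input_ids": enc["input_ids"].numpy(),
        "attention_mask": enc["attention_mask"].numpy(),
    }
    logits = session.run(output_names, feed)[0]
    scores.append(float(logits.squeeze()))

# Sort documents by descending relevance score
for score, doc in sorted(zip(scores, documents), reverse=True):
    print(f"{score:.4f}  {doc}")
```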