DeBERTa-v3-large Zero-Shot Classification - ONNX

This is an ONNX-optimized version of MoritzLaurer/deberta-v3-large-zeroshot-v2.0 for efficient inference.

Model Description

This repository contains:

  • model.onnx: the full-precision (FP32) ONNX export
  • model_quantized.onnx: an INT8 dynamically quantized version for faster inference with minimal accuracy loss

The model is intended for zero-shot classification of English text (the underlying DeBERTa-v3 base model is English-only).

Usage

Zero-Shot Classification Pipeline (Recommended)

from transformers import pipeline, AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification

# Load the quantized model
model = ORTModelForSequenceClassification.from_pretrained(
    "richardr1126/deberta-v3-large-zeroshot-v2.0-ONNX",
    file_name="model_quantized.onnx"
)

tokenizer = AutoTokenizer.from_pretrained(
    "richardr1126/deberta-v3-large-zeroshot-v2.0-ONNX"
)

# The zero-shot pipeline passes token_type_ids (produced by the DeBERTa tokenizer),
# but the ONNX export does not accept them, so patch forward() to drop them
original_forward = model.forward
def patched_forward(input_ids=None, attention_mask=None, token_type_ids=None, **kwargs):
    return original_forward(input_ids=input_ids, attention_mask=attention_mask, **kwargs)
model.forward = patched_forward

# Create zero-shot classification pipeline
classifier = pipeline(
    "zero-shot-classification",
    model=model,
    tokenizer=tokenizer,
    device=-1  # CPU inference
)

# Define your labels
labels = ["politics", "technology", "sports", "entertainment", "business"]

# Classify text
text = "Apple announced their new AI chip with impressive performance gains."
result = classifier(
    text,
    candidate_labels=labels,
    hypothesis_template="This text is about {}",
    multi_label=True  # Enable multi-label classification
)

print(f"Text: {text}")
for label, score in zip(result['labels'], result['scores']):
    print(f"  {label}: {score:.2%}")

Using Regular ONNX Model

For the non-quantized model (larger on disk, but free of quantization error):

model = ORTModelForSequenceClassification.from_pretrained(
    "richardr1126/deberta-v3-large-zeroshot-v2.0-ONNX",
    file_name="model.onnx"
)
# ... rest of the code is the same

Performance

The quantized model provides:

  • Faster inference: ~2-3x speedup over PyTorch on CPU (see the timing sketch below)
  • Smaller size: INT8 weights take roughly a quarter of the space of the FP32 export
  • Maintained accuracy: minimal accuracy loss (<1%) compared to the original model
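
These figures are hardware-dependent. Below is a minimal timing sketch for reproducing the comparison on your own machine; the example text, hypothesis, and loop count are arbitrary choices for illustration, not part of any recorded benchmark:

import time
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification

repo = "richardr1126/deberta-v3-large-zeroshot-v2.0-ONNX"
tokenizer = AutoTokenizer.from_pretrained(repo)

# Build a premise/hypothesis pair, as the zero-shot pipeline does internally
inputs = tokenizer(
    "Apple announced their new AI chip with impressive performance gains.",
    "This text is about technology",
    return_tensors="pt",
)
inputs.pop("token_type_ids", None)  # the ONNX export does not accept token_type_ids

for file_name in ("model.onnx", "model_quantized.onnx"):
    model = ORTModelForSequenceClassification.from_pretrained(repo, file_name=file_name)
    model(**inputs)  # warm-up run
    start = time.perf_counter()
    for _ in range(20):
        model(**inputs)
    elapsed = (time.perf_counter() - start) / 20
    print(f"{file_name}: {elapsed * 1000:.1f} ms per forward pass")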

Original Model

This repository is an ONNX export of MoritzLaurer/deberta-v3-large-zeroshot-v2.0: https://huggingface.co/MoritzLaurer/deberta-v3-large-zeroshot-v2.0

Optimization Details

  • Export: Converted from PyTorch to ONNX format
  • Quantization: Dynamic quantization with INT8 weights
  • Framework: ONNX Runtime via Hugging Face Optimum (a reproduction sketch follows)
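
For reference, here is a minimal sketch of how such an export and dynamic quantization can be produced with Optimum. The exact quantization configuration used for this repository is not recorded, so the avx512_vnni config below is an assumption:

from optimum.onnxruntime import ORTModelForSequenceClassification, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Export the PyTorch checkpoint to ONNX (writes model.onnx)
model = ORTModelForSequenceClassification.from_pretrained(
    "MoritzLaurer/deberta-v3-large-zeroshot-v2.0", export=True
)
model.save_pretrained("deberta-onnx")

# Dynamic (weight-only) INT8 quantization; the target-ISA config is an assumption
quantizer = ORTQuantizer.from_pretrained("deberta-onnx")
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
quantizer.quantize(save_dir="deberta-onnx", quantization_config=qconfig)
# Writes model_quantized.onnx alongside the original export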

License

Same as the base model - MIT License

Citation

If you use this model, please cite the original model:

@misc{laurer2022deberta,
  author = {Laurer, Moritz and van Atteveldt, Wouter and Casas, Andreu and Welbers, Kasper},
  title = {DeBERTa-v3-large Zero-Shot Classification},
  year = {2022},
  publisher = {Hugging Face},
  url = {https://huggingface.co/MoritzLaurer/deberta-v3-large-zeroshot-v2.0}
}

Acknowledgments

This ONNX optimization was created for efficient deployment in production environments. Special thanks to the original model authors and the Hugging Face Optimum team.
