# DeBERTa-v3-large Zero-Shot Classification - ONNX
This is an ONNX-optimized version of MoritzLaurer/deberta-v3-large-zeroshot-v2.0 for efficient inference.
## Model Description
This repository contains:
- `model.onnx`: the regular ONNX-exported model
- `model_quantized.onnx`: a dynamically quantized INT8 model for faster inference with minimal accuracy loss
The model is optimized for zero-shot classification: it scores arbitrary candidate labels against input text without any task-specific fine-tuning.
## Usage

### Zero-Shot Classification Pipeline (Recommended)
```python
from transformers import pipeline, AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification

# Load the quantized model
model = ORTModelForSequenceClassification.from_pretrained(
    "richardr1126/deberta-v3-large-zeroshot-v2.0-ONNX",
    file_name="model_quantized.onnx",
)
tokenizer = AutoTokenizer.from_pretrained(
    "richardr1126/deberta-v3-large-zeroshot-v2.0-ONNX"
)

# Patch the model's forward method to drop token_type_ids, which the
# DeBERTa tokenizer produces but the exported ONNX graph does not accept
original_forward = model.forward

def patched_forward(input_ids=None, attention_mask=None, token_type_ids=None, **kwargs):
    return original_forward(input_ids=input_ids, attention_mask=attention_mask, **kwargs)

model.forward = patched_forward

# Create the zero-shot classification pipeline
classifier = pipeline(
    "zero-shot-classification",
    model=model,
    tokenizer=tokenizer,
    device=-1,  # CPU inference
)

# Define your labels
labels = ["politics", "technology", "sports", "entertainment", "business"]

# Classify text
text = "Apple announced their new AI chip with impressive performance gains."
result = classifier(
    text,
    candidate_labels=labels,
    hypothesis_template="This text is about {}",
    multi_label=True,  # score each label independently
)

print(f"Text: {text}")
for label, score in zip(result["labels"], result["scores"]):
    print(f"  {label}: {score:.2%}")
```
### Using the Regular ONNX Model
For the non-quantized model (larger but potentially slightly more accurate):
```python
model = ORTModelForSequenceClassification.from_pretrained(
    "richardr1126/deberta-v3-large-zeroshot-v2.0-ONNX",
    file_name="model.onnx",
)
# ... the rest of the code is the same
```
## Performance
The quantized model provides:
- Faster inference: roughly 2-3x speedup over the PyTorch model
- Smaller size: INT8 weights take roughly a quarter of the space of the original FP32 weights
- Maintained accuracy: minimal accuracy loss (<1%) compared to the original model
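To check these numbers on your own hardware, a minimal latency comparison along the following lines should work (the run count of 20 is arbitrary, and absolute timings depend heavily on your CPU):

```python
import time

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification

repo = "richardr1126/deberta-v3-large-zeroshot-v2.0-ONNX"
tokenizer = AutoTokenizer.from_pretrained(repo)

# Build a single premise/hypothesis pair, as the zero-shot pipeline would
inputs = tokenizer(
    "Apple announced their new AI chip.",
    "This text is about technology",
    return_tensors="pt",
)
inputs.pop("token_type_ids", None)  # the exported graph does not accept token_type_ids

for file_name in ("model.onnx", "model_quantized.onnx"):
    model = ORTModelForSequenceClassification.from_pretrained(repo, file_name=file_name)
    model(**inputs)  # warm-up run
    start = time.perf_counter()
    for _ in range(20):
        model(**inputs)
    per_run = (time.perf_counter() - start) / 20
    print(f"{file_name}: {per_run * 1000:.1f} ms per forward pass")
```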
## Original Model
This is an optimized version of the original model:
- Base Model: MoritzLaurer/deberta-v3-large-zeroshot-v2.0
- Architecture: DeBERTa-v3-large (fine-tuned from microsoft/deberta-v3-large)
- Task: Zero-shot classification / NLI
## Optimization Details
- Export: Converted from PyTorch to ONNX format
- Quantization: Dynamic quantization with INT8 weights
- Framework: ONNX Runtime with Optimum
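The exact commands used to produce these files are not included here, but a typical Optimum export-and-quantize workflow looks roughly like this (the `avx512_vnni` dynamic quantization config is an assumption; the actual config used may differ):

```python
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Export the PyTorch checkpoint to ONNX (produces model.onnx)
model = ORTModelForSequenceClassification.from_pretrained(
    "MoritzLaurer/deberta-v3-large-zeroshot-v2.0", export=True
)
model.save_pretrained("deberta-v3-large-zeroshot-v2.0-ONNX")

# Apply dynamic INT8 quantization to the weights (produces model_quantized.onnx)
quantizer = ORTQuantizer.from_pretrained(model)
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)  # assumed config
quantizer.quantize(
    save_dir="deberta-v3-large-zeroshot-v2.0-ONNX",
    quantization_config=qconfig,
)
```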
## License

Same as the base model: MIT License.
## Citation
If you use this model, please cite the original model:
```bibtex
@misc{laurer2022deberta,
  author    = {Laurer, Moritz and Atteveldt, Wouter van and Casas, Andreu Salleras and Welbers, Kasper},
  title     = {DeBERTa-v3-large Zero-Shot Classification},
  year      = {2022},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/MoritzLaurer/deberta-v3-large-zeroshot-v2.0}
}
```
## Acknowledgments
This ONNX optimization was created for efficient deployment in production environments. Special thanks to the original model authors and the Hugging Face Optimum team.