MiniLM-L12-v2 Embed base Legal Public Deed - Matryoshka (ONNX Quantized)

This is an ONNX quantized version of the MiniLM-L12-v2 Embed base Legal Public Deed - Matryoshka model. It has been optimized for faster inference and a smaller file size, while maintaining comparable performance to the original model.

This version is recommended for production deployments and applications where latency and memory footprint are critical, especially when running inference on CPUs or resource-constrained devices.

Model Details

Quantization Architecture

  • Quantization Type: Int8 (8-bit quantization).
  • Model Size: Significantly reduced compared to the original float32 model, since each int8 weight takes 1 byte instead of 4.
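
To illustrate what 8-bit quantization means in general, here is a minimal sketch of affine (scale and zero-point) quantization in plain Python. It is an illustration only: the function names are hypothetical, and the exact scheme used to produce this model (per-tensor or per-channel, static or dynamic) is not stated in this card.

```python
def quantize_int8(values):
    """Map floats to int8 codes in [-128, 127] with a scale and zero point.

    Generic affine quantization sketch; not necessarily the scheme
    used to build this particular ONNX model.
    """
    # Widen the range so that 0.0 is always exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    # One quantization step; fall back to 1.0 if all values are zero.
    scale = (hi - lo) / 255.0 or 1.0
    zero_point = round(-128 - lo / scale)
    codes = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize_int8(codes, scale, zero_point):
    """Recover approximate float values from int8 codes."""
    return [(c - zero_point) * scale for c in codes]


weights = [-1.0, 0.0, 0.5, 1.0]
codes, scale, zero_point = quantize_int8(weights)
restored = dequantize_int8(codes, scale, zero_point)
print(codes, restored)
```

Each restored value differs from the original by roughly one quantization step at most, which is why embedding quality usually stays close to the float32 model while storage per weight drops to one byte.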

Model Description

Model Sources

Usage

To use this quantized ONNX model with the sentence-transformers library, make sure you have the necessary dependencies installed:

pip install -U sentence-transformers optimum onnx onnxruntime
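
Once the dependencies are installed, loading could look like the sketch below. It relies on the `backend="onnx"` option of `SentenceTransformer` (available in recent sentence-transformers releases) and on `truncate_dim` for Matryoshka-style shortened embeddings. The repository ID, the ONNX file name, and the helper names are placeholders and assumptions, not values taken from this card; replace them with the actual ones from the model repository.

```python
def load_quantized_model(model_id, onnx_file, truncate_dim=None):
    """Load an ONNX-quantized model through sentence-transformers.

    model_id and onnx_file are supplied by the caller; this card does
    not state the file name, so check the repository's file listing.
    """
    # Imported lazily so the helper below can be used without the library.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(
        model_id,
        backend="onnx",
        model_kwargs={"file_name": onnx_file},
        truncate_dim=truncate_dim,  # optional Matryoshka truncation
    )


def rank_by_similarity(model, query, documents):
    """Return (document, score) pairs sorted by similarity to the query."""
    query_embedding = model.encode([query])
    document_embeddings = model.encode(documents)
    # similarity() returns a (1, len(documents)) matrix; take its only row.
    scores = model.similarity(query_embedding, document_embeddings)[0]
    order = sorted(range(len(documents)), key=lambda i: float(scores[i]), reverse=True)
    return [(documents[i], float(scores[i])) for i in order]


# Example call (placeholder names, requires network access to download):
# model = load_quantized_model("your-namespace/your-model", "onnx/model_quantized.onnx")
# print(rank_by_similarity(model, "sale of property", ["Deed of sale ...", "Power of attorney ..."]))
```

Passing a smaller `truncate_dim` trades some retrieval quality for shorter vectors; which dimensions this model was trained to support is not listed in this card.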
