Einsum fails on Triton-ONNX-Runtime
When I export this model to ONNX and serve it on NVIDIA Triton with the ONNX Runtime backend, I get the following error:
2025-02-11 15:44:11.410228203 [E:onnxruntime:log, cuda_call.cc:123 CudaCall] CUBLAS failure 7: CUBLAS_STATUS_INVALID_VALUE ; GPU=0 ; hostname=9b6c766f16ea ; file=/workspace/onnxruntime/onnxruntime/core/providers/cuda/math/einsum_utils/einsum_auxiliary_ops.cc ; line=54 ; expr=cublasGemmStridedBatchedHelper( static_cast<EinsumCudaAssets*>(einsum_cuda_assets)->cublas_handle_, CUBLAS_OP_N, CUBLAS_OP_N, static_cast<int>(N), static_cast<int>(M), static_cast<int>(K), &one, reinterpret_cast<const CudaT*>(input_2_data), static_cast<int>(N), static_cast<int64_t>(right_stride), reinterpret_cast<const CudaT*>(input_1_data), static_cast<int>(K), static_cast<int64_t>(left_stride), &zero, reinterpret_cast<CudaT*>(output_data), static_cast<int>(N), static_cast<int64_t>(output_stride), static_cast<int>(num_batches), static_cast<EinsumCudaAssets*>(einsum_cuda_assets)->cuda_ep_->GetDeviceProp(), static_cast<EinsumCudaAssets*>(einsum_cuda_assets)->cuda_ep_->UseTF32());
2025-02-11 15:44:11.432230090 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running Einsum node. Name:'/Einsum' Status Message: /workspace/onnxruntime/onnxruntime/core/providers/cpu/math/einsum_utils/einsum_auxiliary_ops.cc:341 std::unique_ptr<onnxruntime::Tensor> onnxruntime::EinsumOp::MatMul(const onnxruntime::Tensor&, const gsl::span<const int64_t>&, const onnxruntime::Tensor&, const gsl::span<const int64_t>&, onnxruntime::AllocatorPtr, onnxruntime::concurrency::ThreadPool*, void*, DeviceHelpers::MatMul<T>&) [with T = float; onnxruntime::AllocatorPtr = std::shared_ptr<onnxruntime::IAllocator>; DeviceHelpers::MatMul<T> = std::function<onnxruntime::common::Status(const float*, const float*, float*, long unsigned int, long unsigned int, long unsigned int, long unsigned int, long unsigned int, long unsigned int, long unsigned int, onnxruntime::concurrency::ThreadPool*, void*)>] Einsum op: Exception during MatMul operation: CUBLAS failure 7: CUBLAS_STATUS_INVALID_VALUE ; GPU=0 ; hostname=9b6c766f16ea ; file=/workspace/onnxruntime/onnxruntime/core/providers/cuda/math/einsum_utils/einsum_auxiliary_ops.cc ; line=54 ; expr=cublasGemmStridedBatchedHelper( static_cast<EinsumCudaAssets*>(einsum_cuda_assets)->cublas_handle_, CUBLAS_OP_N, CUBLAS_OP_N, static_cast<int>(N), static_cast<int>(M), static_cast<int>(K), &one, reinterpret_cast<const CudaT*>(input_2_data), static_cast<int>(N), static_cast<int64_t>(right_stride), reinterpret_cast<const CudaT*>(input_1_data), static_cast<int>(K), static_cast<int64_t>(left_stride), &zero, reinterpret_cast<CudaT*>(output_data), static_cast<int>(N), static_cast<int64_t>(output_stride), static_cast<int>(num_batches), static_cast<EinsumCudaAssets*>(einsum_cuda_assets)->cuda_ep_->GetDeviceProp(), static_cast<EinsumCudaAssets*>(einsum_cuda_assets)->cuda_ep_->UseTF32());
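To rule out the Triton backend itself, the same failure can usually be reproduced by running the exported model directly under onnxruntime-gpu. A minimal sketch, assuming the export is saved as model.onnx; the zero-filled feeds and the unit sizes substituted for dynamic dimensions are placeholders, so realistic values may be needed to hit the same code path:

import numpy as np
import onnxruntime as ort

# Load the exported model with the CUDA execution provider (CPU as fallback).
sess = ort.InferenceSession(
    "model.onnx",  # hypothetical path to the exported model
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# Build zero-filled dummy feeds from the graph's declared inputs,
# substituting 1 for every dynamic (non-integer) dimension.
dtype_map = {
    "tensor(int32)": np.int32,
    "tensor(int64)": np.int64,
    "tensor(float)": np.float32,
    "tensor(bool)": np.bool_,
}
feeds = {
    inp.name: np.zeros(
        [d if isinstance(d, int) else 1 for d in inp.shape],
        dtype=dtype_map[inp.type],
    )
    for inp in sess.get_inputs()
}

# If the CUDA Einsum kernel is at fault, this raises the same CUBLAS error.
outputs = sess.run(None, feeds)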
Has anyone been able to serve the ONNX model on Triton?
Thanks
I also want to deploy it via Triton. Did you figure out how to do so, and what config.pbtxt did you use?
Yes, I have it working.
I used torch to export to ONNX (torch.onnx.export), with torch.int32 as the input datatype.
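For reference, a rough sketch of that export call, assuming an already-loaded model object; the dummy shapes, dynamic-axis names, and opset version are assumptions to adapt to your own setup:

import torch

# model = ...  # your torch.nn.Module, loaded however you normally do

seq_len, num_spans = 16, 8
dummy_inputs = (
    torch.zeros(1, seq_len, dtype=torch.int32),       # input_ids
    torch.ones(1, seq_len, dtype=torch.int32),        # attention_mask
    torch.ones(1, seq_len, dtype=torch.int32),        # words_mask
    torch.tensor([[seq_len]], dtype=torch.int32),     # text_lengths
    torch.zeros(1, num_spans, 2, dtype=torch.int32),  # span_idx
    torch.ones(1, num_spans, dtype=torch.bool),       # span_mask
)

torch.onnx.export(
    model,
    dummy_inputs,
    "model.onnx",
    input_names=["input_ids", "attention_mask", "words_mask",
                 "text_lengths", "span_idx", "span_mask"],
    output_names=["logits"],
    # Mark batch and sequence/span dimensions as dynamic so Triton can batch.
    dynamic_axes={
        "input_ids": {0: "batch", 1: "seq"},
        "attention_mask": {0: "batch", 1: "seq"},
        "words_mask": {0: "batch", 1: "seq"},
        "text_lengths": {0: "batch"},
        "span_idx": {0: "batch", 1: "spans"},
        "span_mask": {0: "batch", 1: "spans"},
        "logits": {0: "batch"},
    },
    opset_version=17,  # assumption; use whatever opset your model needs
)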
This is my config.pbtxt:
platform: "onnxruntime_onnx"
max_batch_size: 128
input [
  {
    name: "input_ids"
    data_type: TYPE_INT32
    dims: [ -1 ]
  },
  {
    name: "attention_mask"
    data_type: TYPE_INT32
    dims: [ -1 ]
  },
  {
    name: "words_mask"
    data_type: TYPE_INT32
    dims: [ -1 ]
  },
  {
    name: "text_lengths"
    data_type: TYPE_INT32
    dims: [ 1 ]
  },
  {
    name: "span_idx"
    data_type: TYPE_INT32
    dims: [ -1, 2 ]
  },
  {
    name: "span_mask"
    data_type: TYPE_BOOL
    dims: [ -1 ]
  }
]
output [
  {
    name: "logits"
    data_type: TYPE_FP32
    dims: [ -1, -1, -1 ]
  }
]
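And in case it helps, a hypothetical client-side sketch matching that config, using tritonclient over HTTP; the model name, server URL, and dummy values are assumptions:

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

seq_len, num_spans = 16, 8
arrays = {
    "input_ids": np.zeros((1, seq_len), dtype=np.int32),
    "attention_mask": np.ones((1, seq_len), dtype=np.int32),
    "words_mask": np.ones((1, seq_len), dtype=np.int32),
    "text_lengths": np.array([[seq_len]], dtype=np.int32),
    "span_idx": np.zeros((1, num_spans, 2), dtype=np.int32),
    "span_mask": np.ones((1, num_spans), dtype=np.bool_),
}

inputs = []
for name, arr in arrays.items():
    # Triton datatype strings: INT32 for the int32 tensors, BOOL for span_mask.
    dtype = "BOOL" if arr.dtype == np.bool_ else "INT32"
    inp = httpclient.InferInput(name, list(arr.shape), dtype)
    inp.set_data_from_numpy(arr)
    inputs.append(inp)

result = client.infer(
    "my_model",  # hypothetical name of the model in the Triton repository
    inputs,
    outputs=[httpclient.InferRequestedOutput("logits")],
)
print(result.as_numpy("logits").shape)

Note that because max_batch_size is set, every request shape carries a leading batch dimension on top of the dims listed in the config.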
Thanks, that helped a lot in getting it running on our Triton server!
Can you share some of the configs and model code for the preprocessor and postprocessor models?