Deployment support

#4
by Arkduke - opened

I’m trying to run the model on GPU but am facing version compatibility issues. Could you please provide the compatible versions of the following components: cuDNN, NumPy, PyTorch, ONNX Runtime (GPU), CUDA, Transformers, and Accelerate?

This is with the latest versions of the libraries. I am running in a Docker container with the image nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04:
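Not part of the original post, but a minimal sketch that may help pin down the incompatibility: it reports the exact installed version of each library in question from inside the container (stdlib only, so it runs even if some packages are missing).

```python
import importlib.metadata as md

def report_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = md.version(pkg)
        except md.PackageNotFoundError:
            versions[pkg] = None
    return versions

# Print the versions relevant to this compatibility question.
for name, version in report_versions(
    ["torch", "onnxruntime-gpu", "numpy", "transformers", "accelerate"]
).items():
    print(name, version or "not installed")
```

Posting this output alongside the error screenshot usually makes version conflicts much easier to diagnose.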
(error screenshot attached)

This happens when I pin the onnxruntime-gpu version to 1.17.0:
(error screenshot attached)

I converted the model to ONNX here: https://github.com/deepanshu-yadav/Quantize_speech_Recognition_For_Hindi
That repo is for Hindi, but performing inference for the other languages is easy too: just work out the language's offset into the 5632 tokens in total, with each language taking 256 tokens.
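The offset arithmetic described above can be sketched as follows. Note the function name and the assumption that language blocks are laid out contiguously by index are mine; the actual index for a given language depends on the model's vocabulary layout.

```python
TOKENS_PER_LANGUAGE = 256
TOTAL_TOKENS = 5632  # 5632 / 256 = 22 contiguous language blocks

def language_offset(language_index):
    """Start of a language's 256-token block in the flattened vocabulary."""
    if not 0 <= language_index < TOTAL_TOKENS // TOKENS_PER_LANGUAGE:
        raise ValueError("language_index out of range")
    return language_index * TOKENS_PER_LANGUAGE

# e.g. the third language block (index 2) starts at token 512
print(language_offset(2))  # -> 512
```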

I want to deploy the multilingual model and am facing issues with GPU inference. Please help me with this.
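As a starting point for the GPU-inference side, here is a minimal session sketch (not from the original thread; the model path and helper name are placeholders). ONNX Runtime accepts an ordered provider list, so it falls back to CPU if the CUDA provider fails to load:

```python
PREFERRED_PROVIDERS = ["CUDAExecutionProvider", "CPUExecutionProvider"]

def make_session(model_path, providers=None):
    """Create an ONNX Runtime session, preferring CUDA and falling back to CPU."""
    import onnxruntime as ort  # needs the onnxruntime-gpu package for CUDA
    return ort.InferenceSession(
        model_path, providers=providers or PREFERRED_PROVIDERS
    )
```

After creating the session, `session.get_providers()` shows which provider was actually loaded, which is a quick way to confirm whether inference is really running on the GPU.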
