Nougat ONNX
Faster Nougat in ONNX format (optimum onnxruntime)
This is https://huggingface.co/facebook/nougat-small exported to ONNX. The weights are not quantized.
from transformers import NougatProcessor
from optimum.onnxruntime import ORTModelForVision2Seq

model_name = 'pszemraj/nougat-small-onnx'

# the processor handles image preprocessing and token decoding
processor = NougatProcessor.from_pretrained(model_name)

# load the ONNX export through optimum's onnxruntime wrapper
model = ORTModelForVision2Seq.from_pretrained(
    model_name,
    provider="CPUExecutionProvider",  # 'CUDAExecutionProvider' for GPU
    use_merged=False,
    use_io_binding=True,
)
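If the same script has to run on both CPU and GPU machines, the provider string could be chosen at runtime instead of hard-coded. A minimal sketch: `pick_provider` is a hypothetical helper (not part of optimum or onnxruntime), and it only assumes that the available provider names are passed in as a list of strings.

```python
def pick_provider(available, prefer_gpu=True):
    """Return an ONNX Runtime execution provider name.

    `available` is a list of provider names; `pick_provider` itself is a
    hypothetical helper written for this example.
    """
    if prefer_gpu and "CUDAExecutionProvider" in available:
        return "CUDAExecutionProvider"
    # CPU is the fallback whenever CUDA is missing or not wanted
    return "CPUExecutionProvider"


# Typical use (needs onnxruntime installed):
#   import onnxruntime as ort
#   provider = pick_provider(ort.get_available_providers())
#   model = ORTModelForVision2Seq.from_pretrained(model_name, provider=provider)
```

The result can be passed as the `provider` argument in the loading code above.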
On a CPU-only Colab runtime (at time of writing) you may get CuPy errors. To fix this, uninstall CuPy:
pip uninstall cupy-cuda11x -y
See here or this basic notebook I uploaded. ONNX seems to bring CPU inference times into 'feasible' territory: it took ~15 mins for Attention is All You Meme on the Colab free CPU runtime.
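To check timings on your own hardware, a single page can be transcribed and timed roughly as follows. This is a sketch rather than the notebook's exact code: `transcribe_page` is a hypothetical helper, `max_new_tokens=1024` is an arbitrary assumption, and `processor` and `model` are expected to be the objects loaded above. The generation arguments follow the usual Nougat usage in transformers; `post_process_generation` may need the `nltk` and `python-Levenshtein` packages installed.

```python
import time


def transcribe_page(image, processor, model, max_new_tokens=1024):
    """Run Nougat on one page image; return (markdown_text, seconds).

    Hypothetical helper: `image` is a PIL image of a rendered page, and
    `processor` / `model` are the NougatProcessor and ORTModelForVision2Seq
    instances loaded earlier.
    """
    start = time.perf_counter()

    # resize/normalize the page into the tensor the encoder expects
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    # autoregressive decoding; the <unk> token is banned from the output
    outputs = model.generate(
        pixel_values,
        min_length=1,
        max_new_tokens=max_new_tokens,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
    )

    # token ids -> text, then Nougat's own markdown cleanup
    text = processor.batch_decode(outputs, skip_special_tokens=True)[0]
    text = processor.post_process_generation(text, fix_markdown=False)

    return text, time.perf_counter() - start


# Typical use:
#   from PIL import Image
#   page = Image.open("page.png").convert("RGB")  # hypothetical file name
#   markdown, seconds = transcribe_page(page, processor, model)
```

Summing the returned seconds over all pages of a PDF gives a per-document figure comparable to the one quoted above.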