How to use nvidia/RE-USE with MambaSSM:
from mamba_ssm import MambaLMHeadModel
model = MambaLMHeadModel.from_pretrained("nvidia/RE-USE")
CUDA requirement?
Is it possible to run this model without CUDA, say on Apple Silicon? I would love to get some feedback on this.
To my knowledge, Mamba uses custom Triton/CUDA kernels for acceleration, so NVIDIA GPUs give the best performance. On Apple Silicon the model can still run, just without that acceleration. You may refer to this repository for setup and execution: https://github.com/RoyChao19477/RE-USE-MPS
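To illustrate why the fused kernels matter, here is a minimal sketch of a simplified, single-channel selective scan written as a plain sequential loop. It assumes a diagonal state matrix `A`, the discretization A_bar = exp(Δ·A) with B_bar·x approximated as Δ·B·x, and no batching. The function name `naive_selective_scan` and its parameters are hypothetical; this is not the mamba_ssm API or its kernel code.

```python
import math
from typing import List, Sequence


def naive_selective_scan(
    x: Sequence[float],            # input for one channel, length L
    delta: Sequence[float],        # per-step step size (input-dependent in Mamba), length L
    A: Sequence[float],            # diagonal state matrix, length N (negative values => decay)
    B: Sequence[Sequence[float]],  # per-step input projection, shape L x N
    C: Sequence[Sequence[float]],  # per-step output projection, shape L x N
    D: float = 0.0,                # skip-connection weight
) -> List[float]:
    """Unfused, sequential selective scan for a single channel (simplified sketch)."""
    L, N = len(x), len(A)
    if not (len(delta) == len(B) == len(C) == L):
        raise ValueError("x, delta, B and C must all have length L")
    h = [0.0] * N  # hidden state carried from one time step to the next
    ys: List[float] = []
    for t in range(L):  # inherently sequential: step t needs h from step t-1
        for n in range(N):
            # Discretized update: h = exp(delta_t * A) * h + delta_t * B_t * x_t
            h[n] = math.exp(delta[t] * A[n]) * h[n] + delta[t] * B[t][n] * x[t]
        # Readout: y_t = C_t . h + D * x_t
        ys.append(sum(C[t][n] * h[n] for n in range(N)) + D * x[t])
    return ys


if __name__ == "__main__":
    # With A = 0 and delta = B = C = 1, the state simply accumulates the input.
    print(naive_selective_scan([1.0, 2.0, 3.0], [1.0] * 3, [0.0], [[1.0]] * 3, [[1.0]] * 3))
```

The loop over `t` cannot be parallelized naively, because each step consumes the previous state. As I understand it, the custom kernels keep this state in fast on-chip memory and parallelize across batch and channels; a backend without them has to run something closer to this loop, which would be consistent with the slowdown reported below.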
I exported it to ONNX to see how it performs on CPU and on Metal (via WebGPU) from Rust, on a MacBook Air M3 (16 GB). It's slow, very slow: not practical without a decent GPU.
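To put numbers on "slow" when comparing backends such as CPU and WebGPU, a small timing harness can help. The sketch below is in Python for consistency with the usage snippet on this page, not the Rust setup described above; `benchmark` and `fake_inference` are hypothetical names, and the callable you pass in is assumed to wrap a single inference call on your own session.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(run: Callable[[], object], warmup: int = 2, iters: int = 10) -> Dict[str, float]:
    """Time a zero-argument inference callable and return latency stats in milliseconds."""
    if iters < 1:
        raise ValueError("iters must be >= 1")
    for _ in range(warmup):
        run()  # warm-up calls absorb one-time costs such as initialization or shader compilation
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.mean(samples),
        "median_ms": statistics.median(samples),
        "min_ms": min(samples),
    }


if __name__ == "__main__":
    # Stand-in workload (hypothetical); replace with a call into your own inference session.
    def fake_inference() -> None:
        sum(i * i for i in range(100_000))

    print(benchmark(fake_inference))
```

Reporting the median alongside the mean makes occasional slow outliers easier to spot when comparing backends.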
That said, there is this ONNX Runtime issue:
https://github.com/microsoft/onnxruntime/issues/27796
It seems a change for it was merged (I haven't tested the new version yet), so future versions may run a little faster.