---
tags:
- kernels
- cuda
---
RMSNorm kernel for ROCm devices from https://github.com/huggingface/hf-rocm-kernels
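
For readers unfamiliar with the operation, here is a minimal pure-Python sketch of the standard RMSNorm definition, `y_i = x_i / sqrt(mean(x^2) + eps) * w_i`. It is only a reference for checking numerical results, not the kernel's implementation or interface: the function name `rms_norm`, its argument order, and the default `eps` are assumptions made for illustration and are not taken from the repository.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Reference RMSNorm over a single vector (hypothetical helper).

    Computes y_i = x_i / sqrt(mean(x^2) + eps) * weight_i.
    The default eps is an assumed value, not one read from the kernel.
    """
    if len(x) != len(weight):
        raise ValueError("x and weight must have the same length")
    # Mean of squares over the normalized dimension.
    mean_sq = sum(v * v for v in x) / len(x)
    # Reciprocal root-mean-square, stabilized by eps.
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    # Scale each element and apply the learned per-element weight.
    return [v * inv_rms * w for v, w in zip(x, weight)]


# With unit weights and eps=0, the output has a mean square of 1.
out = rms_norm([3.0, 4.0], [1.0, 1.0], eps=0.0)
```

A GPU kernel would typically apply the same computation independently to each row of a batched tensor, so a reference like this can be used to compare outputs row by row within a floating-point tolerance.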