You can try my Colab notebook, which lets you estimate model weight memory requirements (credit: Philip Schmid) and KV cache memory requirements: https://colab.research.google.com/drive/16m3Fz4Ocgp-saXUFNRa8zXyXA-y9JigX?usp=sharing
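For the weights part, the usual back-of-the-envelope estimate is parameter count × bytes per parameter. The sketch below is not the notebook's code. It is a hypothetical helper (`weight_memory_bytes`, `BYTES_PER_PARAM` are illustrative names) that assumes a rounded 405B parameter count and ignores activations, KV cache, and framework overhead.

```python
# Hypothetical sketch, not the Colab notebook's implementation.
# Raw weight memory = parameter count x bytes per parameter.
# Ignores activations, KV cache, framework overhead, and fragmentation.
BYTES_PER_PARAM = {
    "fp32": 4,
    "fp16": 2,
    "bf16": 2,
    "int8": 1,
    "int4": 0.5,  # packed 4-bit weights, ignoring quantization scales
}


def weight_memory_bytes(num_params: int, dtype: str) -> float:
    """Approximate bytes needed just to hold the model weights."""
    return num_params * BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    params = 405_000_000_000  # nominal "405B" parameter count (assumed, rounded)
    for dtype in ("fp32", "bf16", "int8", "int4"):
        gb = weight_memory_bytes(params, dtype) / 1e9
        print(f"{dtype}: ~{gb:,.1f} GB")
```

Real deployments need headroom on top of this for the KV cache and runtime buffers.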
Hi aHaric! I also think the 405B numbers are in FP32. According to my calculations, the 405B KV cache at 128k context should be in the ~66 GB ballpark.
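One way to reproduce that ballpark is the standard KV cache formula: 2 (K and V) × layers × KV heads × head dim × sequence length × bytes per element × batch size. This minimal sketch assumes batch size 1, a 2-byte (FP16) cache, and a Llama 3.1 405B attention shape of 126 layers, 8 KV heads (grouped-query attention), and head dim 128. These config values are assumptions taken from the published model config, not from this thread, so check them against the model's `config.json`.

```python
# Minimal sketch of KV cache sizing; config values below are assumptions.


def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int, batch_size: int = 1) -> int:
    """Bytes for K and V across all layers: 2 tensors of
    [batch, kv_heads, seq_len, head_dim] per layer."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem * batch_size


# Assumed Llama 3.1 405B attention shape (verify against the real config).
LAYERS = 126
KV_HEADS = 8    # grouped-query attention: fewer KV heads than query heads
HEAD_DIM = 128

if __name__ == "__main__":
    for label, seq_len in (("128,000 tokens", 128_000), ("131,072 tokens", 131_072)):
        for dtype, nbytes in (("fp16", 2), ("fp32", 4)):
            total = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, seq_len, nbytes)
            print(f"{label}, {dtype}: {total / 1e9:.1f} GB ({total / 2**30:.1f} GiB)")
```

Under these assumptions, 128,000 tokens at FP16 works out to 66,060,288,000 bytes (about 66 GB), and FP32 doubles it. Reading "128k" as 131,072 tokens instead gives about 67.6 GB (63 GiB), so the unit convention accounts for a few GB of the spread between estimates.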