nanoT5: A PyTorch Framework for Pre-training and Fine-tuning T5-style Models with Limited Resources
Paper: arXiv:2309.02373
Google's T5-v1.1-base, pre-trained for 24 hours (80k steps with a batch size of 256) on a single GPU using the nanoT5 library for efficient pre-training.
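To put the single-GPU budget in perspective, the stated schedule can be turned into rough throughput figures. This is back-of-the-envelope arithmetic derived from the numbers above, not a measured benchmark:

```python
# Rough throughput implied by the stated schedule:
# 80k optimizer steps, batch size 256, completed in 24 hours.
steps = 80_000
batch_size = 256
wall_clock_seconds = 24 * 3600  # 24 hours

# Total training examples seen (ignoring any gradient accumulation details).
examples_seen = steps * batch_size          # 20,480,000 examples

# Average rate over the whole run.
steps_per_second = steps / wall_clock_seconds        # ~0.93 steps/s
examples_per_second = examples_seen / wall_clock_seconds  # ~237 examples/s

print(examples_seen, round(steps_per_second, 2), round(examples_per_second, 1))
```

These averages fold in everything (data loading, checkpointing, etc.), so instantaneous GPU throughput would be somewhat higher.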
For more details about training, see the nanoT5 paper. For more details about the model itself, refer to the original T5 paper and the original model weights.
It can be further fine-tuned on the SuperNatural-Instructions dataset to achieve performance comparable to the same model pre-trained on 150x more data through "a combination of model and data parallelism [...] on slices of Cloud TPU Pods", each with 1024 TPUs.