nanoT5: A PyTorch Framework for Pre-training and Fine-tuning T5-style Models with Limited Resources
Paper: arXiv:2309.02373
Google's T5-v1.1-base, pre-trained for 24 hours (80k steps with a batch size of 256) on a single GPU using the nanoT5 library for efficient pre-training.
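To put the single-GPU budget in perspective, the stated schedule can be turned into rough throughput figures. This is back-of-the-envelope arithmetic derived from the numbers above, not a measured benchmark:

```python
# Rough throughput implied by the stated schedule:
# 80k optimizer steps, batch size 256, completed in 24 hours.
steps = 80_000
batch_size = 256
wall_clock_seconds = 24 * 3600  # 24 hours

# Total training examples seen (ignoring any gradient accumulation details).
examples_seen = steps * batch_size          # 20,480,000 examples

# Average rate over the whole run.
steps_per_second = steps / wall_clock_seconds        # ~0.93 steps/s
examples_per_second = examples_seen / wall_clock_seconds  # ~237 examples/s

print(examples_seen, round(steps_per_second, 2), round(examples_per_second, 1))
```

These averages fold in everything (data loading, checkpointing, etc.), so instantaneous GPU throughput would be somewhat higher.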
For more details about training, see the nanoT5 paper. For more details about the model itself, refer to the original T5 paper and the original model weights.
It can be further fine-tuned on the SuperNatural-Instructions dataset to achieve performance comparable to the same model pre-trained on 150x more data through "a combination of model and data parallelism [...] on slices of Cloud TPU Pods", each with 1024 TPUs.