🤗 Nano GPT - Built From Scratch
Hey there! Welcome to my tiny language model. I built this GPT from scratch as a learning project, and honestly, it was pretty fun watching it learn to generate text!
What is this?
This is a super small GPT-2 style language model that I trained on my laptop. It's not going to write your essays or solve world hunger, but it's a cool demonstration of how these language models actually work under the hood.
Think of it as a baby GPT - it can generate text, but don't expect Shakespeare. More like... an enthusiastic toddler who just learned to talk.
Model Stats
- Parameters: 1,065,728 (yes, that's million with an M, not billion - see the config sketch below for where this number comes from)
- Layers: 4 transformer layers
- Embedding Size: 128 dimensions
- Attention Heads: 4 heads
- Context Length: 128 tokens
- Vocab Size: 2000 tokens
- Training Data: WikiText-2 (5,000 samples)
- Training Time: 10 epochs on my laptop
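For the curious, these numbers line up with a standard Hugging Face GPT2Config. The snippet below is just a sketch of that mapping (it assumes the stock GPT-2 implementation with tied input/output embeddings, which is consistent with the parameter count above):

from transformers import GPT2Config, GPT2LMHeadModel

# Sketch of a config matching the stats above
config = GPT2Config(
    vocab_size=2000,   # Vocab Size
    n_positions=128,   # Context Length
    n_embd=128,        # Embedding Size
    n_layer=4,         # Layers
    n_head=4,          # Attention Heads
)

model = GPT2LMHeadModel(config)
print(sum(p.numel() for p in model.parameters()))  # 1,065,728 with tied embeddings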
Quick Start
Want to try it out? Here's how:
from transformers import pipeline
# Load the model
generator = pipeline('text-generation', model='Tanaybh/nano-gpt-from-scratch')
# Generate some text
output = generator(
    "The meaning of life is",
    max_new_tokens=30,
    do_sample=True,
    temperature=0.8
)
print(output[0]['generated_text'])
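If you'd rather skip the pipeline helper, here's roughly the same thing with the lower-level classes - a sketch that assumes the repo ships standard GPT-2 model and tokenizer files:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained('Tanaybh/nano-gpt-from-scratch')
model = AutoModelForCausalLM.from_pretrained('Tanaybh/nano-gpt-from-scratch')

# Tokenize the prompt and sample up to 30 new tokens
inputs = tokenizer("The meaning of life is", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=30,
        do_sample=True,
        temperature=0.8,
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))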
Example Output
I gave it the prompt: "The"
And it generated:
The Γ 60 munitions, and injuries were found in the taxonomy in the south, the east of the
Not bad for a tiny model trained in a few hours, right?
Training Details
I trained this model from scratch using:
- Custom BPE tokenizer (trained on the same data)
- GPT-2 architecture (just way smaller)
- AdamW optimizer with a learning rate of 0.0005
- Batch size of 8
- Trained for 10 epochs
The whole thing runs on a regular laptop - no fancy GPU clusters needed! The sketches below give a rough idea of what that setup looks like in code.
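First, the tokenizer. The original training script isn't shown here, but training a 2,000-token byte-level BPE tokenizer with the Hugging Face tokenizers library looks roughly like this (the input file path and the special tokens are hypothetical placeholders):

from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Train a small byte-level BPE tokenizer on raw text (file path is a placeholder)
tokenizer = Tokenizer(models.BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel()
trainer = trainers.BpeTrainer(vocab_size=2000, special_tokens=["<unk>", "<pad>", "<eos>"])
tokenizer.train(["wikitext2_train.txt"], trainer)
tokenizer.save("nano_gpt_tokenizer.json")

And the training loop itself boils down to something like the following. It's a self-contained sketch with random token blocks standing in for the tokenized WikiText-2 samples, not the actual script:

import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import GPT2Config, GPT2LMHeadModel

# Random 128-token blocks as a stand-in for the 5,000 tokenized WikiText-2 samples
train_data = TensorDataset(torch.randint(0, 2000, (5000, 128)))
loader = DataLoader(train_data, batch_size=8, shuffle=True)

config = GPT2Config(vocab_size=2000, n_positions=128, n_embd=128, n_layer=4, n_head=4)
model = GPT2LMHeadModel(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)  # lr = 0.0005

model.train()
for epoch in range(10):
    for (input_ids,) in loader:
        # Passing labels=input_ids makes the model compute the shifted LM loss itself
        loss = model(input_ids=input_ids, labels=input_ids).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    print(f"epoch {epoch + 1}/10 - last batch loss: {loss.item():.3f}")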
Limitations
Let's be real here:
- This model is TINY. Like, really tiny. It has 1,065,728 parameters vs GPT-3's 175 billion.
- It was only trained on 5,000 Wikipedia samples, so its knowledge is... limited.
- It might generate weird or nonsensical text sometimes. That's part of the charm!
- Maximum context length is only 128 tokens, so don't expect long conversations (see the truncation snippet after this list).
- It's a base model with no instruction tuning, so it just continues text rather than following commands.
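If you do feed it longer prompts, it's easiest to let the tokenizer truncate them to the 128-token window. A quick sketch using standard tokenizer arguments:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('Tanaybh/nano-gpt-from-scratch')

# truncation + max_length keep the prompt inside the model's 128-token context
inputs = tokenizer(
    "a very long prompt " * 50,
    truncation=True,
    max_length=128,
    return_tensors="pt",
)
print(inputs["input_ids"].shape)  # at most (1, 128)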
Why I Made This
I wanted to understand how language models work by building one myself. Sure, I could've just fine-tuned a pre-trained model, but where's the fun in that? This project taught me about:
- Tokenizer training
- Transformer architecture
- Training dynamics
- How LLMs actually generate text
Plus, now I can say I trained a language model from scratch on my laptop. Pretty cool, right?
Future Improvements
Some things I might try:
- Train on more data (maybe the full WikiText dataset)
- Experiment with different model sizes
- Try different tokenizer configurations
- Add instruction tuning
- Fine-tune it for specific tasks
License
MIT - Feel free to use this however you want! Learn from it, break it, improve it. That's what it's here for.
Acknowledgments
Built with:
- 🤗 Hugging Face Transformers
- PyTorch
- The WikiText dataset
- Too much coffee ☕
Note: This is a learning project and experimental model. Use it for fun and education, not production systems!
If you found this interesting or helpful, feel free to star the repo or reach out. Always happy to chat about ML stuff!
Last updated: October 05, 2025