🤗 Nano GPT - Built From Scratch
Hey there! Welcome to my tiny language model. I built this GPT from scratch as a learning project, and honestly, it was pretty fun watching it learn to generate text!
What is this?
This is a super small GPT-2 style language model that I trained on my laptop. It's not going to write your essays or solve world hunger, but it's a cool demonstration of how these language models actually work under the hood.
Think of it as a baby GPT - it can generate text, but don't expect Shakespeare. More like... an enthusiastic toddler who just learned to talk.
Model Stats
- Parameters: 1,065,728 (yes, that's million with an M, not billion - see the config sketch below for where this number comes from)
- Layers: 4 transformer layers
- Embedding Size: 128 dimensions
- Attention Heads: 4 heads
- Context Length: 128 tokens
- Vocab Size: 2000 tokens
- Training Data: WikiText-2 (5,000 samples)
- Training Time: 10 epochs on my laptop
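For the curious, these numbers line up with a standard Hugging Face GPT2Config. The snippet below is just a sketch of that mapping (it assumes the stock GPT-2 implementation with tied input/output embeddings, which is consistent with the parameter count above):

from transformers import GPT2Config, GPT2LMHeadModel

# Sketch of a config matching the stats above
config = GPT2Config(
    vocab_size=2000,   # Vocab Size
    n_positions=128,   # Context Length
    n_embd=128,        # Embedding Size
    n_layer=4,         # Layers
    n_head=4,          # Attention Heads
)

model = GPT2LMHeadModel(config)
print(sum(p.numel() for p in model.parameters()))  # 1,065,728 with tied embeddings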
Quick Start
Want to try it out? Here's how:
from transformers import pipeline
# Load the model
generator = pipeline('text-generation', model='Tanaybh/nano-gpt-from-scratch')
# Generate some text
output = generator(
    "The meaning of life is",
    max_new_tokens=30,
    do_sample=True,
    temperature=0.8
)
print(output[0]['generated_text'])
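If you'd rather skip the pipeline helper, here's roughly the same thing with the lower-level classes - a sketch that assumes the repo ships standard GPT-2 model and tokenizer files:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained('Tanaybh/nano-gpt-from-scratch')
model = AutoModelForCausalLM.from_pretrained('Tanaybh/nano-gpt-from-scratch')

# Tokenize the prompt and sample up to 30 new tokens
inputs = tokenizer("The meaning of life is", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=30,
        do_sample=True,
        temperature=0.8,
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))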
Example Output
I gave it the prompt: "The"
And it generated:
The Γ 60 munitions, and injuries were found in the taxonomy in the south, the east of the
Not bad for a tiny model trained in a few hours, right?
Training Details
I trained this model from scratch using:
- Custom BPE tokenizer (trained on the same data)
- GPT-2 architecture (just way smaller)
- AdamW optimizer with a learning rate of 0.0005
- Batch size of 8
- Trained for 10 epochs
The whole thing runs on a regular laptop - no fancy GPU clusters needed! The sketches below give a rough idea of what that setup looks like in code.
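First, the tokenizer. The original training script isn't shown here, but training a 2,000-token byte-level BPE tokenizer with the Hugging Face tokenizers library looks roughly like this (the input file path and the special tokens are hypothetical placeholders):

from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Train a small byte-level BPE tokenizer on raw text (file path is a placeholder)
tokenizer = Tokenizer(models.BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel()
trainer = trainers.BpeTrainer(vocab_size=2000, special_tokens=["<unk>", "<pad>", "<eos>"])
tokenizer.train(["wikitext2_train.txt"], trainer)
tokenizer.save("nano_gpt_tokenizer.json")

And the training loop itself boils down to something like the following. It's a self-contained sketch with random token blocks standing in for the tokenized WikiText-2 samples, not the actual script:

import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import GPT2Config, GPT2LMHeadModel

# Random 128-token blocks as a stand-in for the 5,000 tokenized WikiText-2 samples
train_data = TensorDataset(torch.randint(0, 2000, (5000, 128)))
loader = DataLoader(train_data, batch_size=8, shuffle=True)

config = GPT2Config(vocab_size=2000, n_positions=128, n_embd=128, n_layer=4, n_head=4)
model = GPT2LMHeadModel(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)  # lr = 0.0005

model.train()
for epoch in range(10):
    for (input_ids,) in loader:
        # Passing labels=input_ids makes the model compute the shifted LM loss itself
        loss = model(input_ids=input_ids, labels=input_ids).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    print(f"epoch {epoch + 1}/10 - last batch loss: {loss.item():.3f}")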
Limitations
Let's be real here:
- This model is TINY. Like, really tiny. It has 1,065,728 parameters vs GPT-3's 175 billion.
- It was only trained on 5,000 Wikipedia samples, so its knowledge is... limited.
- It might generate weird or nonsensical text sometimes. That's part of the charm!
- Maximum context length is only 128 tokens, so don't expect long conversations (see the truncation snippet after this list).
- It's a base model with no instruction tuning, so it just continues text rather than following commands.
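If you do feed it longer prompts, it's easiest to let the tokenizer truncate them to the 128-token window. A quick sketch using standard tokenizer arguments:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('Tanaybh/nano-gpt-from-scratch')

# truncation + max_length keep the prompt inside the model's 128-token context
inputs = tokenizer(
    "a very long prompt " * 50,
    truncation=True,
    max_length=128,
    return_tensors="pt",
)
print(inputs["input_ids"].shape)  # at most (1, 128)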
Why I Made This
I wanted to understand how language models work by building one myself. Sure, I could've just fine-tuned a pre-trained model, but where's the fun in that? This project taught me about:
- Tokenizer training
- Transformer architecture
- Training dynamics
- How LLMs actually generate text
Plus, now I can say I trained a language model from scratch on my laptop. Pretty cool, right?
Future Improvements
Some things I might try:
- Train on more data (maybe the full WikiText dataset)
- Experiment with different model sizes
- Try different tokenizer configurations
- Add instruction tuning
- Fine-tune it for specific tasks
License
MIT - Feel free to use this however you want! Learn from it, break it, improve it. That's what it's here for.
Acknowledgments
Built with:
- 🤗 Hugging Face Transformers
- PyTorch
- The WikiText dataset
- Too much coffee ☕
Note: This is a learning project and experimental model. Use it for fun and education, not production systems!
If you found this interesting or helpful, feel free to star the repo or reach out. Always happy to chat about ML stuff!
Last updated: October 05, 2025