🤖 Nano GPT - Built From Scratch

Hey there! Welcome to my tiny language model. I built this GPT from scratch as a learning project, and honestly, it was pretty fun watching it learn to generate text!

What is this?

This is a super small GPT-2 style language model that I trained on my laptop. It's not going to write your essays or solve world hunger, but it's a cool demonstration of how these language models actually work under the hood.

Think of it as a baby GPT - it can generate text, but don't expect Shakespeare. More like... an enthusiastic toddler who just learned to talk.

Model Stats

  • Parameters: 1,065,728 (yes, that's million with an M, not billion!)
  • Layers: 4 transformer layers
  • Embedding Size: 128 dimensions
  • Attention Heads: 4 heads
  • Context Length: 128 tokens
  • Vocab Size: 2000 tokens
  • Training Data: WikiText-2 (5,000 samples)
  • Training Time: a few hours on my laptop (10 epochs)
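
If you want a feel for what that configuration looks like in code, here's a sketch using Hugging Face's GPT2Config, rebuilt from the stats above rather than copied from the actual training script:

from transformers import GPT2Config, GPT2LMHeadModel

# Reconstructed from the stats above - an assumption, not the original script
config = GPT2Config(
    vocab_size=2000,   # custom BPE vocab
    n_positions=128,   # context length
    n_embd=128,        # embedding size
    n_layer=4,         # transformer layers
    n_head=4,          # attention heads
)

model = GPT2LMHeadModel(config)
print(f"{model.num_parameters():,} parameters")  # 1,065,728 with the default tied embeddings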

Quick Start

Want to try it out? Here's how:

from transformers import pipeline

# Load the model
generator = pipeline('text-generation', model='Tanaybh/nano-gpt-from-scratch')

# Generate some text
output = generator(
    "The meaning of life is",
    max_new_tokens=30,
    do_sample=True,
    temperature=0.8
)

print(output[0]['generated_text'])
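
Prefer to skip the pipeline helper? Loading the model and tokenizer directly should work too, assuming the repo ships standard GPT-2 config and tokenizer files:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('Tanaybh/nano-gpt-from-scratch')
model = AutoModelForCausalLM.from_pretrained('Tanaybh/nano-gpt-from-scratch')

# Same sampling settings as the pipeline example above
inputs = tokenizer("The meaning of life is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))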

Example Output

I gave it the prompt: "The"

And it generated:

The × 60 munitions, and injuries were found in the taxonomy in the south, the east of the

Not bad for a tiny model trained in a few hours, right?

Training Details

I trained this model from scratch using:

  • Custom BPE tokenizer trained on the same data (rough sketch just after this list)
  • GPT-2 architecture (just way smaller)
  • AdamW optimizer with a learning rate of 0.0005
  • Batch size of 8
  • Trained for 10 epochs
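
Here's a rough sketch of how a byte-level BPE tokenizer with a 2,000-token vocab can be trained with the Hugging Face tokenizers library - treat the dataset slice and special tokens as placeholders, not a record of the exact setup:

from datasets import load_dataset
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Grab a slice of WikiText-2 (the card says 5,000 samples were used)
data = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:5000]")
texts = [t for t in data["text"] if t.strip()]

# Byte-level BPE with a 2,000-token vocab, matching the stats above
tokenizer = Tokenizer(models.BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel()
trainer = trainers.BpeTrainer(vocab_size=2000, special_tokens=["<unk>", "<pad>", "<eos>"])
tokenizer.train_from_iterator(texts, trainer=trainer)
tokenizer.save("tokenizer.json")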

The whole thing runs on a regular laptop - no fancy GPU clusters needed!
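
For the curious, here's a minimal sketch of a training loop built around those hyperparameters (AdamW, learning rate 5e-4, batch size 8, 10 epochs). Dummy random tokens stand in for the tokenized WikiText-2 chunks, so read it as an illustration rather than the actual script:

import torch
from torch.utils.data import DataLoader
from transformers import GPT2Config, GPT2LMHeadModel

model = GPT2LMHeadModel(GPT2Config(vocab_size=2000, n_positions=128, n_embd=128, n_layer=4, n_head=4))

# Stand-in for the real data: rows of 128 token IDs (in practice, tokenized WikiText-2 chunks)
token_blocks = torch.randint(0, 2000, (64, 128))
loader = DataLoader(token_blocks, batch_size=8, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)
model.train()
for epoch in range(10):
    for batch in loader:
        # For causal language modeling, labels are just the inputs; the model shifts them internally
        loss = model(input_ids=batch, labels=batch).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    print(f"epoch {epoch + 1}: loss {loss.item():.3f}")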

Limitations

Let's be real here:

  • This model is TINY. Like, really tiny. It has 1,065,728 parameters vs GPT-3's 175 billion.
  • It was only trained on 5,000 Wikipedia samples, so its knowledge is... limited.
  • It might generate weird or nonsensical text sometimes. That's part of the charm!
  • Maximum context length is only 128 tokens, so don't expect long conversations.
  • It's a base model with no instruction tuning, so it just continues text rather than following commands.

Why I Made This

I wanted to understand how language models work by building one myself. Sure, I could've just fine-tuned a pre-trained model, but where's the fun in that? This project taught me about:

  • Tokenizer training
  • Transformer architecture
  • Training dynamics
  • How LLMs actually generate text

Plus, now I can say I trained a language model from scratch on my laptop. Pretty cool, right?

Future Improvements

Some things I might try:

  • Train on more data (maybe the full WikiText dataset)
  • Experiment with different model sizes
  • Try different tokenizer configurations
  • Add instruction tuning
  • Fine-tune it for specific tasks

License

MIT - Feel free to use this however you want! Learn from it, break it, improve it. That's what it's here for.

Acknowledgments

Built with:

  • 🤗 Hugging Face Transformers
  • PyTorch
  • The WikiText dataset
  • Too much coffee ☕

Note: This is a learning project and experimental model. Use it for fun and education, not production systems!

If you found this interesting or helpful, feel free to star the repo or reach out. Always happy to chat about ML stuff!

Last updated: October 05, 2025
