Bengali GPT-2

This is a GPT-2 model fine-tuned on Bengali Wikipedia, intended for text generation in Bengali.

Model Details

  • Base model: GPT-2
  • Tokenizer: Custom Bengali ByteLevel BPE tokenizer (a training sketch follows this list)
  • Language: Bengali (bn)
  • Task: Text generation (causal language modeling)
  • Training data: Cleaned and deduplicated Bengali Wikipedia dump
  • License: Apache 2.0
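
The tokenizer is a ByteLevel BPE trained on Bengali text. Below is a minimal sketch of how such a tokenizer could be built with the Hugging Face tokenizers library; the corpus file name (bn_wiki.txt) and the vocabulary size are illustrative assumptions, not the actual training configuration.

from tokenizers import ByteLevelBPETokenizer

# Train a ByteLevel BPE tokenizer on a Bengali corpus.
# "bn_wiki.txt" and the settings below are hypothetical; the real
# corpus and configuration for this model are not documented here.
bpe = ByteLevelBPETokenizer()
bpe.train(
    files=["bn_wiki.txt"],
    vocab_size=50257,  # GPT-2's default vocabulary size (assumed)
    min_frequency=2,
    special_tokens=["<|endoftext|>"],
)
bpe.save_model("bengali-gpt2-tokenizer")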

Usage

from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Load tokenizer and model from Hugging Face
tokenizer = GPT2TokenizerFast.from_pretrained("rejauldu/bengali-gpt2-tokenizer")
model = GPT2LMHeadModel.from_pretrained("rejauldu/bengali-gpt2")

# Generate text (greedy decoding by default)
inputs = tokenizer("বাংলায় স্বাগত", return_tensors="pt")  # prompt: "Welcome in Bengali"
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
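
Greedy decoding (the default above) tends to repeat itself on open-ended prompts. The following continuation is a sketch of sampled generation; the top_k, top_p, and temperature values are illustrative assumptions, not tuned recommendations.

# Sampled generation, continuing from the snippet above
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    top_k=50,          # assumed value; tune for your use case
    top_p=0.95,        # assumed value
    temperature=0.8,   # assumed value
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))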