# Bengali GPT-2
This is a GPT-2 model fine-tuned on Bengali Wikipedia, intended for open-ended text generation in Bengali.
## Model Details
- Base model: GPT-2
- Tokenizer: Custom Bengali tokenizer (ByteLevel BPE; see the training sketch after this list)
- Language: Bengali (bn)
- Task: Text generation (causal language modeling)
- Training data: Cleaned and deduplicated Bengali Wikipedia dump
- License: Apache 2.0
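
The tokenizer training script is not included in this card, but the sketch below shows how a ByteLevel BPE tokenizer like this one is typically trained with the `tokenizers` library. The corpus path, vocabulary size, and special tokens are illustrative assumptions, not the values used for this model.

```python
# Sketch: training a ByteLevel BPE tokenizer on a Bengali corpus.
# The file path and hyperparameters below are assumptions for illustration.
from tokenizers import ByteLevelBPETokenizer

tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["bn_wiki.txt"],  # hypothetical cleaned, deduplicated Bengali Wikipedia dump
    vocab_size=50000,       # assumed vocabulary size
    min_frequency=2,
    special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"],
)

# Writes vocab.json and merges.txt, loadable with GPT2TokenizerFast
tokenizer.save_model("bengali-gpt2-tokenizer")
```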
## Usage
```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Load the tokenizer and model from the Hugging Face Hub
tokenizer = GPT2TokenizerFast.from_pretrained("rejauldu/bengali-gpt2-tokenizer")
model = GPT2LMHeadModel.from_pretrained("rejauldu/bengali-gpt2")

# Generate text from a Bengali prompt (roughly: "Welcome, in Bengali")
inputs = tokenizer("বাংলায় স্বাগত", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_length=50,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; this silences a warning
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
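
Greedy decoding (the default above) tends to produce repetitive text for open-ended generation. Continuing from the snippet above, the following sketch uses standard sampling arguments of `generate`; the specific values are illustrative starting points, not tuned recommendations for this model.

```python
# Sketch: sampling-based generation (values are illustrative, not tuned)
outputs = model.generate(
    **inputs,
    max_length=100,
    do_sample=True,         # sample instead of greedy decoding
    top_k=50,               # restrict to the 50 most likely next tokens
    top_p=0.95,             # nucleus sampling
    temperature=0.8,
    no_repeat_ngram_size=2, # discourage verbatim repetition
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```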