---
datasets:
- phonemetransformers/IPA-BabyLM
language:
- en
base_model:
- openai-community/gpt2
---
GPT-2 trained on the BabyLM 2024 training set, transcribed into IPA, using a BPE tokenizer with word boundaries removed.

This model was trained for *From Babble to Words: Pre-Training Language Models on Continuous Streams of Phonemes*.
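The preprocessing step mentioned above — removing word boundaries so the tokenizer sees a continuous phoneme stream — can be sketched as follows. This is an illustrative example only; the helper function and the IPA string are made up, not taken from the IPA-BabyLM dataset or the paper's actual pipeline.

```python
# Illustrative sketch of "word boundaries removed": whitespace is stripped
# from an IPA transcription before BPE tokenization, so the tokenizer is
# trained on a continuous stream of phonemes. The IPA sample below is a
# made-up example, not actual dataset content.

def remove_word_boundaries(ipa_text: str) -> str:
    """Strip all whitespace so the text reads as one continuous phoneme stream."""
    return "".join(ipa_text.split())

sample = "ðə kæt sæt"  # "the cat sat" in broad IPA, with word boundaries
continuous = remove_word_boundaries(sample)
print(continuous)  # ðəkætsæt
```

A BPE tokenizer trained on such continuous text must learn subword units that can span what were originally word boundaries.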