---
datasets:
- phonemetransformers/IPA-BabyLM
language:
- en
base_model:
- openai-community/gpt2
---
GPT-2 trained on the BabyLM 2024 training set, transcribed into IPA, using a BPE tokenizer with word boundaries removed.

This model was trained for *From Babble to Words: Pre-Training Language Models on Continuous Streams of Phonemes*.
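The preprocessing step mentioned above — removing word boundaries so the tokenizer sees a continuous phoneme stream — can be sketched as follows. This is an illustrative example only; the helper function and the IPA string are made up, not taken from the IPA-BabyLM dataset or the paper's actual pipeline.

```python
# Illustrative sketch of "word boundaries removed": whitespace is stripped
# from an IPA transcription before BPE tokenization, so the tokenizer is
# trained on a continuous stream of phonemes. The IPA sample below is a
# made-up example, not actual dataset content.

def remove_word_boundaries(ipa_text: str) -> str:
    """Strip all whitespace so the text reads as one continuous phoneme stream."""
    return "".join(ipa_text.split())

sample = "ðə kæt sæt"  # "the cat sat" in broad IPA, with word boundaries
continuous = remove_word_boundaries(sample)
print(continuous)  # ðəkætsæt
```

A BPE tokenizer trained on such continuous text must learn subword units that can span what were originally word boundaries.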