---
library_name: transformers
---

This is a simple `PreTrainedTokenizerFast` with 5120 tokens trained on a subset of [karpathy/fineweb-edu-100b-shuffle](https://huggingface.co/datasets/karpathy/fineweb-edu-100b-shuffle), which is itself a subset of [HuggingFaceFW/fineweb-edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu).

The tokenizer includes 6 special tokens:

```py
class SpecialTokens:
    PAD = 0
    BOS = 1
    EOS = 2
    SYSTEM = 3
    USER = 4
    ASSISTANT = 5

special_tokens_map = {
    "<|PAD|>": SpecialTokens.PAD,
    "<|BOS|>": SpecialTokens.BOS,
    "<|EOS|>": SpecialTokens.EOS,
    "<|SYSTEM|>": SpecialTokens.SYSTEM,
    "<|USER|>": SpecialTokens.USER,
    "<|ASSISTANT|>": SpecialTokens.ASSISTANT
}
```
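
This card does not specify how the special tokens are arranged in a prompt. The sketch below shows one possible way to use the IDs listed above, assuming a hypothetical layout: `BOS`, then each turn prefixed by its role token, then `EOS`, with `PAD` filling a fixed length. The helpers `frame_chat`, `pad_to`, and `toy_encode` are illustrative names, not part of this tokenizer or of `transformers`, and `toy_encode` is only a placeholder for the real text encoder.

```python
from typing import Callable, Dict, List, Tuple

# Special-token IDs as listed in this card.
PAD, BOS, EOS, SYSTEM, USER, ASSISTANT = range(6)

special_tokens_map: Dict[str, int] = {
    "<|PAD|>": PAD,
    "<|BOS|>": BOS,
    "<|EOS|>": EOS,
    "<|SYSTEM|>": SYSTEM,
    "<|USER|>": USER,
    "<|ASSISTANT|>": ASSISTANT,
}

# Reverse lookup: special-token ID -> token string.
id_to_special: Dict[int, str] = {i: s for s, i in special_tokens_map.items()}

# Assumption: role names chosen for this example only.
ROLE_IDS: Dict[str, int] = {"system": SYSTEM, "user": USER, "assistant": ASSISTANT}


def toy_encode(text: str) -> List[int]:
    """Placeholder encoder (NOT the real tokenizer): one ID per UTF-8 byte,
    shifted past the 6 reserved special-token IDs."""
    return [len(special_tokens_map) + b for b in text.encode("utf-8")]


def frame_chat(
    messages: List[Tuple[str, str]],
    encode: Callable[[str], List[int]],
) -> List[int]:
    """Hypothetical layout: BOS, then (role token + encoded text) per turn, then EOS."""
    ids = [BOS]
    for role, text in messages:
        ids.append(ROLE_IDS[role])
        ids.extend(encode(text))
    ids.append(EOS)
    return ids


def pad_to(ids: List[int], length: int) -> List[int]:
    """Right-pad a sequence with PAD up to a fixed length."""
    if len(ids) > length:
        raise ValueError("sequence is longer than the target length")
    return ids + [PAD] * (length - len(ids))


chat_ids = frame_chat([("user", "hi")], toy_encode)
padded_ids = pad_to(chat_ids, 8)
```

With the real tokenizer, `encode` would be replaced by a call that returns token IDs without adding special tokens, so that the framing stays under the caller's control.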