This is a simple PreTrainedTokenizerFast with a vocabulary of 5120 tokens, trained on a subset of karpathy/fineweb-edu-100b-shuffle, which is itself a subset of HuggingFaceFW/fineweb-edu.

The tokenizer includes 6 special tokens, assigned IDs 0 through 5:

class SpecialTokens:
    PAD       = 0
    BOS       = 1
    EOS       = 2
    SYSTEM    = 3
    USER      = 4
    ASSISTANT = 5

special_tokens_map = {
    "<|PAD|>":       SpecialTokens.PAD,
    "<|BOS|>":       SpecialTokens.BOS,
    "<|EOS|>":       SpecialTokens.EOS,
    "<|SYSTEM|>":    SpecialTokens.SYSTEM,
    "<|USER|>":      SpecialTokens.USER,
    "<|ASSISTANT|>": SpecialTokens.ASSISTANT
}
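
The special token IDs above can be used to frame a conversation as a flat list of token IDs. The following is a minimal sketch, not the layout this tokenizer was trained with: the chat layout (BOS, then a role token before each message, then EOS) is an assumption, and `frame_chat`, `pad_to`, and `toy_encode` are hypothetical helpers introduced only for illustration. `toy_encode` is a stand-in byte-level encoder so the example runs without downloading anything; it does not reproduce the real vocabulary.

```python
from typing import Callable, List, Sequence, Tuple


class SpecialTokens:
    PAD = 0
    BOS = 1
    EOS = 2
    SYSTEM = 3
    USER = 4
    ASSISTANT = 5


# Hypothetical mapping from role names to the role tokens listed above.
ROLE_TO_ID = {
    "system": SpecialTokens.SYSTEM,
    "user": SpecialTokens.USER,
    "assistant": SpecialTokens.ASSISTANT,
}


def frame_chat(
    messages: Sequence[Tuple[str, str]],
    encode: Callable[[str], List[int]],
) -> List[int]:
    """Assumed layout: BOS, then (role token + encoded text) per message, then EOS."""
    ids = [SpecialTokens.BOS]
    for role, text in messages:
        ids.append(ROLE_TO_ID[role])
        ids.extend(encode(text))
    ids.append(SpecialTokens.EOS)
    return ids


def pad_to(ids: List[int], length: int) -> List[int]:
    """Right-pad a sequence with PAD up to a fixed length."""
    if len(ids) > length:
        raise ValueError("sequence is longer than the target length")
    return ids + [SpecialTokens.PAD] * (length - len(ids))


def toy_encode(text: str) -> List[int]:
    """Stand-in encoder (NOT the real tokenizer): one ID per UTF-8 byte,
    shifted past the 6 special token IDs."""
    return [6 + b for b in text.encode("utf-8")]


ids = frame_chat([("user", "hi")], toy_encode)
print(pad_to(ids, 8))  # [1, 4, 110, 111, 2, 0, 0, 0]
```

With the actual tokenizer loaded through transformers, the `encode` argument could instead be something like `lambda t: tokenizer.encode(t, add_special_tokens=False)`, so that the framing tokens are added only by `frame_chat`.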