This is a simple PreTrainedTokenizerFast with a 5120-token vocabulary, trained on a subset of karpathy/fineweb-edu-100b-shuffle, which is itself a subset of HuggingFaceFW/fineweb-edu.
The tokenizer includes 6 special tokens:
class SpecialTokens:
    PAD = 0
    BOS = 1
    EOS = 2
    SYSTEM = 3
    USER = 4
    ASSISTANT = 5

special_tokens_map = {
    "<|PAD|>": SpecialTokens.PAD,
    "<|BOS|>": SpecialTokens.BOS,
    "<|EOS|>": SpecialTokens.EOS,
    "<|SYSTEM|>": SpecialTokens.SYSTEM,
    "<|USER|>": SpecialTokens.USER,
    "<|ASSISTANT|>": SpecialTokens.ASSISTANT
}
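
As a rough illustration of how these IDs might be used, the sketch below assembles a chat-style ID sequence from role-tagged messages. The layout (BOS, then a role token before each message, then EOS) is an assumption and is not specified by this model card. The names `build_chat_ids`, `ROLE_TO_ID`, and `toy_encode` are hypothetical; `toy_encode` is a byte-level stand-in for the real tokenizer's encoding step, not its actual behavior.

```python
from typing import Callable, Dict, List

# Special token IDs, matching the SpecialTokens values listed above.
PAD, BOS, EOS, SYSTEM, USER, ASSISTANT = range(6)

# Hypothetical mapping from chat roles to their special token IDs.
ROLE_TO_ID: Dict[str, int] = {
    "system": SYSTEM,
    "user": USER,
    "assistant": ASSISTANT,
}


def build_chat_ids(
    messages: List[Dict[str, str]],
    encode: Callable[[str], List[int]],
) -> List[int]:
    """Assemble BOS + (role token + content IDs) per message + EOS.

    This layout is an assumed convention for illustration only.
    """
    ids = [BOS]
    for message in messages:
        ids.append(ROLE_TO_ID[message["role"]])
        ids.extend(encode(message["content"]))
    ids.append(EOS)
    return ids


def toy_encode(text: str) -> List[int]:
    """Stand-in encoder: one ID per UTF-8 byte, offset past the 6 special IDs."""
    return [6 + byte for byte in text.encode("utf-8")]


if __name__ == "__main__":
    chat = [
        {"role": "system", "content": "Be brief."},
        {"role": "user", "content": "hi"},
    ]
    print(build_chat_ids(chat, toy_encode))
```

In practice, `encode` would be replaced by the real tokenizer's encoding call with automatic special-token insertion disabled, so that the role and boundary tokens are added only once.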