soldni committed on
Commit 59ef8f1 · verified · 1 Parent(s): 833d118

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +1 -91
README.md CHANGED
@@ -3,94 +3,4 @@ library_name: transformers
  tags: []
  ---
 
- # Dolma 2 tokenizer, Instruct v2, reasoner version
-
- Slightly modified version of `cl100k_base` that supports Dolma 1.x and Dolma 2.x special tokens.
-
- ## Special tokens
-
- This tokenizer supports the following special tokens:
-
- - `<|extra_id_0|>`: Not used.
- - `<|endoftext|>`: Used to mark both beginning and end of text.
- - `<|fim_prefix|>`: Used to mark the prefix fill-in-the-middle request.
- - `<|fim_middle|>`: Used to mark the middle fill-in-the-middle request.
- - `<|fim_suffix|>`: Used to mark the suffix fill-in-the-middle request.
- - `|||PHONE_NUMBER|||`: Not used. Kept for compatibility with Dolma 1.x.
- - `|||EMAIL_ADDRESS|||`: Not used. Kept for compatibility with Dolma 1.x.
- - `|||IP_ADDRESS|||`: Not used. Kept for compatibility with Dolma 1.x.
- - `<|im_start|>`: Indicates the beginning of a message (turn in a conversation).
- - `<|im_end|>`: Indicates the end of a message (turn in a conversation).
- - `<|extra_id_1|>`: Not used.
- - `<|extra_id_2|>`: Not used.
- - `<|extra_id_3|>`: Not used.
- - `<|extra_id_4|>`: Not used.
- - `<|extra_id_5|>`: Not used.
- - `<|extra_id_6|>`: Not used.
- - `<|extra_id_7|>`: Not used.
- - `<|extra_id_8|>`: Not used.
- - `<|extra_id_9|>`: Not used.
- - `<|extra_id_10|>`: Not used.
- - `<|endofprompt|>`: Not Used.
- - `<|pad|>`: Symbol to pad input sequences.
-
-
- ## Chat template
-
- The chat template is as follows (**for reference only**, actual template is in `tokenizer_config.json`):
-
- ```jinja
- {% set has_system = messages|selectattr('role', 'equalto', 'system')|list|length > 0 %}
- {% if not has_system %}
- {{ '<|im_start|>system
- You are Olmo, a helpful AI assistant built by Ai2. Your date cutoff is December 2024, and your model weights are available at https://huggingface.co/allenai.<|im_end|>
- ' }}
- {% endif %}
- {% for message in messages %}
- {% if message['role'] == 'system' %}
- {{ '<|im_start|>system
- ' + message['content'] }}
- {% if message.get('functions', none) is not none %}
- {{ ' <functions>' + message['functions'] + '</functions><|im_end|>
- ' }}
- {% else %}
- {{ ' You do not currently have access to any functions. <functions></functions><|im_end|>
- ' }}
- {% endif %}
- {% elif message['role'] == 'user' %}
- {% if message.get('functions', none) is not none %}
- {{ '<|im_start|>user
- ' + message['content'] + '
- ' + '<functions>' + message['functions'] + '</functions><|im_end|>
- ' }}
- {% else %}
- {{ '<|im_start|>user
- ' + message['content'] + '<|im_end|>
- ' }}
- {% endif %}
- {% elif message['role'] == 'assistant' %}
- {{ '<|im_start|>assistant
- ' }}
- {% if message.get('content', none) is not none %}
- {{ message['content'] }}
- {% endif %}
- {% if message.get('function_calls', none) is not none %}
- {{ '<function_calls>' + message['function_calls'] + '</function_calls>' }}
- {% endif %}
- {% if not loop.last %}
- {{ '<|im_end|>' + '
- ' }}
- {% else %}
- {{ eos_token }}
- {% endif %}
- {% elif message['role'] == 'environment' %}
- {{ '<|im_start|>environment
- ' + message['content'] + '<|im_end|>
- ' }}
- {% endif %}
- {% if loop.last and add_generation_prompt %}
- {{ '<|im_start|>assistant
- <think>' }}
- {% endif %}
- {% endfor %}
- ```
+ Compatible with new reasoning parser.
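
For readers comparing prompts before and after this change, the reference template removed above can be approximated in plain Python. This is only a sketch under stated assumptions: `render_chat` and `DEFAULT_SYSTEM` are hypothetical names, the default `eos_token` of `<|endoftext|>` is a guess based on the removed special-token list, and whitespace between Jinja tags is assumed to be stripped. The authoritative template remains the one in `tokenizer_config.json`.

```python
# Hypothetical helper mirroring the reference-only template from the removed README.
DEFAULT_SYSTEM = (
    "You are Olmo, a helpful AI assistant built by Ai2. Your date cutoff is "
    "December 2024, and your model weights are available at "
    "https://huggingface.co/allenai."
)


def render_chat(messages, add_generation_prompt=False, eos_token="<|endoftext|>"):
    """Render a list of {'role', 'content', ...} dicts into one prompt string.

    Assumption: eos_token defaults to <|endoftext|>; the real value comes
    from the tokenizer configuration, which is not shown in this diff.
    """
    out = []
    # Inject the default system turn only when no system message is present.
    if not any(m["role"] == "system" for m in messages):
        out.append("<|im_start|>system\n" + DEFAULT_SYSTEM + "<|im_end|>\n")
    for i, m in enumerate(messages):
        is_last = i == len(messages) - 1
        role = m["role"]
        if role == "system":
            out.append("<|im_start|>system\n" + m["content"])
            if m.get("functions") is not None:
                out.append(" <functions>" + m["functions"] + "</functions><|im_end|>\n")
            else:
                out.append(
                    " You do not currently have access to any functions."
                    " <functions></functions><|im_end|>\n"
                )
        elif role == "user":
            out.append("<|im_start|>user\n" + m["content"])
            if m.get("functions") is not None:
                out.append("\n<functions>" + m["functions"] + "</functions>")
            out.append("<|im_end|>\n")
        elif role == "assistant":
            out.append("<|im_start|>assistant\n")
            if m.get("content") is not None:
                out.append(m["content"])
            if m.get("function_calls") is not None:
                out.append("<function_calls>" + m["function_calls"] + "</function_calls>")
            # A final assistant turn is closed with EOS, earlier ones with <|im_end|>.
            out.append(eos_token if is_last else "<|im_end|>\n")
        elif role == "environment":
            out.append("<|im_start|>environment\n" + m["content"] + "<|im_end|>\n")
        # The generation prompt opens an assistant turn that starts with <think>.
        if is_last and add_generation_prompt:
            out.append("<|im_start|>assistant\n<think>")
    return "".join(out)


prompt = render_chat([{"role": "user", "content": "Hi"}], add_generation_prompt=True)
```

Under these assumptions, `prompt` begins with the default system turn and ends with an open assistant turn followed by `<think>`, which is the part of the output a reasoning parser would need to account for.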