---
license: apache-2.0
language:
- en
---

# Pretrained base-d20

This model was trained with the [nanochat recipe](https://github.com/karpathy/nanochat) by [Andrej Karpathy](https://huggingface.co/karpathy).

It was trained at depth 20 on roughly 11.2 billion tokens (see the training report below) and uses this [tokenizer](https://huggingface.co/nanochat-students/nanochat-tokenizer-2B). The tokenizer will be merged into this repo.

## Usage

Coming soon.

## Base model evaluation

timestamp: 2025-10-14 16:16:53

- Model: base_model (step 21400)
- CORE metric: 0.1963
- hellaswag_zeroshot: 0.2634
- jeopardy: 0.0959
- bigbench_qa_wikidata: 0.4993
- arc_easy: 0.5269
- arc_challenge: 0.1251
- copa: 0.4400
- commonsense_qa: 0.0653
- piqa: 0.3743
- openbook_qa: 0.1440
- lambada_openai: 0.3683
- hellaswag: 0.2630
- winograd: 0.2674
- winogrande: 0.0923
- bigbench_dyck_languages: 0.1050
- agi_eval_lsat_ar: 0.0326
- bigbench_cs_algorithms: 0.3674
- bigbench_operators: 0.1524
- bigbench_repeat_copy_logic: 0.0000
- squad: 0.2222
- coqa: 0.1957
- boolq: -0.4615
- bigbench_language_identification: 0.1801

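The headline CORE score appears to be the unweighted mean of the 22 centered per-task scores listed above (the simple-average convention is an assumption based on nanochat's CORE computation; negative values such as boolq's are possible because each task's accuracy is centered against a random-guessing baseline). A quick sanity check:

```python
# Per-task centered scores copied from the evaluation report above,
# in the order they are listed.
scores = [
    0.2634, 0.0959, 0.4993, 0.5269, 0.1251, 0.4400, 0.0653, 0.3743,
    0.1440, 0.3683, 0.2630, 0.2674, 0.0923, 0.1050, 0.0326, 0.3674,
    0.1524, 0.0000, 0.2222, 0.1957, -0.4615, 0.1801,
]

# Assumption: CORE is the plain average of the centered task scores.
core = sum(scores) / len(scores)
print(round(core, 4))  # reproduces the reported CORE metric of 0.1963
```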
## Base model loss

timestamp: 2025-10-14 16:11:41

- train bpb: 0.8147
- val bpb: 0.8121
- sample 0: <|bos|>The capital of France is Paris. It is the largest city in France and the capital of the country.
- sample 1: <|bos|>The chemical symbol of gold is Au and the atomic number is 79. Gold is a soft, malleable,
- sample 2: <|bos|>If yesterday was Friday, then tomorrow will be Saturday. If today is Monday, then tomorrow will be Tuesday. If today is
- sample 3: <|bos|>The opposite of hot is cold. The opposite of hot is cold. The opposite of hot is cold.
- sample 4: <|bos|>The planets of the solar system are: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, Neptune,
- sample 5: <|bos|>My favorite color is blue. I love the color blue because it is a color that is so versatile
- sample 6: <|bos|>If 5*x + 3 = 13, then x is a factor of 5. If 5*x + 3 =

## Base model training

timestamp: 2025-10-14 14:28:31

- run: dummy
- depth: 20
- max_seq_len: 2048
- num_iterations: -1
- target_flops: -1.0000
- target_param_data_ratio: 20
- device_batch_size: 32
- total_batch_size: 524,288
- embedding_lr: 0.2000
- unembedding_lr: 0.0040
- weight_decay: 0.0000
- matrix_lr: 0.0200
- grad_clip: 1.0000
- eval_every: 250
- eval_tokens: 10,485,760
- core_metric_every: 2000
- core_metric_max_per_task: 500
- sample_every: 2000
- model_tag:
- Number of parameters: 560,988,160
- Number of FLOPs per token: 3.491758e+09
- Calculated number of iterations: 21,400
- Number of training tokens: 11,219,763,200
- Tokens : Params ratio: 20.0000
- DDP world size: 8
- warmup_ratio: 0.0000
- warmdown_ratio: 0.2000
- final_lr_frac: 0.0000
- Minimum validation bpb: 0.8120
- Final validation bpb: 0.8120
- CORE metric estimate: 0.2059
- MFU %: 48.36%
- Total training flops: 3.917670e+19
- Total training time: 172.18m
- Peak memory usage: 75422.02MiB

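Several of the reported figures can be re-derived from the config above. A minimal sketch, assuming nanochat's conventions of `model_dim = 64 * depth` and a 65,536-token vocabulary (neither is stated in this card):

```python
depth = 20
total_batch_size = 524288
flops_per_token = 3.491758e9         # reported above

dim = 64 * depth                     # assumed nanochat aspect ratio -> 1280
vocab = 65536                        # assumed nanochat vocab size
# Untied input + output embeddings, plus ~12*dim^2 weights per transformer block.
params = 2 * vocab * dim + depth * 12 * dim**2
tokens = 20 * params                 # target_param_data_ratio: 20
iterations = tokens // total_batch_size
total_flops = flops_per_token * tokens

print(params)              # 560,988,160 parameters, as reported
print(tokens)              # 11,219,763,200 training tokens
print(iterations)          # 21,400 iterations
print(f"{total_flops:.6e}")  # ~3.9177e+19 total training FLOPs
```

The exact agreement on parameter count suggests the per-block weights really do total 12·dim² here; the Chinchilla-style 20:1 tokens-to-params target then fixes the token budget and iteration count.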
## Training Logs

Logs are available on the Trackio space [here](https://nanochat-students-trackio.hf.space).

<iframe
src="https://nanochat-students-trackio.hf.space"
frameborder="0"
width="850"
height="450"
></iframe>