A modified GPT-2 model with only 25 million non-embedding parameters that outperforms GPT-2 (124m), Pythia-70m/160m, and Cerebras-111m on benchmarks. It uses ScaledSinusoidal position embeddings and embedding layernorm, has no biases, and was trained on only 8 billion tokens of the SlimPajama dataset at home on 2xA6000. (On the graphic it is mislabeled as cramp-41m.)
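The card does not spell out what ScaledSinusoidal position embeddings compute. A minimal sketch, assuming the common formulation in which a fixed sinusoidal table is multiplied by a single scalar (learned during training, and initialised here to `dim ** -0.5`), could look like this. The function name, the `theta` base of 10000, and the initial scale are illustrative assumptions, not details confirmed for this model.

```python
import math

def scaled_sinusoidal_embedding(seq_len, dim, scale=None, theta=10000.0):
    """Return a seq_len x dim table of scaled sinusoidal position embeddings."""
    assert dim % 2 == 0, "dim must be even"
    half = dim // 2
    # In a trainable model `scale` would be one learned scalar; here it is
    # fixed at its assumed initial value, dim ** -0.5.
    if scale is None:
        scale = dim ** -0.5
    # Geometrically spaced inverse frequencies, one per sin/cos pair.
    inv_freq = [theta ** (-i / half) for i in range(half)]
    table = []
    for pos in range(seq_len):
        angles = [pos * f for f in inv_freq]
        # First half of the row holds sines, second half holds cosines.
        row = [math.sin(a) for a in angles] + [math.cos(a) for a in angles]
        table.append([scale * v for v in row])
    return table

emb = scaled_sinusoidal_embedding(seq_len=4, dim=8)
print(len(emb), len(emb[0]))
```

With this initial scale every position vector has the same squared norm of 0.5, so the positional signal starts out small relative to typical token embeddings and the model can adjust it through the scalar.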
OLD BENCHMARK
| model | avg | arc | hellaswag | mmlu | truthfulqa |
|---|---|---|---|---|
| cramp-25m | 30.57 | 21.76 | 27.35 | 25.53 | 47.66 |
| gpt2 (124m) | 30.06 | 22.1 | 31.6 | 25.86 | 40.67 |
| pythia 70m deduped | 30.25 | 21.08 | 27.17 | 25.26 | 47.51 |
| pythia 70m | 30.46 | 21.59 | 27.29 | 25.9 | 47.06 |
| pythia 160m deduped | 31.16 | 24.06 | 30.34 | 24.95 | 44.34 |
| pythia 160m | 30.58 | 22.78 | 30.34 | 24.95 | 44.26 |
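The `avg` column appears to be the unweighted mean of the four task scores. A short check for the cramp-25m row, under that assumption:

```python
# Scores for cramp-25m copied from the table above.
scores = {"arc": 21.76, "hellaswag": 27.35, "mmlu": 25.53, "truthfulqa": 47.66}

# Assumption: `avg` is the plain arithmetic mean of the four task scores.
avg = sum(scores.values()) / len(scores)
print(f"{avg:.3f}")
```

This gives roughly 30.575, which the table lists as 30.57.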
NEW BENCHMARK
| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 25 | acc | 0.1724 | ± | 0.0110 |
| | | none | 25 | acc_norm | 0.2031 | ± | 0.0118 |
| truthfulqa_mc2 | 2 | none | 0 | acc | 0.4767 | ± | 0.0156 |
| hellaswag | 1 | none | 10 | acc | 0.2687 | ± | 0.0044 |
| | | none | 10 | acc_norm | 0.2773 | ± | 0.0045 |
| winogrande | 1 | none | 5 | acc | 0.5028 | ± | 0.0141 |
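For reading the Stderr column: a rough sketch of how a standard error can be derived from an accuracy, assuming it is the sample standard deviation of per-example 0/1 outcomes divided by the square root of the number of examples. Both that formula and the evaluation-set size `n = 1267` for winogrande are assumptions here; neither is stated in the table.

```python
import math

def sample_stderr(acc, n):
    """Standard error of the mean of n 0/1 outcomes, using the (n - 1) sample variance."""
    return math.sqrt(acc * (1.0 - acc) / (n - 1))

# n = 1267 is an assumed evaluation-set size for winogrande, not taken from the table.
se = sample_stderr(0.5028, 1267)
print(f"{se:.4f}")
```

Under these assumptions the result rounds to 0.0141, in line with the winogrande row. An accuracy of 0.5028 with that standard error is within noise of the 0.5 chance level for a two-choice task.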
MMLU
| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| world_religions | 0 | none | 5 | acc | 0.1813 | ± | 0.0295 |
| virology | 0 | none | 5 | acc | 0.1928 | ± | 0.0307 |
| us_foreign_policy | 0 | none | 5 | acc | 0.2900 | ± | 0.0456 |
| sociology | 0 | none | 5 | acc | 0.2438 | ± | 0.0304 |
| security_studies | 0 | none | 5 | acc | 0.2367 | ± | 0.0272 |
| public_relations | 0 | none | 5 | acc | 0.2455 | ± | 0.0412 |
| professional_psychology | 0 | none | 5 | acc | 0.2271 | ± | 0.0169 |
| professional_medicine | 0 | none | 5 | acc | 0.4375 | ± | 0.0301 |
| professional_law | 0 | none | 5 | acc | 0.2490 | ± | 0.0110 |
| professional_accounting | 0 | none | 5 | acc | 0.2589 | ± | 0.0261 |
| prehistory | 0 | none | 5 | acc | 0.2963 | ± | 0.0254 |
| philosophy | 0 | none | 5 | acc | 0.2315 | ± | 0.0240 |
| nutrition | 0 | none | 5 | acc | 0.2222 | ± | 0.0238 |
| moral_scenarios | 0 | none | 5 | acc | 0.2313 | ± | 0.0141 |
| moral_disputes | 0 | none | 5 | acc | 0.2168 | ± | 0.0222 |
| miscellaneous | 0 | none | 5 | acc | 0.2708 | ± | 0.0159 |
| medical_genetics | 0 | none | 5 | acc | 0.3000 | ± | 0.0461 |
| marketing | 0 | none | 5 | acc | 0.1923 | ± | 0.0258 |
| management | 0 | none | 5 | acc | 0.1942 | ± | 0.0392 |
| machine_learning | 0 | none | 5 | acc | 0.2054 | ± | 0.0383 |
| logical_fallacies | 0 | none | 5 | acc | 0.2393 | ± | 0.0335 |
| jurisprudence | 0 | none | 5 | acc | 0.2130 | ± | 0.0396 |
| international_law | 0 | none | 5 | acc | 0.2562 | ± | 0.0398 |
| human_sexuality | 0 | none | 5 | acc | 0.2366 | ± | 0.0373 |
| human_aging | 0 | none | 5 | acc | 0.2063 | ± | 0.0272 |
| high_school_world_history | 0 | none | 5 | acc | 0.2700 | ± | 0.0289 |
| high_school_us_history | 0 | none | 5 | acc | 0.2206 | ± | 0.0291 |
| high_school_statistics | 0 | none | 5 | acc | 0.4722 | ± | 0.0340 |
| high_school_psychology | 0 | none | 5 | acc | 0.2257 | ± | 0.0179 |
| high_school_physics | 0 | none | 5 | acc | 0.2384 | ± | 0.0348 |
| high_school_microeconomics | 0 | none | 5 | acc | 0.3403 | ± | 0.0308 |
| high_school_mathematics | 0 | none | 5 | acc | 0.2630 | ± | 0.0268 |
| high_school_macroeconomics | 0 | none | 5 | acc | 0.2051 | ± | 0.0205 |
| high_school_government_and_politics | 0 | none | 5 | acc | 0.2280 | ± | 0.0303 |
| high_school_geography | 0 | none | 5 | acc | 0.3535 | ± | 0.0341 |
| high_school_european_history | 0 | none | 5 | acc | 0.2909 | ± | 0.0355 |
| high_school_computer_science | 0 | none | 5 | acc | 0.2400 | ± | 0.0429 |
| high_school_chemistry | 0 | none | 5 | acc | 0.2759 | ± | 0.0314 |
| high_school_biology | 0 | none | 5 | acc | 0.3161 | ± | 0.0265 |
| global_facts | 0 | none | 5 | acc | 0.2000 | ± | 0.0402 |
| formal_logic | 0 | none | 5 | acc | 0.1825 | ± | 0.0346 |
| elementary_mathematics | 0 | none | 5 | acc | 0.2566 | ± | 0.0225 |
| electrical_engineering | 0 | none | 5 | acc | 0.2414 | ± | 0.0357 |
| econometrics | 0 | none | 5 | acc | 0.2544 | ± | 0.0410 |
| conceptual_physics | 0 | none | 5 | acc | 0.2809 | ± | 0.0294 |
| computer_security | 0 | none | 5 | acc | 0.2000 | ± | 0.0402 |
| college_physics | 0 | none | 5 | acc | 0.3431 | ± | 0.0472 |
| college_medicine | 0 | none | 5 | acc | 0.2197 | ± | 0.0316 |
| college_mathematics | 0 | none | 5 | acc | 0.3100 | ± | 0.0465 |
| college_computer_science | 0 | none | 5 | acc | 0.3100 | ± | 0.0465 |
| college_chemistry | 0 | none | 5 | acc | 0.3400 | ± | 0.0476 |
| college_biology | 0 | none | 5 | acc | 0.2083 | ± | 0.0340 |
| clinical_knowledge | 0 | none | 5 | acc | 0.2189 | ± | 0.0254 |
| business_ethics | 0 | none | 5 | acc | 0.2000 | ± | 0.0402 |
| astronomy | 0 | none | 5 | acc | 0.2237 | ± | 0.0339 |
| anatomy | 0 | none | 5 | acc | 0.3333 | ± | 0.0407 |
| abstract_algebra | 0 | none | 5 | acc | 0.2200 | ± | 0.0416 |
