PyTorch
English
llama
desaifan-mbzuai commited on
Commit
0c7a990
·
verified ·
1 Parent(s): 707673e

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +10 -0
README.md CHANGED
@@ -62,6 +62,16 @@ Below we report performance across general, reasoning, mathematical, and coding
62
  | **Coding Tasks** | | | | | | | | | |
63
  | **MBPP** | 57.6 | 57.8 | 58.2 | 59.8 | 61.8 | **75.4** | <u>69.2</u> | 64.4 | 60.2 |
64
  | **HUMANEVAL** | 50.0 | 51.2 | <u>53.7</u> | **54.3** | **54.3** | **54.3** | 42.1 | 50.6 | 36.0 |
 
 
 
 
 
 
 
 
 
 
65
 
66
 
67
  Below we report the evaluation results for K2-V2 after supervised fine-tuning (SFT). These variants correspond to three levels of reasoning effort (Low < Medium < High).
 
62
  | **Coding Tasks** | | | | | | | | | |
63
  | **MBPP** | 57.6 | 57.8 | 58.2 | 59.8 | 61.8 | **75.4** | <u>69.2</u> | 64.4 | 60.2 |
64
  | **HUMANEVAL** | 50.0 | 51.2 | <u>53.7</u> | **54.3** | **54.3** | **54.3** | 42.1 | 50.6 | 36.0 |
65
+ | **Logic Puzzles** | | | | | | | | | |
66
+ | **COUNTDOWN** | 1.3 | <u>53.3</u> | 53.1 | 35.9 | **75.6** | 6.0 | 1.0 | 0.5 | 23.2 |
67
+ | **KK-4 PEOPLE** | 4.8 | 44.9 | <u>68.0</u> | 64.5 | **92.9** | 26.1 | 4.2 | 7.6 | 42.4 |
68
+ | **KK-8 PEOPLE** | 0.5 | 23.2 | 41.3 | <u>51.6</u> | **82.8** | 5.7 | 1.1 | 1.3 | 13.0 |
69
+ | **ORDER-15 ITEMS** | 4.7 | 30.7 | 47.2 | <u>55.8</u> | **87.6** | 37.0 | 3.5 | 4.5 | 25.0 |
70
+ | **ORDER-30 ITEMS** | 0.0 | 0.3 | 3.0 | <u>34.1</u> | **40.3** | 0.7 | 0.2 | 0.1 | 0.6 |
71
+ | **Instruction Following** | | | | | | | | | |
72
+ | **IFEVAL** | 17.4 | 26.2 | 28.5 | <u>34.5</u> | 26.7 | **40.3** | 15.1 | 17.4 | 13.2 |
73
+ | **Arabic** | | | | | | | | | |
74
+ | **MMLU-Arabic** | 65.4 | 66.1 | 64.5 | 66.6 | 65.5 | **74.1** | 65.0 | <u>66.8</u> | 47.8 |
75
 
76
 
77
  Below we report the evaluation results for K2-V2 after supervised fine-tuning (SFT). These variants correspond to three levels of reasoning effort (Low < Medium < High).