Chris-Alexiuk committed on
Commit d4152ce · verified · 1 Parent(s): 0416115

Update Model Card

Files changed (1)
  1. README.md +19 -19
README.md CHANGED
@@ -216,9 +216,9 @@ Data Labeling for Evaluation Datasets:
 
 ### GPQA
 
-| Reasoning Mode | pass@1 |
-|--------------|------------|
-| Reasoning Off | - |
+| Reasoning Mode | pass@1 |
+|--------------|------------|
+| Reasoning Off | 56.6 |
 | Reasoning On | 76.01 |
 
 User Prompt Template:
@@ -229,9 +229,9 @@ User Prompt Template:
 
 ### AIME25
 
-| Reasoning Mode | pass@1 |
-|--------------|------------|
-| Reasoning Off | - |
+| Reasoning Mode | pass@1 |
+|--------------|------------|
+| Reasoning Off | 16.7 |
 | Reasoning On | 72.50 |
 
 User Prompt Template:
@@ -242,9 +242,9 @@ User Prompt Template:
 
 ### BFCL V2 Live
 
-| Reasoning Mode | Score |
-|--------------|------------|
-| Reasoning Off | 74.10 |
+| Reasoning Mode | Score |
+|--------------|------------|
+| Reasoning Off | 73.62 |
 | Reasoning On | 74.10 |
 
 User Prompt Template:
@@ -267,9 +267,9 @@ Here is a list of functions in JSON format that you can invoke.
 
 ### LiveCodeBench (20240801-20250201)
 
-| Reasoning Mode | pass@1 |
-|--------------|------------|
-| Reasoning Off | - |
+| Reasoning Mode | pass@1 |
+|--------------|------------|
+| Reasoning Off | 29.03 |
 | Reasoning On | 66.31 |
 
 User Prompt Template (without starter code):
@@ -300,16 +300,16 @@ You will use the following starter code to write the solution to the problem and
 
 ### IFEval
 
-| Reasoning Mode | Strict:Instruction |
-|--------------|------------|
-| Reasoning Off | - |
-| Reasoning On | 88.85 |
+| Reasoning Mode | Strict:Instruction |
+|--------------|------------|
+| Reasoning Off | 88.85 |
+| Reasoning On | 89.45 |
 
 ### MATH500
 
-| Reasoning Mode | pass@1 |
-|--------------|------------|
-| Reasoning Off | - |
+| Reasoning Mode | pass@1 |
+|--------------|------------|
+| Reasoning Off | 80.4 |
 | Reasoning On | 97.00 |
 
 User Prompt Template: