Update Model Card
README.md (changed)
@@ -216,9 +216,9 @@ Data Labeling for Evaluation Datasets:

 ### GPQA

-| Reasoning Mode | pass@1 |
-|--------------|------------|
-| Reasoning Off |
+| Reasoning Mode | pass@1 |
+|--------------|------------|
+| Reasoning Off | 56.6 |
 | Reasoning On | 76.01 |

 User Prompt Template:
@@ -229,9 +229,9 @@ User Prompt Template:

 ### AIME25

-| Reasoning Mode | pass@1 |
-|--------------|------------|
-| Reasoning Off |
+| Reasoning Mode | pass@1 |
+|--------------|------------|
+| Reasoning Off | 16.7 |
 | Reasoning On | 72.50 |

 User Prompt Template:
@@ -242,9 +242,9 @@ User Prompt Template:

 ### BFCL V2 Live

-| Reasoning Mode | Score |
-|--------------|------------|
-| Reasoning Off |
+| Reasoning Mode | Score |
+|--------------|------------|
+| Reasoning Off | 73.62 |
 | Reasoning On | 74.10 |

 User Prompt Template:
@@ -267,9 +267,9 @@ Here is a list of functions in JSON format that you can invoke.

 ### LiveCodeBench (20240801-20250201)

-| Reasoning Mode | pass@1 |
-|--------------|------------|
-| Reasoning Off |
+| Reasoning Mode | pass@1 |
+|--------------|------------|
+| Reasoning Off | 29.03 |
 | Reasoning On | 66.31 |

 User Prompt Template (without starter code):
@@ -300,16 +300,16 @@ You will use the following starter code to write the solution to the problem and

 ### IFEval

-| Reasoning Mode | Strict:Instruction |
-|--------------|------------|
-| Reasoning Off |
-| Reasoning On |
+| Reasoning Mode | Strict:Instruction |
+|--------------|------------|
+| Reasoning Off | 88.85 |
+| Reasoning On | 89.45 |

 ### MATH500

-| Reasoning Mode | pass@1 |
-|--------------|------------|
-| Reasoning Off |
+| Reasoning Mode | pass@1 |
+|--------------|------------|
+| Reasoning Off | 80.4 |
 | Reasoning On | 97.00 |

 User Prompt Template:
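This commit fills in every "Reasoning Off" score, so each updated table now contains both rows. The gain from enabling reasoning can be computed directly from those values. The sketch below is illustrative only. The names `SCORES`, `reasoning_gain`, and `gains` are hypothetical, and the numbers are copied from the updated tables. The benchmarks also use different metrics (pass@1, the BFCL score, and IFEval Strict:Instruction), so each difference is meaningful only within its own benchmark.

```python
# Hypothetical helper: (Reasoning Off, Reasoning On) scores copied from the
# updated model-card tables. Metrics differ per benchmark, so compare only
# within a row, never across rows.
SCORES: dict[str, tuple[float, float]] = {
    "GPQA": (56.6, 76.01),
    "AIME25": (16.7, 72.50),
    "BFCL V2 Live": (73.62, 74.10),
    "LiveCodeBench (20240801-20250201)": (29.03, 66.31),
    "IFEval": (88.85, 89.45),
    "MATH500": (80.4, 97.00),
}


def reasoning_gain(off: float, on: float) -> float:
    """Absolute point difference (On minus Off), rounded to two decimals."""
    return round(on - off, 2)


def gains() -> dict[str, float]:
    """Map each benchmark name to its Reasoning On minus Reasoning Off gain."""
    return {name: reasoning_gain(off, on) for name, (off, on) in SCORES.items()}


if __name__ == "__main__":
    for name, delta in gains().items():
        off, on = SCORES[name]
        print(f"{name:<36} off={off:6.2f} on={on:6.2f} delta={delta:+6.2f}")
```

Running the script prints one line per benchmark. AIME25 shows the largest gain (55.80 points) and BFCL V2 Live the smallest (0.48 points).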