Add model-index with benchmark evaluations
#20 by davidlms
Added structured evaluation results from the README benchmark table (a sketch of the resulting model-index YAML appears after the table below):
Automated Benchmarks:

| Benchmark | Score |
| --- | --- |
| MMLU | 55.23 |
| GPQA | 31.47 |
| IFEval (instruction following) | 74.89 |
| IFBench | 20.7 |
| GSM8K (math reasoning) | 58.3 |
| MGSM (multilingual math) | 55.04 |
| MMMLU (multilingual MMLU) | 46.73 |
Total: 7 benchmarks across reasoning, instruction-following, and multilingual capabilities.
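For reference, these scores are expressed in the README front matter using the Hub's `model-index` metadata format. Below is a minimal sketch of one entry, assuming a text-generation task and an accuracy-style metric; the model name and dataset type slug are placeholders, since the PR body doesn't reproduce the exact YAML:

```yaml
model-index:
- name: your-model-name          # placeholder, not the actual model name
  results:
  - task:
      type: text-generation      # assumed task type
    dataset:
      name: MMLU
      type: cais/mmlu            # assumed dataset slug
    metrics:
    - type: accuracy             # assumed metric type; the PR lists raw scores only
      value: 55.23
      name: MMLU
```

Each benchmark in the table gets its own entry under `results`; the Hub parses this block to populate the model's evaluation results and leaderboard listings.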
This enables the model to appear on leaderboards and makes it easier to compare it with other models.
Note: PR #6 (Support tool calls) modifies the tokenizer configuration, so it should not conflict with this metadata addition.