Add model-index with benchmark evaluations

#20
by davidlms - opened

Added structured evaluation results (model-index metadata) from the README benchmark table:

Automated Benchmarks:

  • MMLU: 55.23
  • GPQA: 31.47
  • IFEval (Instruction following): 74.89
  • IFBench: 20.7
  • GSM8K (Math reasoning): 58.3
  • MGSM (Multilingual math): 55.04
  • MMMLU (Multilingual MMLU): 46.73

Total: 7 benchmarks across reasoning, instruction-following, and multilingual capabilities.
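The scores above map onto the card's `model-index` block. A minimal sketch of what that metadata looks like, with only the first two benchmarks shown; the model name and the dataset `type` slugs here are placeholders, not taken from the actual PR diff:

```yaml
model-index:
- name: your-model-name        # placeholder — use the repo's model name
  results:
  - task:
      type: text-generation
    dataset:
      name: MMLU               # Massive Multitask Language Understanding
      type: mmlu               # illustrative dataset slug
    metrics:
    - type: accuracy
      value: 55.23
  - task:
      type: text-generation
    dataset:
      name: GSM8K              # grade-school math reasoning
      type: gsm8k              # illustrative dataset slug
    metrics:
    - type: accuracy
      value: 58.3
```

The remaining benchmarks (GPQA, IFEval, IFBench, MGSM, MMMLU) follow the same `dataset`/`metrics` pattern, one `results` entry each.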

This lets the model appear on leaderboards and makes it easier to compare against other models.

Note: PR #6 (Support tool calls) modifies the tokenizer configuration and should not conflict with this metadata addition.

