---
library_name: transformers
---
# Model Card for Notbad v1.1 Mistral 24B

This model has better IFEval scores than our previous model,
[Notbad v1.0 Mistral 24B](https://huggingface.co/notbadai/notbad_v1_0_mistral_24b).

Notbad v1.1 Mistral 24B is a reasoning model trained on math and Python coding.
It is built upon
[Mistral-Small-24B-Instruct-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501)
and has been further trained with reinforcement learning on math and coding.

One of the key features of Notbad v1.1 is its ability to produce shorter and cleaner reasoning outputs.
We used open datasets and employed reinforcement learning techniques that continue
our work on
[Quiet-STaR](https://arxiv.org/abs/2403.09629)
and are similar to
[Dr. GRPO](https://arxiv.org/abs/2503.20783).
The reasoning capabilities of this model come from self-improvement and are not distilled from any other model.
The model is the result of fine-tuning on data sampled from several of our RL models, starting from
[Mistral-Small-24B-Instruct-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501).
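
Reinforcement learning on math and coding generally relies on rewards that can be checked automatically. The sketch below is only an illustration of that idea, not our training code; the function names and the exact-match/unit-test checks are simplifying assumptions.

```python
def math_reward(model_answer: str, reference: str) -> float:
    # Illustrative: reward 1.0 for an exact final-answer match.
    # Real pipelines normalize or symbolically compare expressions.
    return 1.0 if model_answer.strip() == reference.strip() else 0.0


def code_reward(program: str, tests: str) -> float:
    # Illustrative: reward 1.0 when the generated program passes
    # assert-based tests. Real pipelines sandbox the execution.
    scope = {}
    try:
        exec(program, scope)  # define the candidate function(s)
        exec(tests, scope)    # run the asserts against them
        return 1.0
    except Exception:
        return 0.0
```

For example, `code_reward("def add(a, b):\n    return a + b", "assert add(2, 3) == 5")` yields `1.0`, while a program that fails its tests yields `0.0`.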

Special thanks to [Lambda](https://lambda.ai/) and [Deep Infra](https://deepinfra.com/)
for providing compute resources for our research and for training this model.

You can try the model on **[chat.labml.ai](https://chat.labml.ai)**.

## Benchmark results

| Evaluation | notbad_v1_1_mistral_24b | notbad_v1_0_mistral_24b | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
|------------|-------------------------|-------------------------|---------------------------------|-------------|---------------|-------------|------------------------|
| mmlu_pro   | 0.673                   | 0.642                   | 0.663                           | 0.536       | 0.666         | 0.683       | 0.617                  |
| gpqa_main  | 0.467                   | 0.447                   | 0.453                           | 0.344       | 0.531         | 0.404       | 0.377                  |

**Math & Coding**

| Evaluation | notbad_v1_1_mistral_24b | notbad_v1_0_mistral_24b | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
|------------|-------------------------|-------------------------|---------------------------------|-------------|---------------|-------------|------------------------|
| humaneval  | 0.872                   | 0.869                   | 0.848                           | 0.732       | 0.854         | 0.909       | 0.890                  |
| math       | 0.749                   | 0.752                   | 0.706                           | 0.535       | 0.743         | 0.819       | 0.761                  |

**Instruction following**

| Evaluation | notbad_v1_1_mistral_24b | notbad_v1_0_mistral_24b | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
|------------|-------------------------|-------------------------|---------------------------------|-------------|---------------|-------------|------------------------|
| ifeval     | 0.779                   | 0.514                   | 0.829                           | 0.8065      | 0.8835        | 0.8401      | 0.8499                 |

**Note**:

- Benchmarks are from the
  [Mistral-Small-24B-Instruct-2501 Model Card](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501).