Update README.md
README.md
CHANGED
@@ -8,7 +8,7 @@ base_model:
 - mesolitica/Malaysian-Qwen2.5-7B-Reasoning-SFT
 ---
 
-# Malaysian Qwen 2.5 7B Instruct Reasoning GRPO
+# Malaysian Qwen 2.5 7B Instruct Dialect Reasoning GRPO
 
 Online Reinforcement learning using GRPO full parameter on warmup reasoning SFT https://huggingface.co/mesolitica/Malaysian-Qwen2.5-7B-Reasoning-SFT on highly curated Malay Dialect Reasoning dataset.
 