Update README.md
README.md
CHANGED
@@ -6,7 +6,7 @@ tags:
 - reasoning
 ---
 
-# Majority-Voting: Qwen3-8B-Base
+# Majority-Voting: Qwen3-8B-Base model, trained on the DAPO-14k dataset
 
 This is the **Majority-Voting: Qwen3-8B-Base** model, trained on the DAPO-14k dataset. This model is part of the research presented in the paper [Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models](https://huggingface.co/papers/2508.00410).
 