TMLR-Group-HF/Majority-Voting-Qwen3-8B-Base-DAPO14k

resistz committed about 1 month ago

Commit 41cacda · verified · 1 Parent(s): 336a282

Update README.md

Files changed (1)

README.md +1 -1

@@ -6,7 +6,7 @@ tags:
 - reasoning
 ---
-# Majority-Voting: Qwen3-8B-Base** model, trained on the DAPO-14k dataset
+# Majority-Voting: Qwen3-8B-Base model, trained on the DAPO-14k dataset
 This is the **Majority-Voting: Qwen3-8B-Base** model, trained on the DAPO-14k dataset. This model is part of the research presented in the paper [Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models](https://huggingface.co/papers/2508.00410).