zhuyaoyu committed · Commit 045bc4d · verified · 1 Parent(s): 58e62be

Update README.md

Files changed (1)
  1. README.md +5 -4
README.md CHANGED
@@ -17,8 +17,8 @@ pipeline_tag: text-generation
  <img src="./assets/rtllm_tts_flops.png" alt="RTLLM TTS FLOPs Results" width="400">
  </div>
  <figcaption class="caption mt-3 has-text-centered is-size-7 has-text-grey">
- Test-time scaling curves. Left: Inference time as a function of token length. Right: Inference time vs. estimated FLOPs consumption.
- When measured by FLOPs consumption, our model achieves better results with fewer computational resources than DeepSeek-R1, highlighting its superior efficiency.
+ Test-time scaling curves. <strong>Left</strong>: Inference time as a function of token length. <strong>Right</strong>: Inference time vs. estimated FLOPs consumption.
+ When measured by FLOPs consumption, our <strong>CodeV-R1-Qwen-7B</strong> achieves better results with fewer computational resources than DeepSeek-R1, highlighting its superior efficiency.
  </figcaption>
  </div>
 
@@ -28,7 +28,8 @@ Large language models (LLMs) trained via reinforcement learning with verifiable
 
  To this end, we introduce **CodeV-R1**, an RLVR framework for training Verilog generation LLMs, as a continuation of the work initiated with [CodeV](https://huggingface.co/collections/yang-z/codev-6698a560cd94e61a9675fa2a). First, we develop a rule-based testbench generator that performs robust equivalence checking against golden references. Second, we propose a round-trip data synthesis method that pairs open-source Verilog snippets with LLM‐generated NL descriptions, verifies code–NL–code consistency via the generated testbench, and filters out inequivalent examples to yield a high-quality dataset. Third, we employ a two-stage distill-then-RL training pipeline: distillation for the cold start of reasoning abilities, followed by adaptive DAPO, our novel RLVR algorithm that reduces training cost by adaptively adjusting the sampling rate.
 
- This model, **CodeV-R1-Qwen-7B**, is the model after reinforcement learning. The distillation model, **CodeV-R1-Distill-Qwen-7B**, is provided [here](https://huggingface.co/zhuyaoyu/CodeV-R1-Distill-Qwen-7B). For more training details, please refer to our [paper](https://arxiv.org/abs/2505.24183).
+ **CodeV-R1-Qwen-7B** is the model obtained after reinforcement-learning (RL) fine-tuning, built on top of **CodeV-R1-Distill-Qwen-7B**. The distillation-based precursor is provided [here](https://huggingface.co/zhuyaoyu/CodeV-R1-Distill-Qwen-7B).
+ For more training details, please refer to our [paper](https://arxiv.org/abs/2505.24183).
 
  ### 2. Evaluation Results
 
@@ -113,7 +114,7 @@ CodeV-R1-Qwen-7B is derived from [Qwen-2.5 series](https://github.com/QwenLM/Qwe
  If you find our model helpful, please cite our [paper](https://arxiv.org/abs/2505.24183):
 
  ```tex
- @misc{zhu2025codevr1reasoningenhancedveriloggeneration,
+ @misc{zhu2025codevr1,
        title={CodeV-R1: Reasoning-Enhanced Verilog Generation},
        author={Yaoyu Zhu and Di Huang and Hanqi Lyu and Xiaoyun Zhang and Chongxiao Li and Wenxuan Shi and Yutong Wu and Jianan Mu and Jinghua Wang and Yang Zhao and Pengwei Jin and Shuyao Cheng and Shengwen Liang and Xishan Zhang and Rui Zhang and Zidong Du and Qi Guo and Xing Hu and Yunji Chen},
        year={2025},
 
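As context for the round-trip data synthesis mentioned in the README paragraph above, here is a minimal sketch of the code–NL–code consistency filter. It is an illustration only, not the authors' implementation; the helpers `describe_verilog`, `synthesize_verilog`, and `run_equivalence_testbench` are hypothetical placeholders for the LLM calls and the rule-based testbench check.

```python
from typing import Callable, List, Tuple

def round_trip_filter(
    snippets: List[str],
    describe_verilog: Callable[[str], str],        # LLM: Verilog snippet -> NL description
    synthesize_verilog: Callable[[str], str],      # LLM: NL description -> Verilog
    run_equivalence_testbench: Callable[[str, str], bool],  # rule-based equivalence check
) -> List[Tuple[str, str]]:
    """Keep only (NL, Verilog) pairs whose regenerated code is equivalent to the original."""
    dataset: List[Tuple[str, str]] = []
    for golden in snippets:
        nl = describe_verilog(golden)              # code -> NL
        regenerated = synthesize_verilog(nl)       # NL -> code
        if run_equivalence_testbench(golden, regenerated):  # code-NL-code consistency
            dataset.append((nl, golden))           # retain the high-quality pair
    return dataset
```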