zhuyaoyu committed · Commit 045bc4d · verified · 1 Parent(s): 58e62be

Update README.md

Files changed (1)
  1. README.md +5 -4
README.md CHANGED
@@ -17,8 +17,8 @@ pipeline_tag: text-generation
  <img src="./assets/rtllm_tts_flops.png" alt="RTLLM TTS FLOPs Results" width="400">
  </div>
  <figcaption class="caption mt-3 has-text-centered is-size-7 has-text-grey">
- Test-time scaling curves. Left: Inference time as a function of token length. Right: Inference time vs. estimated FLOPs consumption.
- When measured by FLOPs consumption, our model achieves better results with fewer computational resources than DeepSeek-R1, highlighting its superior efficiency.
+ Test-time scaling curves. <strong>Left</strong>: Inference time as a function of token length. <strong>Right</strong>: Inference time vs. estimated FLOPs consumption.
+ When measured by FLOPs consumption, our <strong>CodeV-R1-Qwen-7B</strong> achieves better results with fewer computational resources than DeepSeek-R1, highlighting its superior efficiency.
  </figcaption>
  </div>
 
@@ -28,7 +28,8 @@ Large language models (LLMs) trained via reinforcement learning with verifiable
 
  To this end, we introduce **CodeV-R1**, an RLVR framework for training Verilog generation LLMs, as a continuation of the work initiated with [CodeV](https://huggingface.co/collections/yang-z/codev-6698a560cd94e61a9675fa2a). First, we develop a rule-based testbench generator that performs robust equivalence checking against golden references. Second, we propose a round-trip data synthesis method that pairs open-source Verilog snippets with LLM‐generated NL descriptions, verifies code–NL–code consistency via the generated testbench, and filters out inequivalent examples to yield a high-quality dataset. Third, we employ a two-stage distill-then-RL training pipeline: distillation for the cold start of reasoning abilities, followed by adaptive DAPO, our novel RLVR algorithm that reduces training cost by adaptively adjusting the sampling rate.
 
- This model, **CodeV-R1-Qwen-7B**, is the model after reinforcement learning. The distillation model, **CodeV-R1-Distill-Qwen-7B**, is provided [here](https://huggingface.co/zhuyaoyu/CodeV-R1-Distill-Qwen-7B). For more training details, please refer to our [paper](https://arxiv.org/abs/2505.24183).
+ **CodeV-R1-Qwen-7B** is the model obtained after reinforcement-learning (RL) fine-tuning, built on top of **CodeV-R1-Distill-Qwen-7B**. The distillation-based precursor is provided [here](https://huggingface.co/zhuyaoyu/CodeV-R1-Distill-Qwen-7B).
+ For more training details, please refer to our [paper](https://arxiv.org/abs/2505.24183).
 
  ### 2. Evaluation Results
 
@@ -113,7 +114,7 @@ CodeV-R1-Qwen-7B is derived from [Qwen-2.5 series](https://github.com/QwenLM/Qwe
  If you find our model helpful, please cite our [paper](https://arxiv.org/abs/2505.24183):
 
  ```tex
- @misc{zhu2025codevr1reasoningenhancedveriloggeneration,
+ @misc{zhu2025codevr1,
        title={CodeV-R1: Reasoning-Enhanced Verilog Generation},
        author={Yaoyu Zhu and Di Huang and Hanqi Lyu and Xiaoyun Zhang and Chongxiao Li and Wenxuan Shi and Yutong Wu and Jianan Mu and Jinghua Wang and Yang Zhao and Pengwei Jin and Shuyao Cheng and Shengwen Liang and Xishan Zhang and Rui Zhang and Zidong Du and Qi Guo and Xing Hu and Yunji Chen},
        year={2025},
 
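As context for the round-trip data synthesis mentioned in the README paragraph above, here is a minimal sketch of the code–NL–code consistency filter. It is an illustration only, not the authors' implementation; the helpers `describe_verilog`, `synthesize_verilog`, and `run_equivalence_testbench` are hypothetical placeholders for the LLM calls and the rule-based testbench check.

```python
from typing import Callable, List, Tuple

def round_trip_filter(
    snippets: List[str],
    describe_verilog: Callable[[str], str],        # LLM: Verilog snippet -> NL description
    synthesize_verilog: Callable[[str], str],      # LLM: NL description -> Verilog
    run_equivalence_testbench: Callable[[str, str], bool],  # rule-based equivalence check
) -> List[Tuple[str, str]]:
    """Keep only (NL, Verilog) pairs whose regenerated code is equivalent to the original."""
    dataset: List[Tuple[str, str]] = []
    for golden in snippets:
        nl = describe_verilog(golden)              # code -> NL
        regenerated = synthesize_verilog(nl)       # NL -> code
        if run_equivalence_testbench(golden, regenerated):  # code-NL-code consistency
            dataset.append((nl, golden))           # retain the high-quality pair
    return dataset
```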