yangzhch6 committed
Commit 981bf16 · verified · 1 Parent(s): 9d7ca90

Update README.md

Files changed (1)
  1. README.md +1 -19
README.md CHANGED
@@ -4,22 +4,4 @@ library_name: transformers
  pipeline_tag: text-generation
  ---
  
- The base Qwen2.5-Math-7B model used by LUFFY, described in [Learning to Reason under Off-Policy Guidance](https://huggingface.co/papers/2504.14945).
- We change to rope_theta from 10000 to 40000 and extend the context window to 16k.
- Also, we modify the chat_template for the system prompt and add <think>.
-
- Github: https://github.com/ElliottYan/LUFFY
-
- # Citation
- If you find our model, data, or evaluation code useful, please kindly cite our paper:
- ```bib
- @misc{luffy,
- title={Learning to Reason under Off-Policy Guidance},
- author={Jianhao Yan and Yafu Li and Zican Hu and Zhi Wang and Ganqu Cui and Xiaoye Qu and Yu Cheng and Yue Zhang},
- year={2025},
- eprint={2504.14945},
- archivePrefix={arXiv},
- primaryClass={cs.LG},
- url={https://arxiv.org/abs/2504.14945},
- }
- ```
+ Following LUFFY, we change rope_theta from 10000 to 40000 and extend the context window to 16k.
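
The change described in the new README line can be sketched as an edit to a model config dictionary. This is a minimal illustration, not the authors' actual procedure: it assumes the context window is stored under `max_position_embeddings`, that "16k" means 16384 tokens, and that the starting window is 4096. The helper name `extend_context` is hypothetical.

```python
import json


def extend_context(config: dict, rope_theta: float = 40000.0,
                   max_positions: int = 16384) -> dict:
    """Return a copy of a config dict with a larger RoPE base and context window.

    Assumption: the context window lives under "max_position_embeddings".
    """
    updated = dict(config)  # shallow copy so the input dict is left untouched
    updated["rope_theta"] = rope_theta
    updated["max_position_embeddings"] = max_positions
    return updated


# Illustrative starting values (assumed, not read from the real checkpoint).
base_config = {"rope_theta": 10000.0, "max_position_embeddings": 4096}
new_config = extend_context(base_config)
print(json.dumps(new_config, indent=2))
```

In practice the same two fields would be edited in the checkpoint's `config.json`. Whether any other fields need to change alongside them is not stated in the README.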