kevinshin/qwen2.5-1.5b-rft-rpo-lr-1e-5-alpha-4-beta-0.01-wc-cw-3k-neg-rethink-pos

Text Generation

Generated from Trainer

text-generation-inference

qwen2.5-1.5b-rft-rpo-lr-1e-5-alpha-4-beta-0.01-wc-cw-3k-neg-rethink-pos / trainer_state.json

Model save

b571663 verified about 1 month ago

558 kB
