kevinshin/qwen2.5-1.5b-rft-rpo-lr-1e-5-alpha-4-beta-0.01-wc-cw-3k-neg-rethink-pos

Text Generation

Generated from Trainer

text-generation-inference

qwen2.5-1.5b-rft-rpo-lr-1e-5-alpha-4-beta-0.01-wc-cw-3k-neg-rethink-pos / vocab.json

Training in progress, epoch 0

29a6383 verified about 2 months ago

2.78 MB
