Improve model card: Add tags, paper link, and expanded description

#1 opened by nielsr (HF Staff)

This PR significantly improves the model card for the "Self-Certainty: Llama-3.2-3B-Instruct trained on DAPO-14k" model by:

  • Adding library_name: transformers to the metadata, enabling the automated "how to use" widget (see the metadata sketch after this list). Evidence for this is found in config.json and tokenizer_config.json, which indicate the Llama architecture and a transformers_version.
  • Adding pipeline_tag: text-generation to the metadata, improving discoverability for this large language model.
  • Updating the model card content to include a direct link to the paper "Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models".
  • Expanding the model description with details about the Co-rewarding framework from the paper's abstract and the GitHub repository.
  • Including the citation information.
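
For reference, here is a minimal sketch of the proposed front-matter additions. Only library_name and pipeline_tag come from this PR; the base_model and license fields below are illustrative assumptions and may differ in the actual card:

```yaml
---
library_name: transformers
pipeline_tag: text-generation
# The two fields above are what this PR adds.
# The fields below are assumptions for illustration only.
base_model: meta-llama/Llama-3.2-3B-Instruct
license: llama3.2
---
```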

No sample usage is added, since no direct inference code snippet was found in the provided GitHub README, in keeping with the "Do not make up code yourself" disclaimer.
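
That said, the "how to use" widget enabled by library_name: transformers typically suggests the standard transformers text-generation pattern sketched below. This is not from the paper or README: the repo id is a hypothetical placeholder, and the prompt and generation settings are illustrative assumptions.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id; substitute the actual Hub path of this model.
model_id = "your-org/Llama-3.2-3B-Instruct-Co-rewarding-DAPO-14k"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Llama-3.2-Instruct models ship a chat template; apply it before generating.
messages = [{"role": "user", "content": "What is 17 * 24? Think step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# max_new_tokens is an illustrative choice, not a recommended setting.
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```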

Please review and merge if these improvements are satisfactory.

Geraldxm changed pull request status to merged
