The Markovian Thinker

updated Oct 9

Reformulating the RL of reasoning LLMs through the Markovian Thinking paradigm.

  • McGill-NLP/delethink-24k-1.5b

    2B • Updated Oct 9 • 190 • 5

  • McGill-NLP/longcot-24k-1.5b

    2B • Updated Oct 9 • 151 • 1

  • McGill-NLP/longcot-8k-1.5b

    2B • Updated Oct 9 • 7

  • McGill-NLP/delethink-96k-base-1.5b

    2B • Updated Oct 3 • 1 • 1

  • McGill-NLP/openmath-filtered

    Viewer • Updated Oct 3 • 200k • 82

  • McGill-NLP/delethink-96k-1.5b

    2B • Updated Oct 9 • 9 • 3

  • The Markovian Thinker

    Paper • 2510.06557 • Published Oct 8 • 30