Learning Smooth Reward Models with Temporal Difference for LLM RL and Inference
Dan Zhang
zd21
Recent Activity
Updated a dataset about 1 month ago: zd21/DataSciBench
Updated a collection about 2 months ago: TDRM
Updated a model about 2 months ago: zd21/DeepSeek-TD1-PRM