Peng

pennlio

pennlio111

AI & ML interests

None yet

Recent Activity

upvoted a paper 2 months ago

Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models

upvoted a paper 5 months ago

LLark: A Multimodal Foundation Model for Music

upvoted a paper 5 months ago

TALKPLAY: Multimodal Music Recommendation with Large Language Models

upvoted a paper 2 months ago

Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models

Paper • 2508.21365 • Published Aug 29 • 28

upvoted 2 papers 5 months ago

LLark: A Multimodal Foundation Model for Music

Paper • 2310.07160 • Published Oct 11, 2023 • 2

TALKPLAY: Multimodal Music Recommendation with Large Language Models

Paper • 2502.13713 • Published Feb 19 • 3

liked 2 models over 1 year ago

xai-org/grok-1

Text Generation • Updated Mar 28, 2024 • 2.98k • 2.37k

gradientai/Llama-3-8B-Instruct-Gradient-1048k

Text Generation • 8B • Updated Oct 29, 2024 • 14.3k • 678

liked a dataset over 1 year ago

m-a-p/COIG-CQIA

Viewer • Updated Apr 18, 2024 • 44.7k • 4.33k • 656

upvoted an article over 1 year ago

Illustrating Reinforcement Learning from Human Feedback (RLHF)

Article • Dec 9, 2022 • 367

liked 2 models over 1 year ago

meta-llama/Meta-Llama-3-8B

Text Generation • 8B • Updated Sep 27, 2024 • 1.74M • 6.36k

unsloth/llama-3-8b-bnb-4bit

Text Generation • 5B • Updated Jan 7 • 36.1k • 202

liked 2 models about 2 years ago

stabilityai/stable-diffusion-x4-upscaler

Updated Jul 5, 2023 • 15.6k • 711

stabilityai/stable-diffusion-xl-base-1.0

Text-to-Image • Updated Oct 30, 2023 • 2.38M • 7.09k

liked a model over 2 years ago

Vision-CAIR/MiniGPT-4

Updated Apr 19, 2023 • 427

liked a dataset over 2 years ago

fka/awesome-chatgpt-prompts

Viewer • Updated Jan 6 • 203 • 35.4k • 9.33k

updated a model over 2 years ago

pennlio/test

Updated May 22, 2023
