FinRL-DAPO-GRPO-Sentiment-Risk
This repository contains the code and trained model for our FinRL Contest 2025 Task 1 submission:
"A New DAPO Algorithm for Stock Trading".
We propose a novel trading agent that integrates a Group Relative Policy Optimization (GRPO) approach with:
- Insights from the DAPO algorithm (used in LLM preference tuning),
- Sentiment and risk signals derived from LLM-extracted financial news,
- An exponentiated sentiment-risk reward function for more robust decision-making (a rough sketch is given below).
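The snippet below is a minimal illustration of the exponentiated sentiment-risk reward idea: positive news sentiment amplifies the raw portfolio return while higher extracted risk dampens it. The coefficients, score ranges, and exact functional form are assumptions for readability, not the tuned values from our submission.

import numpy as np

def sentiment_risk_reward(portfolio_return, sentiment_score, risk_score,
                          alpha=0.1, beta=0.1):
    """Scale the raw portfolio return by exponential sentiment/risk factors.

    sentiment_score: LLM-extracted news sentiment, e.g. in [-1, 1]
    risk_score:      LLM-extracted risk level,     e.g. in [0, 1]
    alpha, beta:     illustrative weights (assumed, not the tuned values)
    """
    # Positive sentiment increases the reward, higher risk reduces it.
    sentiment_factor = np.exp(alpha * sentiment_score)
    risk_factor = np.exp(-beta * risk_score)
    return portfolio_return * sentiment_factor * risk_factor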
Model Highlights
- Framework: Custom RL agent using GRPO with decoupled clipping (a loss sketch follows after this list).
- Reward: Weighted sentiment-risk adjusted return.
- Dataset: FNSPID, based on the Nasdaq-100 from 1999–2023.
- Performance (test period 2020–2023):
  - Cumulative return: 230.49%
  - Information ratio: 0.37
  - Maximum drawdown: -49.11%
  - Outperforms the CPPO-DeepSeek 10% baseline
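The decoupled clipping referenced above follows the DAPO idea of using separate lower and upper bounds on the policy importance ratio, combined with GRPO-style group-relative advantages. The sketch below assumes illustrative clip values (eps_low, eps_high) and a simple mean/std group normalisation; it is not the exact training configuration.

import torch

def grpo_decoupled_clip_loss(log_probs, old_log_probs, rewards,
                             eps_low=0.2, eps_high=0.28):
    """Clipped surrogate loss with decoupled (asymmetric) clipping bounds.

    log_probs, old_log_probs, rewards: tensors of shape (group_size,)
    for one group of sampled actions/trajectories.
    """
    # Group-relative advantage: normalise rewards within the group.
    advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

    # Importance ratio between the current policy and the sampling policy.
    ratio = torch.exp(log_probs - old_log_probs)

    # Decoupled clipping: different lower and upper bounds on the ratio.
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high)

    # Pessimistic (min) surrogate objective, averaged over the group.
    return -torch.min(ratio * advantages, clipped * advantages).mean()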
Files
- model_rl.pth: Trained PyTorch model checkpoint.
- README.md: This file.
- (Optionally add: training logs, config files, etc.)
Usage
This model is designed for research use in financial RL.
You can load the model via PyTorch:
import torch

# Load the trained checkpoint; map_location="cpu" avoids requiring a GPU.
# Recent PyTorch versions may also need weights_only=False for a pickled full model.
model = torch.load("model_rl.pth", map_location="cpu")
model.eval()
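torch.load returns whatever was saved: if the checkpoint holds only a state_dict rather than a pickled model object, the network has to be instantiated first. A minimal sketch, using a hypothetical PolicyNetwork stub whose layer sizes would need to match the real training architecture:

import torch
import torch.nn as nn

# Hypothetical architecture stub; replace the layer sizes with the ones used in training.
class PolicyNetwork(nn.Module):
    def __init__(self, state_dim=16, action_dim=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def forward(self, x):
        return self.net(x)

checkpoint = torch.load("model_rl.pth", map_location="cpu")
if isinstance(checkpoint, dict):   # state_dict case
    model = PolicyNetwork()
    model.load_state_dict(checkpoint)
else:                              # full pickled model case
    model = checkpoint
model.eval()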