Submit prompts to generate responses and track progress
Implement test-time compute scaling for math problems
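One common way to scale test-time compute on math problems is majority voting over several sampled answers (often called self-consistency). The minimal sketch below assumes that technique; the project may use a different one. `make_noisy_solver` is a hypothetical stand-in for a model call that returns the correct answer with probability `p_correct`. `majority_vote` and `accuracy` are illustrative helpers, not an existing API.

```python
import random
from collections import Counter
from typing import Callable


def make_noisy_solver(p_correct: float, seed: int = 0) -> Callable[[str], str]:
    """Hypothetical model stub: returns "42" with probability p_correct,
    otherwise one of several wrong answers chosen uniformly."""
    rng = random.Random(seed)
    wrong = ["41", "43", "24", "420"]

    def solve(prompt: str) -> str:
        if rng.random() < p_correct:
            return "42"
        return rng.choice(wrong)

    return solve


def majority_vote(prompt: str, solver: Callable[[str], str], n: int) -> str:
    """Spend more compute at inference time: sample n answers, return the most common.
    Counter.most_common breaks ties in favor of the answer encountered first."""
    answers = [solver(prompt) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]


def accuracy(n: int, trials: int = 500, p_correct: float = 0.4, seed: int = 0) -> float:
    """Fraction of trials where majority voting over n samples recovers "42"."""
    solver = make_noisy_solver(p_correct, seed)
    hits = sum(majority_vote("What is 6 * 7?", solver, n) == "42" for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    # Accuracy should rise as the number of samples per problem grows.
    for n in (1, 5, 25):
        print(f"n={n:>2}  accuracy={accuracy(n):.2f}")
```

The stub solver is right only 40% of the time. Its wrong answers are spread across four values, so the correct answer is usually the plurality, and voting over more samples raises accuracy. With a real model, `solver` would wrap a sampling call at nonzero temperature, and the answer would have to be extracted from the generated text before voting.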