thinking / ocr / reasoning
Generate descriptions from images and text prompts
Generate audio from text, guided by reference audio