Parse and recognize text in images
Extract text from images in multiple languages
Co-doodle using Gemini 2.0 native image generation
Generate answers by combining text and images
Generate text based on images and videos
Remove background from images
Generate a realistic voice from text and an audio sample