---
language:
- en
tags:
- vision
- vlm
- ui-automation
- computer-vision
- nemotron
- internvl
datasets:
- custom
widget:
- text: "Click on the Submit button"
---

# Model Card for Nemotron Plus VLM

This model card documents the Nemotron Plus VLM for UI automation tasks.

## Model Details

### Model Description

Nemotron Plus VLM is a vision-language model specifically designed for UI automation and self-operating computer tasks. It combines the Nemotron language model with InternVL3 vision capabilities.

### Model Sources

- **Repository:** This repository
- **Paper:** [Coming soon]

## Uses

### Direct Use

The model can be used for:

- UI automation tasks
- Computer control via natural language
- Visual understanding of user interfaces
- Action planning and execution

### Out-of-Scope Use

Not intended for:

- General image captioning
- Non-UI visual tasks
- Malicious automation

## Training Details

### Training Data

Custom datasets for UI automation including:

- File operation commands
- Web interaction scenarios
- Window management tasks
- System automation commands

### Training Procedure

Fine-tuned using a hierarchical reasoning approach with prompt-completion pairs.

## Evaluation

[Evaluation metrics to be added]

## Environmental Impact

Training was conducted with efficiency optimizations to minimize compute requirements.

## Technical Specifications

See README for detailed specifications.
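
The prompt-completion pairs mentioned under Training Procedure could be stored as JSON Lines records. The sketch below is only an illustration: the `PromptCompletionPair` class, its field names, and the completion strings are hypothetical, and the actual training data format is not documented in this card.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class PromptCompletionPair:
    """One fine-tuning example (hypothetical schema, not the released format)."""
    prompt: str      # natural-language instruction given to the model
    completion: str  # target text the model is trained to produce


def to_jsonl(pairs):
    """Serialize pairs as JSON Lines, one example per line."""
    return "\n".join(json.dumps(asdict(pair)) for pair in pairs)


# Illustrative records only; the completion strings are invented placeholders.
examples = [
    PromptCompletionPair(
        prompt="Click on the Submit button",
        completion="plan: locate the Submit button; action: click",
    ),
    PromptCompletionPair(
        prompt="Close the current window",
        completion="plan: locate the window close control; action: click",
    ),
]

jsonl = to_jsonl(examples)
print(jsonl)
```

Each line of `jsonl` is an independent JSON object, so a file in this layout can be streamed one example at a time during fine-tuning.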