---
language:
- en
tags:
- vision
- vlm
- ui-automation
- computer-vision
- nemotron
- internvl
datasets:
- custom
widget:
- text: "Click on the Submit button"
---

# Model Card for Nemotron Plus VLM

This model card documents the Nemotron Plus VLM for UI automation tasks.

## Model Details

### Model Description

Nemotron Plus VLM is a vision-language model designed for UI automation and self-operating computer tasks. It combines a Nemotron language model with the vision capabilities of InternVL3.

### Model Sources

- **Repository:** This repository
- **Paper:** [Coming soon]

## Uses

### Direct Use

The model can be used for:
- UI automation tasks
- Computer control via natural language
- Visual understanding of user interfaces
- Action planning and execution
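This card does not document the model's output format. The Python below sketches how the action-execution step could consume model output: it parses a single action string into a structured object before the string is handed to an executor. The `NAME(key=value, ...)` grammar, the `Action` class, and `parse_action` are hypothetical illustrations. They are not the model's actual interface.

```python
import re
from dataclasses import dataclass, field

# Hypothetical action grammar: NAME(key=value, ...). Illustration only;
# the card does not specify the model's real output format.
ACTION_RE = re.compile(r"^\s*([A-Z_]+)\((.*)\)\s*$")
# A value is either a double-quoted string (which may contain commas) or a bare token.
ARG_RE = re.compile(r'(\w+)\s*=\s*("[^"]*"|[^,]+)')


@dataclass
class Action:
    name: str
    args: dict = field(default_factory=dict)


def parse_action(text: str) -> Action:
    """Parse one hypothetical action string such as 'CLICK(x=0.42, y=0.77)'."""
    match = ACTION_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized action: {text!r}")
    name, arg_str = match.groups()
    args = {}
    for key, raw in ARG_RE.findall(arg_str):
        raw = raw.strip()
        if raw.startswith('"'):
            args[key] = raw[1:-1]  # strip the surrounding quotes
        else:
            try:
                args[key] = float(raw)  # e.g. normalized screen coordinates
            except ValueError:
                args[key] = raw  # e.g. a key name such as enter
    return Action(name=name, args=args)


print(parse_action("CLICK(x=0.42, y=0.77)"))
```

Parsing output into a structured `Action` before execution lets a controller validate coordinates and reject malformed output instead of acting on it.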

### Out-of-Scope Use

Not intended for:
- General image captioning
- Non-UI visual tasks
- Malicious automation

## Training Details

### Training Data

Custom UI automation datasets, including:
- File operation commands
- Web interaction scenarios
- Window management tasks
- System automation commands

### Training Procedure

The model was fine-tuned on prompt-completion pairs using a hierarchical reasoning approach.
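The record layout used for fine-tuning is not published in this card. The sketch below assumes a hierarchical layout in which the completion moves from a high-level goal, through intermediate steps, to one concrete action. It shows how a prompt-completion pair for one of the Training Data categories might be built and serialized to JSON Lines. The category labels, field names, and the `make_example` and `to_jsonl` helpers are hypothetical.

```python
import json

# Hypothetical labels mirroring the Training Data categories in this card.
CATEGORIES = {
    "file_operation",
    "web_interaction",
    "window_management",
    "system_automation",
}


def make_example(category, instruction, goal, steps, action):
    """Build one hypothetical prompt-completion pair.

    The completion is ordered from the high-level goal, through
    intermediate steps, down to a single concrete action.
    """
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    lines = [f"Goal: {goal}"]
    lines += [f"Step {i}: {step}" for i, step in enumerate(steps, start=1)]
    lines.append(f"Action: {action}")
    return {
        "category": category,
        "prompt": f"Instruction: {instruction}",
        "completion": "\n".join(lines),
    }


def to_jsonl(examples):
    """Serialize examples as JSON Lines (one JSON object per line)."""
    return "".join(json.dumps(ex) + "\n" for ex in examples)


example = make_example(
    "web_interaction",
    "Click on the Submit button",
    "Submit the form",
    ["Locate the Submit button", "Move the pointer to it"],
    "CLICK(x=0.42, y=0.77)",
)
print(to_jsonl([example]), end="")
```

Keeping each level of the hierarchy on its own labeled line makes the intermediate reasoning easy to inspect or filter when assembling a dataset.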

## Evaluation

[Evaluation metrics to be added]

## Environmental Impact

Training was conducted with efficiency optimizations intended to minimize compute requirements.

## Technical Specifications

See the README for detailed specifications.