Parakeet-TDT-0.6B-v3 Fine-Tuned on ATC-ASR Dataset
Overview
This repository contains a fine-tuned version of NVIDIA Parakeet-TDT-0.6B-v3, optimized for automatic speech recognition (ASR) in air traffic control (ATC) communications.
The model was fine-tuned using NVIDIA NeMo on the Jacktol ATC-ASR Dataset to improve recognition accuracy in noisy, domain-specific ATC environments.
After fine-tuning, the model achieves a word error rate (WER) of 0.0599 (5.99%) on the dataset’s official test split, reported here as state of the art for that split.
Results
| Metric | Value |
|---|---|
| Validation Word Error Rate (WER) | 0.0558 |
| Test Word Error Rate (WER) | 0.0599 |
| Training Time | < 1 hour on NVIDIA H200 |
| Framework | NVIDIA NeMo |
| Checkpoint Size | 2.34 GB |
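For readers unfamiliar with the metric, WER is the number of word-level substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The following is a minimal sketch of that computation in plain Python. It is not the evaluation script used to produce the numbers above, the function names `edit_distance` and `corpus_wer` are hypothetical, and the example transcripts are made up for illustration. Text normalization (casing, digits versus spelled-out numbers) is assumed to have happened beforehand and can change the score noticeably.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance between a reference and a hypothesis."""
    # prev[j] holds the distance between the reference prefix processed so far
    # and hyp[:j]; it starts as the cost of inserting j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1]


def corpus_wer(references: List[str], hypotheses: List[str]) -> float:
    """Total word errors divided by total reference words across all utterances."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    errors = 0
    ref_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens = ref.split()
        errors += edit_distance(ref_tokens, hyp.split())
        ref_words += len(ref_tokens)
    if ref_words == 0:
        raise ValueError("references contain no words")
    return errors / ref_words


if __name__ == "__main__":
    # Made-up example: one deleted word out of six reference words.
    refs = ["cleared for takeoff runway two seven"]
    hyps = ["cleared takeoff runway two seven"]
    print(f"WER: {corpus_wer(refs, hyps):.4f}")
```

Note that this sketch pools errors over the whole corpus rather than averaging per-utterance scores; the two conventions give different numbers when utterance lengths vary.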
Model Details
| Attribute | Description |
|---|---|
| Base Model | nvidia/parakeet-tdt-0.6b-v3 |
| Dataset | jacktol/ATC-ASR-Dataset |
| Framework | NVIDIA NeMo |
| Epochs | 16 |
| Batch Size | 16 |
| Learning Rate | 1e-4 |
| Optimizer | AdamW (weight decay 1e-3) |
| Scheduler | CosineAnnealing |
| Warmup Steps | 5000 |
| Min LR | 5e-6 |
| Precision | Mixed precision (FP16) |
| Tokenizer | Parakeet default subword tokenizer |
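To make the scheduler rows concrete, here is a rough sketch of a warmup-then-cosine learning rate curve built from the values in the table (peak 1e-4, minimum 5e-6, 5000 warmup steps). It is an illustration only: the function `lr_at_step` is hypothetical, the linear warmup shape is an assumption, NeMo's own `CosineAnnealing` policy may differ in small details, and the total step count of 20000 is a placeholder because the card does not state the dataset size or the number of optimizer steps.

```python
import math


def lr_at_step(
    step: int,
    total_steps: int,
    peak_lr: float = 1e-4,     # "Learning Rate" from the table
    min_lr: float = 5e-6,      # "Min LR" from the table
    warmup_steps: int = 5000,  # "Warmup Steps" from the table
) -> float:
    """Linear warmup to peak_lr, then cosine decay down to min_lr."""
    if total_steps <= warmup_steps:
        raise ValueError("total_steps must exceed warmup_steps")
    step = max(0, min(step, total_steps))  # clamp to the valid range
    if step < warmup_steps:
        # Assumed linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / warmup_steps
    # Fraction of the decay phase completed, from 0.0 to 1.0.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 20000  # placeholder, not taken from the training run
    for s in (0, 2500, 5000, 12500, 20000):
        print(f"step {s:>6}: lr = {lr_at_step(s, total):.2e}")
```

If the actual run had fewer than 5000 optimizer steps in total, the schedule would never leave the warmup phase, so the step count is worth checking against the training logs.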
Dataset
- Name: Jacktol ATC-ASR Dataset
- Domain: Air Traffic Control communications
- Language: English
- Sampling Rate: 16 kHz
- Format: WAV + JSON transcripts
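Since the dataset audio is listed as 16 kHz WAV, it can be useful to confirm that your own recordings match before running inference or further fine-tuning. The sketch below uses only Python's standard `wave` module; the helper name `inspect_wav` and the file path in the usage section are hypothetical, and the module reads uncompressed PCM WAV files only. The layout of the JSON transcripts is not described in this card, so it is not parsed here.

```python
import wave
from typing import Dict, Union


def inspect_wav(path: str, expected_rate: int = 16000) -> Dict[str, Union[int, float, bool]]:
    """Read basic properties of a PCM WAV file and compare its sampling rate."""
    with wave.open(path, "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
        frames = wav_file.getnframes()
    return {
        "sample_rate": sample_rate,
        "channels": channels,
        "duration_s": frames / sample_rate,
        "matches_expected_rate": sample_rate == expected_rate,
    }


if __name__ == "__main__":
    import sys

    # Usage: python inspect_wav.py path/to/recording.wav
    info = inspect_wav(sys.argv[1])
    print(info)
    if not info["matches_expected_rate"]:
        print("Audio should be resampled to 16 kHz before use.")
```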
Citation
If you use this model, please cite both the base model and dataset authors:
@misc{nvidia2024parakeet,
  title={Parakeet-TDT-0.6B-v3},
  author={NVIDIA},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3}}
}

@dataset{jacktol_atc_asr,
  title={ATC-ASR Dataset},
  author={Jacktol},
  year={2023},
  howpublished={\url{https://huggingface.co/datasets/jacktol/ATC-ASR-Dataset}}
}