Parakeet-TDT-0.6B-v3 Fine-Tuned on ATC-ASR Dataset

Overview

This repository contains a fine-tuned version of NVIDIA Parakeet-TDT-0.6B-v3, optimized for automatic speech recognition (ASR) in air traffic control (ATC) communications.
The model was fine-tuned using NVIDIA NeMo on the Jacktol ATC-ASR Dataset to improve recognition accuracy in noisy, domain-specific ATC environments.
After fine-tuning, the model achieves a state-of-the-art word error rate (WER) of 0.0599 (5.99%) on the dataset’s official test split.
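A minimal inference sketch, assuming NeMo's ASR collection is installed. The helper names `hypothesis_text` and `transcribe_files` are hypothetical, and the sketch assumes the fine-tuned weights are available as a local `.nemo` file, which this card does not state. The return type of `transcribe` differs between NeMo releases, so the helper accepts both plain strings and objects with a `.text` attribute.

```python
from typing import Any, List, Optional


def hypothesis_text(hyp: Any) -> str:
    """Return the transcript string from one transcription result.

    Recent NeMo releases return objects with a `.text` attribute,
    while older ones return plain strings; both are handled here.
    """
    return hyp if isinstance(hyp, str) else hyp.text


def transcribe_files(wav_paths: List[str], checkpoint_path: Optional[str] = None) -> List[str]:
    """Transcribe 16 kHz WAV files and return one string per file."""
    # Imported lazily so the helpers above can be used without NeMo installed.
    import nemo.collections.asr as nemo_asr

    if checkpoint_path is not None:
        # Assumption: the fine-tuned checkpoint is a local .nemo file.
        model = nemo_asr.models.ASRModel.restore_from(restore_path=checkpoint_path)
    else:
        # Falls back to the base model named in this card.
        model = nemo_asr.models.ASRModel.from_pretrained(
            model_name="nvidia/parakeet-tdt-0.6b-v3"
        )

    return [hypothesis_text(h) for h in model.transcribe(wav_paths)]
```

A call might look like `transcribe_files(["tower_clip.wav"], checkpoint_path="parakeet_atc.nemo")`, where both file names are placeholders.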


Results

| Metric | Value |
| --- | --- |
| Validation Word Error Rate (WER) | 0.0558 |
| Test Word Error Rate (WER) | 0.0599 |
| Training Time | < 1 hour on NVIDIA H200 |
| Framework | NVIDIA NeMo |
| Checkpoint Size | 2.34 GB |
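For readers who want to check a WER figure on their own transcripts, the sketch below computes corpus-level WER as total word-level edit distance divided by total reference words. It is an illustration only: the card does not state which text normalization or aggregation was used for the reported numbers, and the function names are hypothetical.

```python
from typing import List


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance between two word sequences."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def corpus_wer(references: List[str], hypotheses: List[str]) -> float:
    """Total word errors divided by total reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    errors = 0
    words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.split()
        errors += word_edit_distance(ref_words, hyp.split())
        words += len(ref_words)
    if words == 0:
        raise ValueError("references contain no words")
    return errors / words
```

On this scale, a WER of 0.0599 corresponds to roughly six word errors per hundred reference words.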

Model Details

| Attribute | Description |
| --- | --- |
| Base Model | nvidia/parakeet-tdt-0.6b-v3 |
| Dataset | jacktol/ATC-ASR-Dataset |
| Framework | NVIDIA NeMo |
| Epochs | 16 |
| Batch Size | 16 |
| Learning Rate | 1e-4 |
| Optimizer | AdamW (weight decay 1e-3) |
| Scheduler | CosineAnnealing |
| Warmup Steps | 5000 |
| Min LR | 5e-6 |
| Precision | Mixed precision (FP16) |
| Tokenizer | Parakeet default subword tokenizer |
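To make the scheduler settings concrete, the sketch below traces a learning-rate curve with linear warmup followed by cosine decay, using the values from the table above. It is a simplified approximation, not NeMo's exact `CosineAnnealing` implementation. The total step count `max_steps` is not given in this card, so it is a required argument, and `lr_at_step` is a hypothetical name.

```python
import math


def lr_at_step(
    step: int,
    max_steps: int,
    base_lr: float = 1e-4,
    min_lr: float = 5e-6,
    warmup_steps: int = 5000,
) -> float:
    """Learning rate at a given optimizer step (simplified schedule)."""
    if max_steps <= warmup_steps:
        raise ValueError("max_steps must exceed warmup_steps")
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / warmup_steps
    # Fraction of the decay phase completed, clamped at 1.0 after max_steps.
    progress = min(1.0, (step - warmup_steps) / (max_steps - warmup_steps))
    # Cosine decay from base_lr down to min_lr.
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions the rate peaks at 1e-4 once warmup ends at step 5000 and settles at 5e-6 by `max_steps`.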

Dataset

  • Name: Jacktol ATC-ASR Dataset
  • Domain: Air Traffic Control communications
  • Language: English
  • Sampling Rate: 16 kHz
  • Format: WAV + JSON transcripts
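NeMo training typically reads a JSON-lines manifest with `audio_filepath`, `duration`, and `text` fields. The sketch below builds one such line from a WAV file and checks the 16 kHz sampling rate listed above. The layout of the dataset's own JSON transcripts is not described in this card, so the transcript is passed in as a plain string, and `manifest_entry` is a hypothetical helper.

```python
import json
import wave


def manifest_entry(wav_path: str, transcript: str, expected_rate: int = 16000) -> str:
    """Return one JSON manifest line for a WAV file and its transcript."""
    with wave.open(wav_path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
    if rate != expected_rate:
        raise ValueError(f"{wav_path}: expected {expected_rate} Hz, found {rate} Hz")
    entry = {
        "audio_filepath": wav_path,
        "duration": frames / rate,  # clip length in seconds
        "text": transcript,
    }
    return json.dumps(entry)
```

Writing one such line per utterance to a file gives a manifest that can be referenced from a training configuration.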

Citation

If you use this model, please cite both the base model and dataset authors:

@misc{nvidia2024parakeet,
  title={Parakeet-TDT-0.6B-v3},
  author={NVIDIA},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3}}
}

@dataset{jacktol_atc_asr,
  title={ATC-ASR Dataset},
  author={Jacktol},
  year={2023},
  howpublished={\url{https://huggingface.co/datasets/jacktol/ATC-ASR-Dataset}}
}