YAML Metadata Warning: empty or missing yaml metadata in repo card (https://huggingface.co/docs/hub/model-cards#model-card-metadata)

Parakeet-TDT-0.6B v3 (ANE)

Model Description

parakeet-tdt-0.6b-v3 is a 600M-parameter multilingual automatic speech recognition (ASR) model from NVIDIA. It extends parakeet-tdt-0.6b-v2 by moving beyond English-only to support 25 European languages with automatic language detection.

The model was primarily trained on the Granary multilingual corpus and is optimized for both research exploration and production deployment.

This build is integrated with nexaSDK and optimized for modern NPUs, including Apple’s Neural Engine (ANE), for efficient on-device inference.

Features

Multilingual ASR: 25 European languages with built-in language detection.
Text formatting: Outputs text with punctuation and capitalization.
Timestamps: Provides both word-level and segment-level timestamps.
Long audio transcription:
- Up to 24 minutes with full attention (A100 80GB).
- Up to 3 hours with local attention.
Optimized for NPUs: Runs efficiently on Apple ANE, Qualcomm Hexagon, and other dedicated accelerators.
Commercial-friendly: Released under CC-BY-4.0 license.

Apple Neural Engine (ANE)

The Apple Neural Engine (ANE) is a specialized NPU in Apple silicon designed to accelerate AI and ML workloads [3]. By offloading heavy ASR computations to the ANE, parakeet-tdt-0.6b-v3 achieves:

Lower latency speech transcription on iPhone, iPad, and Mac.
Energy-efficient inference, extending battery life during real-time ASR tasks.
On-device privacy, keeping voice data local while maintaining production-grade accuracy.

Supported Languages

Bulgarian (bg), Croatian (hr), Czech (cs), Danish (da), Dutch (nl), English (en), Estonian (et), Finnish (fi), French (fr), German (de), Greek (el), Hungarian (hu), Italian (it), Latvian (lv), Lithuanian (lt), Maltese (mt), Polish (pl), Portuguese (pt), Romanian (ro), Slovak (sk), Slovenian (sl), Spanish (es), Swedish (sv), Russian (ru), Ukrainian (uk)

Use Cases

Conversational AI and multilingual chatbots
Voice assistants and smart devices
Real-time transcription services
Subtitles and caption generation
Voice analytics platforms
Research in speech technology

Inputs and Outputs

Input

Type: 16kHz audio
Formats: .wav, .mp3
Shape: 1D mono audio

Output

Type: Text string
Properties: Punctuation + capitalization included

Limitations & Responsible Use

The model may produce transcription errors, particularly with code-switching or noisy input. Evaluate thoroughly before deploying in sensitive domains (e.g., healthcare, finance, or legal).

License

Licensed under the original Parakeet license terms.
See: Parakeet Model License

References

Support

For Nexa SDK: sdk.nexa.ai

Downloads last month: 14

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Collection including NexaAI/parakeet-tdt-0.6b-v3-ane

Apple Neural Engine

Collection

Latest SOTA models supported on Apple Neural Engine • 1 item • Updated Sep 24 • 1