Papers
arxiv:2510.04162

Drax: Speech Recognition with Discrete Flow Matching

Published on Oct 5
· Submitted by Aviv Navon on Oct 8
· aiola-lab aiOla
Authors:
,
,
,
,
,
,

Abstract

Drax, a discrete flow matching framework for ASR, achieves state-of-the-art recognition accuracy with improved efficiency by constructing an audio-conditioned probability path.

AI-generated summary

Diffusion and flow-based non-autoregressive (NAR) models have shown strong promise in large language modeling, however, their potential for automatic speech recognition (ASR) remains largely unexplored. We propose Drax, a discrete flow matching framework for ASR that enables efficient parallel decoding. To better align training with inference, we construct an audio-conditioned probability path that guides the model through trajectories resembling likely intermediate inference errors, rather than direct random noise to target transitions. Our theoretical analysis links the generalization gap to divergences between training and inference occupancies, controlled by cumulative velocity errors, thereby motivating our design choice. Empirical evaluation demonstrates that our approach attains recognition accuracy on par with state-of-the-art speech models while offering improved accuracy-efficiency trade-offs, highlighting discrete flow matching as a promising direction for advancing NAR ASR.

Community

💻 Code | 🤗 Models

Paper submitter

We propose Drax, a non-autoregressive ASR model using discrete flow matching that includes an audio-conditioned intermediate distribution to better match inference dynamics.
Drax achieves accuracy comparable to state-of-the-art autoregressive models while offering better control over the accuracy-efficiency trade-off point.

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any Paper on Hugging Face checkout this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

Sign up or log in to comment

Models citing this paper 1

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2510.04162 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2510.04162 in a Space README.md to link it from this page.

Collections including this paper 2