Description

This model is a Reward Model trained on the RobotsMali transcription scorer dataset, where the scores were assigned by human annotators.
It predicts a continuous score between 0 and 1 for a pair (audio, text), representing how well the text matches the spoken audio.

The model can be integrated as a Reward Model within RLHF pipelines to evaluate or fine-tune ASR models based on human preference scores.


Model Overview

The model consists of two main encoders, one for audio and one for text, followed by a small regression head that outputs a scalar score.


Audio Encoder

Input: Raw waveform (16 kHz)
Feature extraction: Mel-spectrogram computed from waveform using WhisperFeatureExtractor

Parameters:

  • n_fft: 1024
  • n_mels: 80
  • hop_length: 256
  • sample_rate: 16000
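
The timing implied by these parameters can be worked out with plain arithmetic. The sketch below is illustrative only: the helper `approx_num_frames` is a hypothetical name, and the exact frame count depends on the padding convention of the feature extractor. Note that these values differ from the `WhisperFeatureExtractor` defaults (`n_fft=400`, `hop_length=160`), so they would presumably need to be passed explicitly.

```python
# Timing implied by the listed spectrogram parameters (plain arithmetic).
sample_rate = 16000  # Hz
n_fft = 1024         # samples per analysis window
hop_length = 256     # samples between consecutive frames

window_ms = 1000 * n_fft / sample_rate        # window duration in milliseconds
hop_ms = 1000 * hop_length / sample_rate      # frame shift in milliseconds
frames_per_second = sample_rate / hop_length  # spectrogram frame rate


def approx_num_frames(duration_s: float) -> int:
    """Approximate frame count (hypothetical helper); padding may shift it by one."""
    return int(duration_s * sample_rate) // hop_length


print(window_ms, hop_ms, frames_per_second)  # 64.0 16.0 62.5
print(approx_num_frames(4.0))                # 250
```

Each frame therefore covers 64 ms of audio and frames are 16 ms apart, so a 4-second clip yields roughly 250 frames of 80 mel bins.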

Architecture:

  • 3 × (Conv1d → BatchNorm1d → ReLU).
  • Kernel size: 5, stride: 1, padding: 2.
  • Channel size: 128.
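
A minimal PyTorch sketch of such a convolutional stack is shown below. It is not the released implementation: the class name `AudioEncoder` is hypothetical, the first layer is assumed to take the 80 mel bins as input channels, and mean pooling over time is an assumption, since the card does not say how the frame sequence is reduced to a single embedding.

```python
import torch
import torch.nn as nn


class AudioEncoder(nn.Module):
    """Hypothetical sketch: 3 x (Conv1d -> BatchNorm1d -> ReLU), then pooling."""

    def __init__(self, n_mels: int = 80, channels: int = 128, num_blocks: int = 3):
        super().__init__()
        layers = []
        in_ch = n_mels  # ASSUMPTION: mel bins are the input channels
        for _ in range(num_blocks):
            layers += [
                # kernel 5, stride 1, padding 2 keeps the number of frames unchanged
                nn.Conv1d(in_ch, channels, kernel_size=5, stride=1, padding=2),
                nn.BatchNorm1d(channels),
                nn.ReLU(),
            ]
            in_ch = channels
        self.conv = nn.Sequential(*layers)

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        # mel: (batch, n_mels, frames)
        h = self.conv(mel)     # (batch, channels, frames)
        return h.mean(dim=-1)  # ASSUMPTION: mean pooling over time -> (batch, channels)


encoder = AudioEncoder().eval()
with torch.no_grad():
    audio_emb = encoder(torch.randn(2, 80, 250))  # two random "spectrograms"
print(tuple(audio_emb.shape))  # (2, 128)
```

With kernel size 5, stride 1 and padding 2, each convolution preserves the sequence length, so only the channel dimension changes (80 to 128).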

Text Encoder

Input: Tokenized transcription (IDs from SentencePiece tokenizer)
Architecture:

  • Embedding layer: dim = 128, vocab_size = 2048
  • Bidirectional LSTM: hidden size = 128, 1 layer
  • Output: mean pooling over valid tokens
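
The text branch can be sketched in PyTorch as follows. This is an illustration under stated assumptions, not the released code: the class name `TextEncoder` is hypothetical, and padding ID 0 is assumed to mark invalid tokens (the card does not give the padding ID).

```python
import torch
import torch.nn as nn


class TextEncoder(nn.Module):
    """Hypothetical sketch: Embedding -> BiLSTM -> mean pooling over valid tokens."""

    def __init__(self, vocab_size: int = 2048, emb_dim: int = 128,
                 hidden: int = 128, pad_id: int = 0):
        super().__init__()
        self.pad_id = pad_id  # ASSUMPTION: ID 0 is the padding token
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=pad_id)
        self.lstm = nn.LSTM(emb_dim, hidden, num_layers=1,
                            batch_first=True, bidirectional=True)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) integer IDs, right-padded with pad_id
        mask = (token_ids != self.pad_id).unsqueeze(-1).float()  # (batch, seq_len, 1)
        out, _ = self.lstm(self.embedding(token_ids))            # (batch, seq_len, 2*hidden)
        # Average only over non-padding positions. For simplicity the sequence is
        # not packed, so the backward direction still reads the padding tokens.
        summed = (out * mask).sum(dim=1)
        counts = mask.sum(dim=1).clamp(min=1.0)
        return summed / counts                                   # (batch, 2*hidden)


text_encoder = TextEncoder().eval()
ids = torch.tensor([[5, 9, 3, 0, 0],
                    [7, 2, 8, 4, 6]])  # made-up token IDs; zeros are padding
with torch.no_grad():
    text_emb = text_encoder(ids)
print(tuple(text_emb.shape))  # (2, 256)
```

Because the LSTM is bidirectional, the pooled text embedding has 2 × 128 = 256 dimensions.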

Fusion & Regression Head

Fusion: Concatenate [audio_emb, text_emb]

Regression head:

  • Linear(384 → 256) → ReLU → Dropout(0.3)
  • Linear(256 → 256) → ReLU
  • Linear(256 → 1) → Sigmoid

Output: Scalar ∈ [0, 1] (reward score)
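
As a sanity check on the dimensions, the arithmetic below counts the trainable parameters implied by the layers listed above. It assumes a 128-dimensional audio embedding, a 256-dimensional text embedding (2 × 128 from the bidirectional LSTM), 80 input channels for the first convolution, and PyTorch-style layers with biases; the helper function names are hypothetical.

```python
def conv1d_params(c_in: int, c_out: int, kernel: int) -> int:
    return c_in * c_out * kernel + c_out  # weights + bias


def batchnorm_params(channels: int) -> int:
    return 2 * channels  # learnable scale and shift (running stats not counted)


def linear_params(n_in: int, n_out: int) -> int:
    return n_in * n_out + n_out  # weights + bias


def lstm_params(input_size: int, hidden: int, bidirectional: bool = True) -> int:
    # ASSUMPTION: PyTorch-style LSTM with 4 gates and two bias vectors per direction
    per_direction = 4 * hidden * (input_size + hidden) + 2 * 4 * hidden
    return per_direction * (2 if bidirectional else 1)


audio = (conv1d_params(80, 128, 5) + 2 * conv1d_params(128, 128, 5)
         + 3 * batchnorm_params(128))
text = 2048 * 128 + lstm_params(128, 128)  # embedding table + BiLSTM
fused_dim = 128 + 2 * 128                  # audio_emb + text_emb
head = (linear_params(fused_dim, 256) + linear_params(256, 256)
        + linear_params(256, 1))
total = audio + text + head

print(fused_dim)                 # 384
print(audio, text, head, total)  # 216192 526336 164609 907137
```

Under these assumptions the fused vector has 384 dimensions, matching the first linear layer, and the total of about 907k trainable parameters is close to the 908k reported for the checkpoint (BatchNorm running statistics would add a few hundred stored values).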


Objective

  • Loss: Mean Squared Error (MSE)
  • Goal: Predict the similarity score between the spoken audio and its transcription.
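
To make the objective concrete, the following worked example computes the MSE between predicted scores and human-assigned scores. The numbers are made up for illustration and do not come from the dataset.

```python
def mse(predictions: list[float], targets: list[float]) -> float:
    """Mean squared error between predicted and annotated scores."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)


# Hypothetical values: model outputs in [0, 1] and human scores in [0, 1]
preds = [0.9, 0.2, 0.5]
targets = [1.0, 0.0, 0.5]

loss = mse(preds, targets)
print(round(loss, 4))  # 0.0167, i.e. (0.01 + 0.04 + 0.0) / 3
```

Since the regression head ends in a sigmoid, predictions and targets share the [0, 1] range, so the squared error per example is at most 1.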

Example Usage

Coming soon.

Model size: 908k params (Safetensors, tensor type F32)

Dataset used to train Panga-Azazia/reward-model