RobotsMali/reward-model

Tags: Tabular Regression, Transformers, Safetensors, Bambara, reward-model

Description

This model is a Reward Model trained on the RobotsMali transcription scorer dataset, whose scores were assigned by human annotators.
It predicts a continuous score between 0 and 1 for an (audio, text) pair, representing how well the text matches the spoken audio.

The model can be integrated as a Reward Model in RLHF pipelines to evaluate or fine-tune ASR models using human preference scores.


Model Overview

The model consists of two main encoders, one for audio and one for text, followed by a small regression head that outputs a scalar score.


Audio Encoder

Input: Raw waveform (16 kHz)
Feature extraction: Mel-spectrogram computed from the waveform using WhisperFeatureExtractor

Parameters:

  • n_fft: 1024
  • n_mels: 80
  • hop_length: 256
  • sample_rate: 16000

Architecture:

  • 3 × (Conv1d → BatchNorm1d → ReLU)
  • Kernel size: 5, stride: 1, padding: 2
  • Channel size: 128
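
A minimal PyTorch sketch of this branch, assuming the frame-level features are mean-pooled over time into a 128-dimensional utterance embedding (the class name and the pooling choice are illustrative, not taken from the repository):

import torch
import torch.nn as nn
from transformers import WhisperFeatureExtractor

# Mel-spectrogram front end with the parameters listed above
feature_extractor = WhisperFeatureExtractor(
    feature_size=80, sampling_rate=16000, n_fft=1024, hop_length=256
)

class AudioEncoder(nn.Module):
    """3 x (Conv1d -> BatchNorm1d -> ReLU) over the mel frames."""
    def __init__(self, n_mels: int = 80, channels: int = 128):
        super().__init__()
        blocks, in_ch = [], n_mels
        for _ in range(3):
            blocks += [
                nn.Conv1d(in_ch, channels, kernel_size=5, stride=1, padding=2),
                nn.BatchNorm1d(channels),
                nn.ReLU(),
            ]
            in_ch = channels
        self.conv = nn.Sequential(*blocks)

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        # mel: (batch, n_mels, frames) -> (batch, channels); mean pooling is an assumption
        return self.conv(mel).mean(dim=-1)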

Text Encoder

Input: Tokenized transcription (token IDs from a SentencePiece tokenizer)
Architecture:

  • Embedding layer: dim = 128, vocab_size = 2048
  • Bidirectional LSTM: hidden size = 128, 1 layer
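
A comparable sketch of the text branch, assuming the final forward and backward LSTM hidden states are concatenated into a 256-dimensional sentence embedding (again, the class name and pooling are assumptions):

import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    """Embedding (128-d, vocab 2048) followed by a single-layer bidirectional LSTM."""
    def __init__(self, vocab_size: int = 2048, emb_dim: int = 128, hidden: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, num_layers=1,
                            batch_first=True, bidirectional=True)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) -> (batch, 2 * hidden) = (batch, 256)
        _, (h_n, _) = self.lstm(self.embed(token_ids))
        return torch.cat([h_n[0], h_n[1]], dim=-1)  # last forward + backward states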

Fusion & Regression Head

Fusion: Concatenate [audio_emb, text_emb]

Regression head:

  • Linear(384 → 256) → ReLU → Dropout(0.3)
  • Linear(256 → 256) → ReLU
  • Linear(256 → 1) → Sigmoid

Output: Scalar ∈ [0, 1] (reward score)
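
These dimensions are consistent: the 128-dimensional audio embedding concatenated with the 256-dimensional bidirectional text embedding gives the 384-dimensional input to the head. A direct sketch of the head as specified above (variable names are illustrative):

import torch.nn as nn

# Regression head as listed above; 384 = 128 (audio) + 256 (text)
regression_head = nn.Sequential(
    nn.Linear(384, 256), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 1), nn.Sigmoid(),
)

# fused = torch.cat([audio_emb, text_emb], dim=-1)   # (batch, 384)
# score = regression_head(fused)                     # (batch, 1), values in [0, 1]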


Objective

  • Loss: Mean Squared Error (MSE)
  • Goal: Predict the similarity score between the spoken audio and its transcription.
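
An illustrative training step under this objective (the optimizer, batch layout, and helper names are hypothetical; only the MSE loss on [0, 1] targets comes from the card):

import torch
import torch.nn as nn

criterion = nn.MSELoss()

def train_step(model, optimizer, batch, device):
    # batch["inputs"]: processor output for a batch of (audio, text) pairs
    # batch["scores"]: human-annotated target scores in [0, 1]
    inputs = {k: v.to(device) if torch.is_tensor(v) else v for k, v in batch["inputs"].items()}
    targets = batch["scores"].to(device)

    preds = model(**inputs).logits.squeeze(-1)  # predicted scores in [0, 1]
    loss = criterion(preds, targets)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()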

Example Usage

First, install our package:

pip install git+https://github.com/diarray-hub/bambara-asr.git@rlnf-v2-gpu

Then load the processor and the model, and score (audio, text) pairs:

import torch
from RLNF.Rewards.reward_config import RewardConfig
from RLNF.Rewards.reward_model import RewardModel
from RLNF.Rewards.reward_processor import RewardModelProcessor

audios = ["1.wav", "2.wav"]
texts = ["kelen", "fila."]

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the processor (feature extraction + tokenization) and the reward model
processor: RewardModelProcessor = RewardModelProcessor.from_pretrained("RobotsMali/reward-model")
model: RewardModel = RewardModel.from_pretrained("RobotsMali/reward-model")

model.eval()
model.to(device)

# Convert audio and text into model inputs, then move tensors to the target device
out = processor(audios=audios, texts=texts)
out = {k: v.to(device) if torch.is_tensor(v) else v for k, v in out.items()}

with torch.no_grad():
    preds = model(**out).logits

for i, (t, val) in enumerate(zip(texts, preds)):
    print(f"Audio: {audios[i]:<10} | Text: {t:<10} | Score: {val.item() * 100:.4f}")

Evaluation Results

Metric     Value
MSE        0.07672813534736633
R²         0.42677074670791626
Pearson    0.6603442430496216
Accuracy   0.33