AST Fine-tuned for Non-Speech Sound Classification

This model is a fine-tuned version of MIT/ast-finetuned-audioset-10-10-0.4593 on the Nonspeech7k dataset.

Model Details

  • Base Model: MIT/ast-finetuned-audioset-10-10-0.4593
  • Fine-tuned on: Nonspeech7k dataset
  • Classes: breath, cough, crying, laugh, screaming, sneeze, yawn
  • Sample Rate: 16 kHz
  • Input Length: 10 seconds (160,000 samples)
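
The input length follows directly from the sample rate: 16,000 samples per second times 10 seconds gives 160,000 samples. A minimal sketch of fitting a mono clip to that length is shown below. The helper name `fit_to_clip` is hypothetical, and zero-padding on the right is an assumption rather than something this card specifies.

```python
SAMPLE_RATE = 16_000   # Hz, from the model details above
CLIP_SECONDS = 10      # seconds, from the model details above
CLIP_SAMPLES = SAMPLE_RATE * CLIP_SECONDS  # 16,000 * 10 = 160,000


def fit_to_clip(samples, n_samples=CLIP_SAMPLES):
    """Zero-pad or truncate a mono sample sequence to exactly n_samples.

    Hypothetical helper: right-side zero-padding is an assumption, not a
    documented step of this model's pipeline.
    """
    samples = list(samples)
    if len(samples) >= n_samples:
        return samples[:n_samples]
    return samples + [0.0] * (n_samples - len(samples))


short = fit_to_clip([0.5] * 48_000)    # 3 s clip, padded with silence
long = fit_to_clip([0.5] * 240_000)    # 15 s clip, tail dropped
print(len(short), len(long))
```

This step may be optional in practice: the feature extractor is expected to pad or truncate the spectrogram to a fixed number of frames on its own, so explicit trimming mainly matters when you want control over which part of a long recording is kept.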

Usage

from transformers import ASTFeatureExtractor, ASTForAudioClassification
import torch
import torchaudio

# Load model
feature_extractor = ASTFeatureExtractor.from_pretrained("FizzyBrain/ast-nonspeech7k-finetuned")
model = ASTForAudioClassification.from_pretrained("FizzyBrain/ast-nonspeech7k-finetuned")

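# Load audio, downmix to mono, and resample to the 16 kHz the model expects
waveform, sample_rate = torchaudio.load("audio.wav")
waveform = waveform.mean(dim=0)  # (channels, samples) -> (samples,)
if sample_rate != 16000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16000)
inputs = feature_extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")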

# Predict
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class_id = predictions.argmax().item()

Classes

  1. breath
  2. cough
  3. crying
  4. laugh
  5. screaming
  6. sneeze
  7. yawn
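
To turn a predicted class index into one of the names above, something like the following sketch could be used. It assumes the class indices start at 0 and follow the order of the list above; that ordering is an assumption, and the mapping stored in the checkpoint's configuration is the authoritative one. The logits in the example are made up for illustration.

```python
import math

# Assumed index order: the class list above, starting at 0.
# The authoritative mapping is whatever the checkpoint's config stores.
ID2LABEL = dict(enumerate(
    ["breath", "cough", "crying", "laugh", "screaming", "sneeze", "yawn"]
))


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, k=3, id2label=ID2LABEL):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Made-up logits for a single clip, for illustration only
example_logits = [0.1, 2.3, -1.0, 0.4, -0.5, 1.7, 0.0]
for label, prob in top_k(example_logits):
    print(f"{label}: {prob:.3f}")
```

With the loaded model, `outputs.logits[0].tolist()` and `model.config.id2label` can be passed in place of the example values, provided the checkpoint's configuration defines the label names.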

Training Details

  • Fine-tuned using advanced augmentation techniques
  • Class-weighted loss for imbalanced data
  • Layer-wise learning rate decay
  • Early stopping with macro-F1 monitoring

Model size: 86.2M parameters, stored as Safetensors (tensor type F32).