ast-finetuned-audioset-10-10-0.4593-finetuned-gtzan-optimized
This model is a fine-tuned version of MIT/ast-finetuned-audioset-10-10-0.4593 on the GTZAN dataset. It achieves the following results on the evaluation set:
- Loss: 0.8353
- Accuracy: 0.875
- Precision: 0.8852
- Recall: 0.875
- F1: 0.8757
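The card does not say how precision, recall, and F1 were averaged across genres. Accuracy and recall are identical above, which is what support-weighted averaging produces, so the sketch below assumes that mode. The function name `weighted_metrics` and the toy labels are hypothetical and are not taken from the training code or from GTZAN.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1.

    Assumption: metrics are averaged with per-class support as weights.
    """
    n = len(y_true)
    support = Counter(y_true)      # true examples per class
    predicted = Counter(y_pred)    # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    accuracy = sum(correct.values()) / n
    precision = recall = f1 = 0.0
    for label, count in support.items():
        # Per-class scores; a class that is never predicted gets precision 0.
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / count
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = count / n
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels for illustration only (not GTZAN data).
y_true = ["rock", "rock", "jazz", "jazz", "blues", "blues"]
y_pred = ["rock", "jazz", "jazz", "jazz", "blues", "rock"]
metrics = weighted_metrics(y_true, y_pred)
```

Under this averaging, weighted recall reduces to the fraction of correct predictions, so it always equals accuracy, while precision and F1 can differ from it.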
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: polynomial
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 30
- label_smoothing_factor: 0.1
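As a rough illustration of how these values fit together, the sketch below derives the effective batch size and the warmup length, then evaluates a polynomial decay schedule with linear warmup. Several things here are assumptions rather than facts from this card: a single training device, 22 optimizer steps per epoch (read from the Step column of the training results), and a schedule with `power=1.0` and `lr_end=1e-07`. The function name `polynomial_lr` is hypothetical.

```python
def polynomial_lr(step, base_lr=3e-05, total_steps=660, warmup_steps=66,
                  lr_end=1e-07, power=1.0):
    """Learning rate at an optimizer step: linear warmup, then polynomial decay.

    Assumption: power and lr_end are illustrative; the card does not list them.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    if step > total_steps:
        return lr_end
    remaining = 1 - (step - warmup_steps) / (total_steps - warmup_steps)
    return (base_lr - lr_end) * remaining ** power + lr_end


train_batch_size = 8
gradient_accumulation_steps = 4
# Assumes one device, which matches the reported total of 32.
total_train_batch_size = train_batch_size * gradient_accumulation_steps

steps_per_epoch = 22   # assumption: taken from the Step column
num_epochs = 30
total_steps = steps_per_epoch * num_epochs
warmup_steps = round(0.1 * total_steps)   # lr_scheduler_warmup_ratio = 0.1

peak_lr = polynomial_lr(warmup_steps, total_steps=total_steps,
                        warmup_steps=warmup_steps)
final_lr = polynomial_lr(total_steps, total_steps=total_steps,
                         warmup_steps=warmup_steps)
```

With these assumptions the learning rate peaks at 3e-05 once warmup ends at step 66 (the end of epoch 3) and decays toward `lr_end` by step 660.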
Training results
| Training Loss | Epoch | Step | Accuracy | F1 | Validation Loss | Precision | Recall |
|---|---|---|---|---|---|---|---|
| 9.9362 | 1.0 | 22 | 0.1333 | 0.0927 | 2.4114 | 0.0742 | 0.1333 |
| 8.8573 | 2.0 | 44 | 0.3167 | 0.3088 | 2.0453 | 0.3933 | 0.3167 |
| 7.2272 | 3.0 | 66 | 0.6722 | 0.6791 | 1.5591 | 0.6966 | 0.6722 |
| 5.3183 | 4.0 | 88 | 0.7556 | 0.7557 | 1.1642 | 0.7839 | 0.7556 |
| 3.527 | 5.0 | 110 | 0.8111 | 0.8108 | 0.9838 | 0.8235 | 0.8111 |
| 3.074 | 6.0 | 132 | 0.7778 | 0.7820 | 0.9769 | 0.8054 | 0.7778 |
| 2.7507 | 7.0 | 154 | 0.8389 | 0.8384 | 0.8948 | 0.8423 | 0.8389 |
| 2.4184 | 8.0 | 176 | 0.8222 | 0.8230 | 0.9135 | 0.8423 | 0.8222 |
| 2.2014 | 9.0 | 198 | 0.8333 | 0.8351 | 0.8896 | 0.8448 | 0.8333 |
| 2.093 | 10.0 | 220 | 0.8333 | 0.8344 | 0.8813 | 0.8388 | 0.8333 |
| 2.0531 | 11.0 | 242 | 0.8333 | 0.8342 | 0.8742 | 0.8415 | 0.8333 |
| 2.0729 | 12.0 | 264 | 0.8222 | 0.8224 | 0.8972 | 0.8335 | 0.8222 |
| 2.3772 | 13.0 | 286 | 0.8278 | 0.8290 | 0.8980 | 0.8375 | 0.8278 |
| 2.1131 | 14.0 | 308 | 0.8278 | 0.8293 | 0.8904 | 0.8408 | 0.8278 |
| 2.0553 | 15.0 | 330 | 0.8389 | 0.8428 | 0.8555 | 0.8546 | 0.8389 |
| 2.0367 | 16.0 | 352 | 0.8444 | 0.8469 | 0.8485 | 0.8533 | 0.8444 |
| 2.0353 | 17.0 | 374 | 0.8444 | 0.8466 | 0.8616 | 0.8509 | 0.8444 |
| 2.0233 | 18.0 | 396 | 0.8556 | 0.8570 | 0.8422 | 0.8600 | 0.8556 |
| 2.0208 | 19.0 | 418 | 0.85 | 0.8524 | 0.8436 | 0.8599 | 0.85 |
| 2.0145 | 20.0 | 440 | 0.8556 | 0.8587 | 0.8409 | 0.8658 | 0.8556 |
| 2.0121 | 21.0 | 462 | 0.8667 | 0.8694 | 0.8345 | 0.8759 | 0.8667 |
| 2.0096 | 22.0 | 484 | 0.8722 | 0.8753 | 0.8434 | 0.8861 | 0.8722 |
| 2.0093 | 23.0 | 506 | 0.8444 | 0.8460 | 0.8341 | 0.8498 | 0.8444 |
| 2.0058 | 24.0 | 528 | 0.8444 | 0.8465 | 0.8301 | 0.8515 | 0.8444 |
| 2.0053 | 25.0 | 550 | 0.8556 | 0.8575 | 0.8269 | 0.8636 | 0.8556 |
| 2.0037 | 26.0 | 572 | 0.8556 | 0.8577 | 0.8362 | 0.8630 | 0.8556 |
| 2.0026 | 27.0 | 594 | 0.8611 | 0.8628 | 0.8334 | 0.8661 | 0.8611 |
Framework versions
- Transformers 4.57.0.dev0
- Pytorch 2.9.0.dev20250716+cu129
- Datasets 4.0.0
- Tokenizers 0.22.0
Model tree for zikangzheng/ast-finetuned-audioset-10-10-0.4593-gtzan-optimized
Base model: MIT/ast-finetuned-audioset-10-10-0.4593
Dataset used to train zikangzheng/ast-finetuned-audioset-10-10-0.4593-gtzan-optimized: GTZAN
Evaluation results
- Accuracy on GTZAN (self-reported): 0.875
- Precision on GTZAN (self-reported): 0.885
- Recall on GTZAN (self-reported): 0.875
- F1 on GTZAN (self-reported): 0.876