Evaluating the impact of synthetic data on DeBERTa-v3 performance
From what I understand, the zeroshot-v2.0 models (those without the `-c` suffix) are trained on a combination of all synthetic datasets, NLI-based datasets, and classification datasets. When comparing performance metrics, I see that deberta-v3-large-zeroshot-v2.0 has a mean zero-shot score of 0.673, while the earlier version MoritzLaurer/deberta-v3-large-zeroshot-v1.1-all-33 (trained on the same data, excluding the synthetic datasets) achieves a higher mean zero-shot score of 0.734.
Am I correct in interpreting that the newer model underperforms its predecessor, likely due to the inclusion of synthetic data? And is it fair to say that the main advantage of incorporating synthetic data lies in its open licensing rather than in performance gains?
After a closer look, it appears that balanced accuracy is reported for MoritzLaurer/deberta-v3-large-zeroshot-v1.1-all-33, while the v2.0 model cards report macro F1, so the two scores are not directly comparable. Is there a source that provides consistent metric comparisons across all versions?
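To illustrate why the two numbers shouldn't be compared directly, here is a small sketch (using hypothetical labels, not the actual evaluation data) showing that balanced accuracy and macro F1 can diverge on the same predictions, since balanced accuracy averages per-class recall only, while macro F1 also folds in per-class precision:

```python
from sklearn.metrics import balanced_accuracy_score, f1_score

# Hypothetical imbalanced 3-class task: class 0 is over-represented
# in the gold labels, and the model over-predicts class 1.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 1, 1, 1, 2, 2]

# Balanced accuracy = mean of per-class recall: (0.5 + 1.0 + 1.0) / 3
bal_acc = balanced_accuracy_score(y_true, y_pred)

# Macro F1 = mean of per-class F1, which penalizes the low
# precision on class 1 as well as the low recall on class 0.
macro_f1 = f1_score(y_true, y_pred, average="macro")

print(f"balanced accuracy: {bal_acc:.4f}")  # 0.8333
print(f"macro F1:          {macro_f1:.4f}")  # 0.7778
```

So a 0.673 macro F1 and a 0.734 balanced accuracy could in principle come from similarly performing models; only re-evaluating both under one metric would settle the comparison.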