Evaluating the impact of synthetic data on DeBERTa-v3 performance
From what I understand, the zeroshot-v2.0 models (those without the `-c` suffix) are trained on a combination of all synthetic datasets, NLI-based datasets, and classification datasets. When comparing performance metrics, I see that deberta-v3-large-zeroshot-v2.0 has a mean zero-shot score of 0.673, while the earlier version MoritzLaurer/deberta-v3-large-zeroshot-v1.1-all-33 (trained on the same data, excluding the synthetic datasets) achieves a higher mean zero-shot score of 0.734.
Am I correct in interpreting that the newer model underperforms its predecessor, likely due to the inclusion of synthetic data? And is it fair to say that the main advantage of incorporating synthetic data lies in its open licensing rather than in performance gains?
After a closer look, it appears that balanced accuracy is reported for MoritzLaurer/deberta-v3-large-zeroshot-v1.1-all-33, while the v2.0 model cards report macro F1, so the two scores are not directly comparable. Is there a source that provides consistent metric comparisons across all versions?
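To illustrate why the two numbers shouldn't be compared directly, here is a small sketch (using hypothetical labels, not the actual evaluation data) showing that balanced accuracy and macro F1 can diverge on the same predictions, since balanced accuracy averages per-class recall only, while macro F1 also folds in per-class precision:

```python
from sklearn.metrics import balanced_accuracy_score, f1_score

# Hypothetical imbalanced 3-class task: class 0 is over-represented
# in the gold labels, and the model over-predicts class 1.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 1, 1, 1, 2, 2]

# Balanced accuracy = mean of per-class recall: (0.5 + 1.0 + 1.0) / 3
bal_acc = balanced_accuracy_score(y_true, y_pred)

# Macro F1 = mean of per-class F1, which penalizes the low
# precision on class 1 as well as the low recall on class 0.
macro_f1 = f1_score(y_true, y_pred, average="macro")

print(f"balanced accuracy: {bal_acc:.4f}")  # 0.8333
print(f"macro F1:          {macro_f1:.4f}")  # 0.7778
```

So a 0.673 macro F1 and a 0.734 balanced accuracy could in principle come from similarly performing models; only re-evaluating both under one metric would settle the comparison.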