---
license: mit
language:
- en
library_name: open_clip
---
A CLIP (Contrastive Language-Image Pre-training) model finetuned on the LivingThings-10M subset of the EntityNet-33M dataset. The base model is `ViT-B-32/datacomp_xl_s13b_b90k`.
See the [project page](https://github.com/lmb-freiburg/entitynet) for the paper, code, usage examples, metrics, etc.
During pretraining, the model saw ~13B images at a batch size of 90k; during finetuning, it saw ~0.2B images at a batch size of 32k.
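
Below is a minimal zero-shot inference sketch using the standard `open_clip` API. The Hugging Face repo id and the image path are placeholders (not specified here); see the project page above for the official usage examples and exact identifiers.

```python
import torch
from PIL import Image
import open_clip

# Placeholder repo id -- replace with the actual Hugging Face repo for this checkpoint.
MODEL_ID = "hf-hub:<org>/<this-model-repo>"

model, _, preprocess = open_clip.create_model_and_transforms(MODEL_ID)
tokenizer = open_clip.get_tokenizer(MODEL_ID)
model.eval()

# Preprocess an example image (placeholder path) and tokenize candidate captions.
image = preprocess(Image.open("example.jpg")).unsqueeze(0)
text = tokenizer(["a photo of a dog", "a photo of a tree", "a photo of a car"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize embeddings and compute image-text similarity as softmax probabilities.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)
```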