Improve model card: Update pipeline tag, license, and add comprehensive details from GitHub

This pull request significantly enhances the RedDino model card by:

- **Updating Metadata**:
    * Correcting the `pipeline_tag` from `feature-extraction` to `image-feature-extraction` to more accurately reflect the model's functionality and improve discoverability on the Hugging Face Hub.
    * Updating the `license` from `cc-by-4.0` to `cc-by-nc-4.0` to match the license specified in the model's `config.json`.

- **Enriching Content**:
    * Adding a prominent link to the [official GitHub repository](https://github.com/Snarci/RedDino) for easy access to the source code and additional resources.
    * Integrating detailed sections from the GitHub README, including **Model Variants**, **Benchmark Results**, and **Highlights**, which give a much richer overview of RedDino's architecture, performance, and key innovations.
    * Updating the main title of the model card to match the paper's title.

These updates aim to provide a more complete, accurate, and user-friendly resource for researchers and developers interested in RedDino.
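For context on the 1-NN and 20-NN columns of the new **Benchmark Results** section: they report the weighted F1 of a k-nearest-neighbour classifier applied to frozen embeddings. The card does not spell out the evaluation protocol, so the following is only a minimal NumPy sketch assuming cosine similarity and an unweighted majority vote. The functions `knn_predict` and `weighted_f1` and the synthetic data are hypothetical illustrations, not the authors' evaluation code.

```python
import numpy as np


def knn_predict(train_emb, train_labels, test_emb, k=1):
    """Predict a label for each row of test_emb by majority vote over its k nearest training rows.

    Assumption: similarity is cosine similarity (dot product of L2-normalised vectors).
    train_labels must be non-negative integers.
    """
    train = train_emb / np.linalg.norm(train_emb, axis=1, keepdims=True)
    test = test_emb / np.linalg.norm(test_emb, axis=1, keepdims=True)
    sims = test @ train.T                        # (n_test, n_train) cosine similarities
    nn_idx = np.argsort(-sims, axis=1)[:, :k]    # k most similar training samples per test sample
    nn_labels = train_labels[nn_idx]             # (n_test, k) neighbour labels
    # Unweighted majority vote; ties go to the smallest label id (argmax returns the first max)
    return np.array([np.bincount(row).argmax() for row in nn_labels])


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to each class's support in y_true."""
    classes, support = np.unique(y_true, return_counts=True)
    scores = []
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.dot(scores, support) / support.sum())


# Hypothetical synthetic embeddings standing in for frozen RedDino features of two cell classes
rng = np.random.default_rng(0)
dim = 16
train_emb = np.vstack([rng.normal(1.0, 0.1, (30, dim)), rng.normal(-1.0, 0.1, (30, dim))])
train_labels = np.array([0] * 30 + [1] * 30)
test_emb = np.vstack([rng.normal(1.0, 0.1, (10, dim)), rng.normal(-1.0, 0.1, (10, dim))])
test_labels = np.array([0] * 10 + [1] * 10)

for k in (1, 20):
    pred = knn_predict(train_emb, train_labels, test_emb, k=k)
    print(f"{k}-NN weighted F1: {weighted_f1(test_labels, pred):.3f}")
```

Weighted F1 lets frequent morphologies dominate the score; balanced accuracy, also listed in the `model-index` metadata, instead weights every class equally.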

Files changed (1)

README.md +110 -68

README.md CHANGED

@@ -1,87 +1,76 @@
 ---
-license: cc-by-4.0
-tags:
-  - red-blood-cells
-  - hematology
-  - medical-imaging
-  - vision-transformer
-  - dino
-  - dinov2
-  - feature-extraction
-  - foundation-model
-library_name: timm
 datasets:
-  - Elsafty
-  - Chula
-  - DSE
-pipeline_tag: feature-extraction
 model-index:
-  - name: RedDino-large
-    results:
-      - task:
-          type: image-classification
-          name: RBC Shape Classification
-        dataset:
-          name: Elsafty
-          type: Classification
-        metrics:
-          - type: Weighted F1
-            value: 88.5
-          - type: Balanced Accuracy
-            value: 89.1
-          - type: Accuracy
-            value: 88.4
-      - task:
-          type: image-classification
-          name: RBC Shape Classification
-        dataset:
-          name: Chula
-          type: Classification
-        metrics:
-          - type: Weighted F1
-            value: 83.9
-          - type: Balanced Accuracy
-            value: 79.0
-          - type: Accuracy
-            value: 85.0
-      - task:
-          type: image-classification
-          name: RBC Shape Classification
-        dataset:
-          name: DSE
-          type: Classification
-        metrics:
-          - type: Weighted F1
-            value: 86.6
-          - type: Balanced Accuracy
-            value: 60.1
-          - type: Accuracy
-            value: 86.6
 ---
-# RedDino-large
-**RedDino** is a self-supervised Vision Transformer foundation model specifically designed for **red blood cell (RBC)** image analysis.
-It leverages a tailored version of the **DINOv2** framework, trained on a meticulously curated dataset of **1.25 million RBC images** from diverse acquisition modalities and sources.
-This model excels at extracting robust, general-purpose features for downstream hematology tasks such as **shape classification**, **morphological subtype recognition**, and **batch-effect–robust analysis**.
 Unlike general-purpose models pretrained on natural images, RedDino incorporates hematology-specific augmentations, architectural tweaks, and RBC-tailored data preprocessing, enabling **state-of-the-art performance** on multiple RBC benchmarks.
 > 🧠 Developed by [Luca Zedda](https://orcid.org/0009-0001-8488-1612), [Andrea Loddo](https://orcid.org/0000-0002-6571-3816), [Cecilia Di Ruberto](https://orcid.org/0000-0003-4641-0307), and [Carsten Marr](https://orcid.org/0000-0003-2154-4552)
 > 🏥 University of Cagliari & Helmholtz Munich
-> 📄 Preprint: [arXiv:2508.08180](https://arxiv.org/abs/2508.08180)
 ---
 ## Model Details
-- **Architecture:** ViT-large, patch size 14
-- **SSL framework:** DINOv2 (customized for RBC morphology)
-- **Pretraining dataset:** Curated RBC images from 18 datasets (multiple modalities and sources)
-- **Embedding size:** 1024
-- **Intended use:** RBC morphology classification, feature extraction, batch-effect–robust analysis
 Notes:
-- RBC-specific training strategy including removal of KoLeo regularizer and Sinkhorn-Knopp centering.
-- Training on smear patches (not only single cells) to enhance cross-source generalization.
 ## Example Usage
 ```python
 from PIL import Image
@@ -106,6 +95,53 @@ input_tensor = transform(image).unsqueeze(0).to(device)
 with torch.no_grad():
     embedding = model(input_tensor)
 ```
 ## 📝 Citation
 If you use this model, please cite the following paper:
@@ -125,3 +161,9 @@ Preprint: arXiv:2508.08180. https://arxiv.org/abs/2508.08180
       url={https://arxiv.org/abs/2508.08180},
 }
 ```

 ---
 datasets:
+- Elsafty
+- Chula
+- DSE
+library_name: timm
+license: cc-by-nc-4.0
+pipeline_tag: image-feature-extraction
+tags:
+- red-blood-cells
+- hematology
+- medical-imaging
+- vision-transformer
+- dino
+- dinov2
+- feature-extraction
+- foundation-model
 model-index:
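+- name: RedDino-large
+  results:
+  - task:
+      type: image-classification
+      name: RBC Shape Classification
+    dataset:
+      name: Elsafty
+      type: Classification
+    metrics:
+    - type: Weighted F1
+      value: 88.5
+    - type: Balanced Accuracy
+      value: 89.1
+    - type: Accuracy
+      value: 88.4
+  - task:
+      type: image-classification
+      name: RBC Shape Classification
+    dataset:
+      name: Chula
+      type: Classification
+    metrics:
+    - type: Weighted F1
+      value: 83.9
+    - type: Balanced Accuracy
+      value: 79.0
+    - type: Accuracy
+      value: 85.0
+  - task:
+      type: image-classification
+      name: RBC Shape Classification
+    dataset:
+      name: DSE
+      type: Classification
+    metrics:
+    - type: Weighted F1
+      value: 86.6
+    - type: Balanced Accuracy
+      value: 60.1
+    - type: Accuracy
+      value: 86.6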
 ---
+# RedDino: A Foundation Model for Red Blood Cell Analysis
+**RedDino** is a self-supervised Vision Transformer foundation model specifically designed for **red blood cell (RBC)** image analysis, as presented in the paper [RedDino: A foundation model for red blood cell analysis](https://arxiv.org/abs/2508.08180).
+It leverages a tailored version of the **DINOv2** framework, trained on a meticulously curated dataset of **1.25 million RBC images** from diverse acquisition modalities and sources. This model excels at extracting robust, general-purpose features for downstream hematology tasks such as **shape classification**, **morphological subtype recognition**, and **batch-effect–robust analysis**.
 Unlike general-purpose models pretrained on natural images, RedDino incorporates hematology-specific augmentations, architectural tweaks, and RBC-tailored data preprocessing, enabling **state-of-the-art performance** on multiple RBC benchmarks.
 > 🧠 Developed by [Luca Zedda](https://orcid.org/0009-0001-8488-1612), [Andrea Loddo](https://orcid.org/0000-0002-6571-3816), [Cecilia Di Ruberto](https://orcid.org/0000-0003-4641-0307), and [Carsten Marr](https://orcid.org/0000-0003-2154-4552)
 > 🏥 University of Cagliari & Helmholtz Munich
+> 📄 Preprint: [arXiv:2508.08180](https://arxiv.org/abs/2508.08180)
+> 💻 Code: [https://github.com/Snarci/RedDino](https://github.com/Snarci/RedDino)
 ---
 ## Model Details
+-   **Architecture:** ViT-large, patch size 14
+-   **SSL framework:** DINOv2 (customized for RBC morphology)
+-   **Pretraining dataset:** Curated RBC images from 18 datasets (multiple modalities and sources)
+-   **Embedding size:** 1024
+-   **Intended use:** RBC morphology classification, feature extraction, batch-effect–robust analysis
 Notes:
+-   RBC-specific training strategy, including removal of the KoLeo regularizer and adoption of Sinkhorn-Knopp centering.
+-   Training on smear patches (not only single cells) to enhance cross-source generalization.
 ## Example Usage
 ```python
 from PIL import Image
 with torch.no_grad():
     embedding = model(input_tensor)
 ```
+## Model Variants
+RedDino comes in three sizes to suit different computational requirements and performance needs:
+| Model Variant | Embedding Size | Parameters | Usage |
+|---------------|----------------|------------|--------|
+| **RedDino-small** | 384 | 22M | `timm.create_model("hf_hub:Snarcy/RedDino-small", pretrained=True)` |
+| **RedDino-base** | 768 | 86M | `timm.create_model("hf_hub:Snarcy/RedDino-base", pretrained=True)` |
+| **RedDino-large** | 1024 | 304M | `timm.create_model("hf_hub:Snarcy/RedDino-large", pretrained=True)` |
+Choose the variant that best fits your computational budget and performance requirements. Larger models generally provide richer feature representations at the cost of increased computational overhead.
+---
+## Benchmark Results
+RedDino was benchmarked on major RBC classification datasets (Elsafty, Chula, and DSE), outperforming state-of-the-art baselines such as ResNet50, DinoBloom, and DINOv2. The table below reports weighted F1 on Elsafty.
+| Model             | Dataset   | Metric      | Linear Probing (wF1) | 1-NN (wF1) | 20-NN (wF1) |
+|-------------------|-----------|-------------|----------------------|------------|-------------|
+| ResNet50          | Elsafty   | Weighted F1 | 77.6 ± 8.1           | 64.3 ± 4.8 | 66.2 ± 4.9  |
+| DinoBloom-S       | Elsafty   | Weighted F1 | 83.2 ± 8.2           | 73.1 ± 5.1 | 76.5 ± 4.2  |
+| DINOv2 (small)    | Elsafty   | Weighted F1 | 82.1 ± 8.2           | 73.5 ± 4.8 | 77.2 ± 4.6  |
+| RedDino small     | Elsafty   | Weighted F1 | 86.0 ± 7.0           | 76.8 ± 4.9 | 80.0 ± 4.5  |
+| RedDino base      | Elsafty   | Weighted F1 | 88.1 ± 4.9           | 78.8 ± 3.6 | 82.6 ± 2.8  |
+| RedDino large     | Elsafty   | Weighted F1 | 88.5 ± 5.5           | 78.5 ± 4.6 | 81.6 ± 4.7  |
+On the Chula and DSE datasets, RedDino likewise outperformed all other models in feature quality (linear probing), with average improvements of 2–4% over prior approaches in key metrics.
+---
+## Highlights
+-   **Foundation model** for RBC analysis trained on the largest available multi-source RBC image set (1.25M+ images), built using Cellpose-based instance segmentation and patch extraction.
+-   **DINOv2-based self-supervised learning** for label-efficient pretraining and robust, transferable features.
+-   **Model architecture and key innovations**:
+    -   Patch-based training (224×224 px) shown to outperform single-cell training.
+    -   Novel data augmentation via Albumentations (32 pixel-level strategies).
+    -   Removal of the KoLeo regularizer and adoption of Sinkhorn-Knopp centering for improved representations of RBC images.
+    -   Suite of models (small, base, large) covering 22M–304M parameters.
+-   **Generalization**: Strong adaptation across varied protocols, microscopes, and imaging sites. Demonstrated resistance to batch effects and out-of-domain variance.
+-   **Interpretability tools**: PCA/UMAP visualizations reveal clustering by phenotype and batch, distinguishing abnormal cells (e.g., malaria-infected cells, echinocytes).
+-   **Easy deployment**: Models and code are available on [GitHub](https://github.com/Snarci/RedDino) and [Hugging Face](https://huggingface.co/collections/Snarcy/reddino-689a13e29241d2e5690202fc).
+---
 ## 📝 Citation
 If you use this model, please cite the following paper:
       url={https://arxiv.org/abs/2508.08180},
 }
 ```
+---
+## Summary
+RedDino is the first family of foundation models tailored for comprehensive red blood cell image analysis, using large-scale self-supervised learning to set new performance benchmarks and generalization standards for computational hematology. Models and pretrained weights are available for research and practical deployment.
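The **Highlights** section added in this diff mentions Sinkhorn-Knopp centering. In DINOv2 this is the SwAV-style Sinkhorn-Knopp batch normalisation of teacher scores that replaces DINO's softmax centering. The card does not describe RedDino's exact variant, so the following is a minimal single-device NumPy sketch of the standard procedure; the function name `sinkhorn_knopp`, the temperature, and the iteration counts are hypothetical, illustrative choices, and the cross-device all-reduce used in multi-GPU training is omitted.

```python
import numpy as np


def sinkhorn_knopp(teacher_logits, temp=0.05, n_iters=3):
    """Sinkhorn-Knopp normalisation of teacher scores (single-device sketch).

    teacher_logits: (B, K) array of scores for B samples over K prototypes.
    Returns a (B, K) array whose rows are soft assignments summing to 1, while
    the total mass per prototype is pushed toward the uniform value B / K.
    """
    # Subtracting the global max keeps exp() finite; the constant factor cancels in q /= q.sum()
    q = np.exp((teacher_logits - teacher_logits.max()) / temp).T  # (K, B)
    K, B = q.shape
    q /= q.sum()
    for _ in range(n_iters):
        q /= q.sum(axis=1, keepdims=True)  # each prototype (row) gets total mass 1/K
        q /= K
        q /= q.sum(axis=0, keepdims=True)  # each sample (column) gets total mass 1/B
        q /= B
    q *= B  # columns now sum to 1: one assignment distribution per sample
    return q.T


# Hypothetical batch: 8 samples scored against 4 prototypes
rng = np.random.default_rng(0)
logits = 0.5 * rng.normal(size=(8, 4))
targets = sinkhorn_knopp(logits, temp=1.0, n_iters=200)
print(targets.sum(axis=1))  # per-sample sums, should each be 1
print(targets.sum(axis=0))  # per-prototype mass, should approach 8 / 4 = 2
```

Alternating row and column normalisation forces each batch to spread its assignment mass evenly across prototypes, which discourages the teacher from collapsing onto a few prototypes; the per-sample rows then serve as the teacher targets.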