Improve model card: Add GitHub link and more tags
This PR adds an explicit link to the GitHub repository in the model card content for easier navigation. It also enhances the model's metadata by adding relevant tags (`llava`, `reasoning`, `vqa`) to better describe its functionality and improve discoverability on the Hugging Face Hub.
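Once merged, the updated front matter can be read back programmatically. A minimal sketch follows, using `huggingface_hub`'s `ModelCard` helper to confirm the new `pipeline_tag` and tags; the repo id is a placeholder, since this PR does not name the target repository explicitly.

```python
# Minimal sketch (not part of this PR): inspect the merged model-card metadata.
# The repo id below is a placeholder -- substitute the model repository this PR targets.
from huggingface_hub import ModelCard

card = ModelCard.load("<namespace>/<model-repo>")  # placeholder repo id

# The YAML front matter is exposed as structured card data.
print(card.data.pipeline_tag)  # expected: "image-text-to-text"
print(card.data.tags)          # expected to include: "llava", "reasoning", "vqa"
print(card.data.license)       # "apache-2.0"
print(card.data.base_model)    # ["meta-llama/Llama-3.2-11B-Vision-Instruct"]
```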
README.md CHANGED

@@ -1,14 +1,19 @@
 ---
-license: apache-2.0
-language:
-- en
 base_model:
 - meta-llama/Llama-3.2-11B-Vision-Instruct
 datasets:
 - Xkev/LLaVA-CoT-100k
-
+language:
+- en
 library_name: transformers
+license: apache-2.0
+pipeline_tag: image-text-to-text
+tags:
+- llava
+- reasoning
+- vqa
 ---
+
 # Model Card for Model ID
 
 <!-- Provide a quick summary of what the model is/does. -->
@@ -24,6 +29,8 @@ The model was proposed in [LLaVA-CoT: Let Vision Language Models Reason Step-by-
 - **License:** apache-2.0
 - **Finetuned from model:** meta-llama/Llama-3.2-11B-Vision-Instruct
 
+**Code:** [https://github.com/PKU-YuanGroup/LLaVA-CoT](https://github.com/PKU-YuanGroup/LLaVA-CoT)
+
 ## Benchmark Results
 
 | MMStar | MMBench | MMVet | MathVista | AI2D | Hallusion | Average |
@@ -95,5 +102,5 @@ Using the same setting should accurately reproduce our results.
 
 <!-- This section is meant to convey both technical and sociotechnical limitations. -->
 
-The model may generate biased or offensive content, similar to other VLMs, due to limitations in the training data.
+The model may generate biased or offensive content, similar to other VLMs, due to limitations in the training data.
 Technically, the model's performance in aspects like instruction following still falls short of leading industry models.
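For context on what the added `pipeline_tag: image-text-to-text` declares, here is a rough usage sketch that loads the model with the `transformers` classes used for its Llama-3.2-Vision base. The repo id and image URL are placeholders, not details stated in this PR.

```python
# Rough usage sketch, assuming the task declared by the new pipeline_tag
# ("image-text-to-text") and the Llama-3.2-11B-Vision-Instruct base architecture.
# The repo id and image URL are placeholders, not taken from this PR.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "<namespace>/<model-repo>"  # placeholder repo id
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Build a single-image chat prompt and generate a step-by-step answer.
image = Image.open(requests.get("https://example.com/sample.jpg", stream=True).raw)
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe the image step by step."},
    ]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0], skip_special_tokens=True))
```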
|