nkkbr/ViCA2-stage1-align

nkkbr committed on May 15

Commit 653b4ae (verified) · 1 Parent(s): b1ee778

Create README.md

Files changed (1)

README.md +32 -0

README.md ADDED

@@ -0,0 +1,32 @@

+---
+license: apache-2.0
+tags:
+  - multimodal
+  - vision-language
+  - video understanding
+  - visuospatial cognition
+  - spatial reasoning
+  - vlm
+  - llava
+  - qwen
+  - siglip
+  - hiera
+  - sam2
+  - dual-encoder
+datasets:
+  - liuhaotian/LLaVA-CC3M-Pretrain-595K
+language:
+  - en
+library_name: transformers
+pipeline_tag: video-text-to-text
+model_name: ViCA2-7B-Stage1
+---
+## Usage and Full Documentation
+For a detailed model description, training setup, datasets, evaluation results, and inference code, **please refer to the following links**:
+[![GitHub](https://img.shields.io/badge/GitHub-ViCA2-181717?logo=github&logoColor=white)](https://github.com/nkkbr/ViCA)
+[![Hugging Face Models](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-ViCA2-blue)](https://huggingface.co/nkkbr/ViCA2)