---
license: mit
base_model:
- moxin-org/Moxin-7B-VLM
pipeline_tag: image-text-to-text
---

# Moxin VLM

HF-converted version of [Moxin-VLM](https://huggingface.co/moxin-org/Moxin-7B-VLM), based on [openvla](https://github.com/openvla/openvla).

GitHub page: [Moxin-VLM](https://github.com/moxin-org/Moxin-VLM)

---

## Installation

```bash
git clone https://github.com/moxin-org/Moxin-VLM.git
cd Moxin-VLM

conda create -n moxin-vlm python=3.10 -y
conda activate moxin-vlm

pip install torch==2.4.1 torchvision==0.19.1
pip install transformers==4.46.0 peft==0.15.2
pip install -e .

# Install Flash Attention 2
# =>> If you run into difficulty, try `pip cache remove flash_attn` first
pip install flash-attn==2.6.3 --no-build-isolation
```

## Pretrained Models

Please find our pretrained models on our Hugging Face page: [moxin-org/Moxin-7B-VLM](https://huggingface.co/moxin-org/Moxin-7B-VLM).

We've also provided an HF-converted version, [Moxin-7B-VLM-hf](https://huggingface.co/bobchenyx/Moxin-7B-VLM-hf), based on [openvla](https://github.com/openvla/openvla).

Please refer to the attached scripts for downloading and running our model locally:

```bash
python scripts/snapshot_download.py
```

## Usage

For a complete terminal-based CLI for interacting with our VLMs:

```bash
python scripts/generate.py --model_path moxin-org/Moxin-7B-VLM
```

For faster loading, inference, and demos:

```bash
python scripts/fast_inference.py
```

---

## Acknowledgments

This project is based on [Prismatic VLMs](https://github.com/TRI-ML/prismatic-vlms) by [TRI-ML](https://github.com/TRI-ML). Special thanks to the original contributors for their excellent work.
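The CLI scripts above wrap a programmatic interface. The sketch below shows one plausible way to load the HF-converted checkpoint with the standard `transformers` auto classes; the auto-class names, the `trust_remote_code` flag, and the Mistral-style prompt template are all assumptions on our part, not the repository's documented API — consult `scripts/generate.py` for the canonical usage.

```python
# Hypothetical sketch of programmatic inference with the HF-converted
# checkpoint. The auto classes and the prompt template are ASSUMPTIONS;
# see scripts/generate.py in the Moxin-VLM repo for the real interface.


def build_prompt(question: str) -> str:
    """Wrap a user question in a Mistral-style instruction template
    (assumed format, since Moxin's LLM backbone is Mistral-like)."""
    return f"[INST] {question} [/INST]"


def main() -> None:
    # Heavy imports kept inside main() so the helper above stays cheap.
    import torch
    from PIL import Image
    from transformers import AutoModelForVision2Seq, AutoProcessor

    model_id = "bobchenyx/Moxin-7B-VLM-hf"  # the HF-converted repo
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForVision2Seq.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, trust_remote_code=True
    ).to("cuda")

    image = Image.open("example.jpg")  # any local image
    inputs = processor(
        build_prompt("What is in this image?"), image, return_tensors="pt"
    ).to("cuda", torch.bfloat16)
    output_ids = model.generate(**inputs, max_new_tokens=128)
    print(processor.decode(output_ids[0], skip_special_tokens=True))


if __name__ == "__main__":
    main()
```

If the converted checkpoint does not register these auto classes, the bundled `scripts/fast_inference.py` remains the supported entry point.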
## Citation

If you find our code or models useful in your work, please cite [our paper](https://arxiv.org/abs/2412.06845v5):

```bibtex
@article{zhao2024fully,
  title={Fully Open Source Moxin-7B Technical Report},
  author={Zhao, Pu and Shen, Xuan and Kong, Zhenglun and Shen, Yixin and Chang, Sung-En and Rupprecht, Timothy and Lu, Lei and Nan, Enfu and Yang, Changdi and He, Yumei and others},
  journal={arXiv preprint arXiv:2412.06845},
  year={2024}
}
```