Moxin VLM

A Hugging Face (hf_convert) version of Moxin-VLM, based on OpenVLA.

GitHub page: https://github.com/moxin-org/Moxin-VLM


Installation

git clone https://github.com/moxin-org/Moxin-VLM.git
cd Moxin-VLM

conda create -n moxin-vlm python=3.10 -y
conda activate moxin-vlm

pip install torch==2.4.1 torchvision==0.19.1
pip install transformers==4.46.0 peft==0.15.2

pip install -e .

# Install Flash Attention 2 
#   =>> If you run into difficulty, try `pip cache remove flash_attn` first
pip install flash-attn==2.6.3 --no-build-isolation
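
To verify the pinned environment before proceeding, here is a minimal sanity check (not part of the repository's scripts) that imports the pinned packages and confirms a visible GPU:

# Confirm versions and that the flash-attn wheel imports cleanly
python -c "import torch, transformers, flash_attn; \
  print(torch.__version__, transformers.__version__, flash_attn.__version__); \
  print('CUDA available:', torch.cuda.is_available())"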

Pretrained Models

Please find our pretrained models on our Hugging Face page: moxin-org/Moxin-7B-VLM.

We've also provided an hf_convert version, Moxin-7B-VLM-hf, based on OpenVLA.

Please refer to the attached scripts for downloading and running our model locally.

python scripts/snapshot_download.py
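
The helper script presumably wraps huggingface_hub; a minimal equivalent sketch, where the local_dir destination is illustrative:

from huggingface_hub import snapshot_download

# Fetch the full checkpoint from the Hub; local_dir is an illustrative destination
snapshot_download(repo_id="moxin-org/Moxin-7B-VLM", local_dir="Moxin-7B-VLM")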

Usage

For a complete terminal-based CLI for interacting with our VLMs, run:

python scripts/generate.py --model_path moxin-org/Moxin-7B-VLM

For faster loading, inference, and a demo, run:

python scripts/fast_inference.py
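
The fast path presumably loads the checkpoint in half precision with Flash Attention 2. Below is a minimal sketch of doing the same by hand, assuming the hf_convert checkpoint exposes OpenVLA-style AutoModelForVision2Seq/AutoProcessor classes via trust_remote_code; the prompt, image path, and device are illustrative:

import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

MODEL_ID = "bobchenyx/Moxin-7B-VLM-hf"

# Assumption: the checkpoint registers OpenVLA-style auto classes (trust_remote_code)
processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,               # BF16 weights: faster load, lower memory
    attn_implementation="flash_attention_2",  # uses the flash-attn wheel installed above
    trust_remote_code=True,
).to("cuda")

image = Image.open("example.jpg")  # illustrative input image
inputs = processor("What is in this image?", image).to("cuda", dtype=torch.bfloat16)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output_ids[0], skip_special_tokens=True))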

Acknowledgments

This project is based on Prismatic VLMs by TRI-ML.

Special thanks to the original contributors for their excellent work.

Citation

If you find our code or models useful in your work, please cite our paper:

@article{zhao2024fully,
  title={Fully Open Source Moxin-7B Technical Report},
  author={Zhao, Pu and Shen, Xuan and Kong, Zhenglun and Shen, Yixin and Chang, Sung-En and Rupprecht, Timothy and Lu, Lei and Nan, Enfu and Yang, Changdi and He, Yumei and others},
  journal={arXiv preprint arXiv:2412.06845},
  year={2024}
}