michelecafagna26/clipcap-base-captioning-ft-hl-narratives (image-captioning)


michelecafagna26 committed on Jul 24, 2023 · Commit 741e096 · 1 parent: aacc39d

Upload README.md

Files changed (1): README.md (+101, -0)
---
license: apache-2.0
tags:
- image-captioning
datasets:
- michelecafagna26/hl-narratives
language:
- en
metrics:
- sacrebleu
- rouge
library_name: transformers
---
## ClipCap fine-tuned for Narrative Image Captioning

[ClipCap](https://arxiv.org/abs/2111.09734) base, fine-tuned on the [HL Narratives](https://huggingface.co/datasets/michelecafagna26/hl-narratives) dataset to generate **high-level narrative descriptions** of images.

## Model fine-tuning 🏋️

We fine-tune the language model (GPT-2) and the mapping network, starting from the ClipCap model pretrained on COCO:

- Trained for 3 epochs
- lr: 5e-5
- Adam optimizer
- half-precision (fp16)
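
The settings listed above can be gathered into a small configuration object, which also makes the arithmetic of the training schedule explicit. This is a minimal sketch only: the class name `FineTuneConfig` and the `total_steps` helper are hypothetical, and the batch size is not reported in this card, so it is left as a required, caller-supplied assumption.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hypothetical container for the fine-tuning settings listed in this model card."""

    batch_size: int  # ASSUMPTION: not reported in the card, so it must be supplied
    epochs: int = 3  # "Trained for 3 epochs"
    learning_rate: float = 5e-5  # "lr: 5e-5"
    optimizer: str = "adam"  # "Adam optimizer"
    fp16: bool = True  # "half-precision (fp16)"

    def total_steps(self, num_examples: int) -> int:
        """Optimizer updates over the whole run: one per batch, counting a final partial batch."""
        steps_per_epoch = math.ceil(num_examples / self.batch_size)
        return self.epochs * steps_per_epoch


config = FineTuneConfig(batch_size=32)  # hypothetical batch size
print(config.total_steps(num_examples=1000))  # 3 epochs x 32 batches = 96
```

In a PyTorch training loop these values would typically map to `torch.optim.Adam(params, lr=config.learning_rate)` plus some form of fp16 training; the card does not say whether automatic mixed precision or pure half precision was used.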

## Test set metrics 🧾

| CIDEr | SacreBLEU | ROUGE-L |
|-------|-----------|---------|
| 63.91 | 8.15      | 24.53   |
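
ROUGE-L scores a generated caption by the longest common subsequence (LCS) it shares with a reference caption. The sketch below computes a sentence-level ROUGE-L F-measure on lowercased whitespace tokens. It is an illustration only: the card does not say which implementation, tokenization, stemming or corpus-level aggregation produced the numbers above, so this function should not be expected to reproduce them exactly.

```python
def lcs_length(a: list, b: list) -> int:
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)  # extend the diagonal match
            else:
                curr.append(max(prev[j], curr[j - 1]))  # best of skipping a or b token
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str, beta: float = 1.0) -> float:
    """Sentence-level ROUGE-L F-measure (ASSUMED: lowercase + whitespace tokenization)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)
```

For example, "he is riding a skateboard" and "he is riding a bike" share the 4-token subsequence "he is riding a", so precision = recall = 0.8 and the F-measure is 0.8. The values in the table appear to be reported on a 0-100 scale.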

## Demo

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1xcaJOxaAp8TRd8a6x1XnAptVjHQRv3Zj?usp=sharing)

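## Installation

```bash
pip install git+https://github.com/michelecafagna26/CLIPCap.git
# If `import clip` fails in the example below, also install OpenAI CLIP:
# pip install git+https://github.com/openai/CLIP.git
```
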
## Download the model

```bash
git lfs install  # if Git LFS is not already set up
git clone https://huggingface.co/michelecafagna26/clipcap-base-captioning-ft-hl-narratives
```
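
As an alternative to cloning the whole repository, the checkpoint can presumably be fetched with the `huggingface_hub` client. This sketch assumes that package is installed and that the weights file is named `pytorch_model.pt`, as in the `model_path` of the usage example; the helper name `download_checkpoint` is hypothetical.

```python
from typing import Optional

REPO_ID = "michelecafagna26/clipcap-base-captioning-ft-hl-narratives"
# ASSUMPTION: file name taken from the model_path used in "Model in Action"
CHECKPOINT_FILE = "pytorch_model.pt"


def download_checkpoint(revision: Optional[str] = None) -> str:
    """Fetch the checkpoint (reusing the local Hugging Face cache when possible) and return its path."""
    # Imported inside the function so this snippet can be loaded without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=CHECKPOINT_FILE, revision=revision)
```

Calling `model_path = download_checkpoint()` could then replace the hard-coded `model_path` in the usage example.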

## Model in Action 🚀

```python
import clip  # OpenAI CLIP
import requests
import torch
from PIL import Image
from transformers import GPT2Tokenizer

from clipcap import ClipCaptionModel

model_path = "clipcap-base-captioning-ft-hl-narratives/pytorch_model.pt"  # path inside the cloned repo; change accordingly
device = "cuda" if torch.cuda.is_available() else "cpu"

# load CLIP (ViT-B/32 image encoder and its preprocessing transform)
clip_model, preprocess = clip.load("ViT-B/32", device=device, jit=False)

# load ClipCap: GPT-2 tokenizer and the fine-tuned checkpoint
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
prefix_length = 10  # number of prefix embeddings fed to the language model
model = ClipCaptionModel(prefix_length, tokenizer=tokenizer)
model.from_pretrained(model_path)
model = model.eval()
model = model.to(device)

# load the image
img_url = "https://datasets-server.huggingface.co/assets/michelecafagna26/hl-narratives/--/default/train/3/image/image.jpg"
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")

# extract the prefix
image = preprocess(raw_image).unsqueeze(0).to(device)  # shape (1, 3, 224, 224)
with torch.no_grad():
    # CLIP ViT-B/32 image embedding, shape (1, 512); cast to float32 because CLIP runs in fp16 on GPU
    prefix = clip_model.encode_image(image).to(device, dtype=torch.float32)
    # map it to prefix_length language-model input embeddings: (1, 10, 768), assuming GPT-2 base
    prefix_embed = model.clip_project(prefix).reshape(1, prefix_length, -1)

# generate the caption
caption = model.generate_beam(embed=prefix_embed)[0]
print(caption)
# >> "He is riding a skateboard in a skate park, he wants to skate."
```
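
As its name suggests, `generate_beam` decodes the caption with beam search. The self-contained sketch below illustrates that general technique on a toy next-token scorer; it is not the implementation shipped in the CLIPCap package, and the beam width, stop token, maximum length and length normalization are assumptions chosen for illustration. In the real model the scorer would be GPT-2 conditioned on `prefix_embed`.

```python
import math
from typing import Callable, Dict, List, Tuple

# A scorer maps the tokens generated so far to log-probabilities of the next token.
Scorer = Callable[[Tuple[str, ...]], Dict[str, float]]
Hypothesis = Tuple[Tuple[str, ...], float]  # (tokens, summed log-probability)


def beam_search(
    next_log_probs: Scorer,
    beam_size: int = 3,
    max_len: int = 10,
    stop_token: str = ".",
) -> List[Hypothesis]:
    """Simplified beam search; returns hypotheses sorted by length-normalized score (best first)."""
    beams: List[Hypothesis] = [((), 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        candidates: List[Hypothesis] = []
        for tokens, score in beams:
            for tok, logp in next_log_probs(tokens).items():
                candidates.append((tokens + (tok,), score + logp))
        # Keep only the beam_size highest-scoring extensions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_size]:
            if tokens[-1] == stop_token:
                finished.append((tokens, score))  # hypothesis is complete
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses that reached max_len without a stop token
    # Divide by length so longer captions are not penalized just for having more tokens.
    return sorted(finished, key=lambda h: h[1] / len(h[0]), reverse=True)


# Hypothetical toy "language model": next-token distributions keyed by the prefix so far.
TOY_MODEL = {
    (): {"he": math.log(0.6), "a": math.log(0.4)},
    ("he",): {"skates": math.log(0.7), ".": math.log(0.3)},
    ("a",): {"skater": math.log(0.9), ".": math.log(0.1)},
}


def toy_scorer(tokens: Tuple[str, ...]) -> Dict[str, float]:
    # Unknown prefixes always end the sentence with probability 1.
    return TOY_MODEL.get(tokens, {".": 0.0})


print(" ".join(beam_search(toy_scorer)[0][0]))  # he skates .
```

Here the short hypothesis "he ." finishes early but loses after length normalization, while "he skates ." (probability 0.42) beats "a skater ." (0.36).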

## BibTeX and citation info

```BibTeX
```