---
language: en
license: mit
tags:
- vision
- text-generation
- medical
- chest-xray
- healthcare
- multimodal
pipeline_tag: image-to-text
---

# 🩺 ChestX – Chest X-ray Report Generation (ViT-GPT2)

This model generates **medical diagnostic reports from chest X-ray images**.
It was developed for the **TWESD Healthcare AI Competition 2024** as part of my final-year engineering project.

The architecture combines a **Vision Transformer (ViT)** for image encoding with **GPT-2** as the language decoder, forming an **encoder–decoder multimodal model**.

---

## 📌 Model Description
- **Architecture:** VisionEncoderDecoderModel (ViT + GPT-2)
- **Input:** Chest X-ray image
- **Output:** Text report describing the findings
- **Framework:** PyTorch + Hugging Face Transformers

---
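The encoder–decoder wiring can be sketched directly with `transformers`. The tiny config sizes below are illustrative placeholders only (the actual checkpoint uses full-size ViT and GPT-2 weights); the point is that `from_encoder_decoder_configs` marks GPT-2 as a decoder and adds cross-attention layers over the ViT patch embeddings:

```python
from transformers import (
    GPT2Config,
    ViTConfig,
    VisionEncoderDecoderConfig,
    VisionEncoderDecoderModel,
)

# Hypothetical tiny configs, just to illustrate the wiring; the real
# checkpoint is built from full-size ViT-base and GPT-2 components.
encoder_cfg = ViTConfig(
    image_size=32, patch_size=4, hidden_size=64,
    num_hidden_layers=2, num_attention_heads=2, intermediate_size=128,
)
decoder_cfg = GPT2Config(n_embd=64, n_layer=2, n_head=2)

# Tie the two halves together: this sets is_decoder=True and
# add_cross_attention=True on the GPT-2 side automatically.
config = VisionEncoderDecoderConfig.from_encoder_decoder_configs(encoder_cfg, decoder_cfg)
model = VisionEncoderDecoderModel(config=config)

assert model.config.decoder.is_decoder
assert model.config.decoder.add_cross_attention
```

In practice you would load the pretrained weights with `VisionEncoderDecoderModel.from_pretrained(...)` (see below) rather than building the model from scratch; this sketch only shows how the two halves are tied together.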

## 💡 Intended Uses & Limitations
✅ Intended for:
- Research in **medical AI & multimodal learning**
- Exploring **vision-to-text generation**
- Educational and prototyping purposes

⚠️ Limitations:
- Not intended for **real clinical diagnosis**
- Trained on a limited dataset (IU Chest X-ray); it may not generalize to all populations

---

## 🛠️ How to Use

```python
from transformers import VisionEncoderDecoderModel, AutoTokenizer, AutoFeatureExtractor
from PIL import Image
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the model, the GPT-2 tokenizer, and the ViT feature extractor
model = VisionEncoderDecoderModel.from_pretrained("Molkaatb/ChestX").to(device)
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
feature_extractor = AutoFeatureExtractor.from_pretrained("google/vit-base-patch16-224-in21k")

# Preprocess an example image
image = Image.open("example_xray.png").convert("RGB")
inputs = feature_extractor(images=image, return_tensors="pt").pixel_values.to(device)

# Generate the report with beam search
outputs = model.generate(inputs, max_length=512, num_beams=4)
report = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(report)
```