---
license: mit
language:
- en
library_name: merlin
pipeline_tag: text-to-image
---

# Merlin: Vision Language Foundation Model for 3D Computed Tomography

[![pypi](https://img.shields.io/pypi/v/merlin-vlm?style=for-the-badge)](https://pypi.org/project/merlin-vlm/)

Merlin is a 3D vision language model (VLM) for computed tomography (CT) that is pretrained on both structured electronic health records (EHR) and unstructured radiology reports. This Hugging Face repository provides the model weights and an example image file.

[[💻 GitHub](https://github.com/StanfordMIMI/Merlin)] [[📄 Paper](https://arxiv.org/abs/2406.06512)]

## ⚑️ Installation

To install Merlin, you can simply run:

```bash
pip install merlin-vlm
```

For an editable installation, clone the GitHub repository and install it in editable mode:

```bash
git clone https://github.com/StanfordMIMI/Merlin.git
cd Merlin
pip install -e .
```
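To confirm that the install succeeded, a minimal sketch using only the standard library can query the installed distribution. It assumes the distribution name is `merlin-vlm`, as in the `pip install` command above; it does not assume anything about the package's import name or API.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "merlin-vlm") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string if merlin-vlm is installed, otherwise None.
print(installed_version())
```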

For usage instructions, see the GitHub [repository](https://github.com/StanfordMIMI/Merlin).

### πŸ“ Project Structure:

```
.
├── README.md
├── i3_resnet_clinical_longformer_best_clip_04-02-2024_23-21-36_epoch_99.pt <Merlin weights>
├── image1.nii.gz <Sample Image>
├── resnet_gpt2_best_stanford_report_generation_average.pt <Merlin Radiology Report Generation Weights>
├── resnet_clinical_longformer_five_year_disease_prediction <Five Year Disease Prediction Weights>
├── nnUNetTrainerMerlin__nnUNetPlans__3d_fullres <nnUNet File>
```
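Individual files from the listing above can be fetched with the `huggingface_hub` client. The sketch below is hedged: it assumes the Hub repo id is `stanfordmimi/Merlin` (check the model page URL) and that `huggingface_hub` and `nibabel` are installed. The role names in `MERLIN_FILES` are illustrative labels, not part of any Merlin API.

```python
# Sketch only: ASSUMED_REPO_ID is an assumption; verify it on the model page.
ASSUMED_REPO_ID = "stanfordmimi/Merlin"

# Single files from the project structure, keyed by illustrative role names.
MERLIN_FILES = {
    "merlin_weights": "i3_resnet_clinical_longformer_best_clip_04-02-2024_23-21-36_epoch_99.pt",
    "sample_image": "image1.nii.gz",
    "report_generation_weights": "resnet_gpt2_best_stanford_report_generation_average.pt",
}


def fetch(role: str, repo_id: str = ASSUMED_REPO_ID) -> str:
    """Download one file by role and return its local cache path."""
    from huggingface_hub import hf_hub_download  # imported lazily; needs network

    return hf_hub_download(repo_id=repo_id, filename=MERLIN_FILES[role])


def sample_image_shape(repo_id: str = ASSUMED_REPO_ID) -> tuple:
    """Download the sample NIfTI volume and return its array shape."""
    import nibabel as nib  # imported lazily

    return nib.load(fetch("sample_image", repo_id)).shape
```

Calling `sample_image_shape()` downloads `image1.nii.gz` into the local Hugging Face cache and returns the shape of the CT volume. The two entries without file extensions appear to be directories, which `hf_hub_download` cannot fetch as a single file; `huggingface_hub.snapshot_download` with `allow_patterns` is one way to pull a subtree instead.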

## 📎 Citation
If you find this repository useful for your work, please cite the [original paper](https://arxiv.org/abs/2406.06512):

```bibtex
@article{blankemeier2024merlin,
  title={Merlin: A vision language foundation model for 3d computed tomography},
  author={Blankemeier, Louis and Cohen, Joseph Paul and Kumar, Ashwin and Van Veen, Dave and Gardezi, Syed Jamal Safdar and Paschali, Magdalini and Chen, Zhihong and Delbrouck, Jean-Benoit and Reis, Eduardo and Truyts, Cesar and others},
  journal={Research Square},
  pages={rs--3},
  year={2024}
}
```