Update README.md

README.md (CHANGED)
@@ -16,6 +16,7 @@ tags:
 ---
 
 # VisualQuality-R1-7B
 This is the latest version of VisualQuality-R1, trained on a diverse combination of synthetic and realistic datasets.<br>
 Paper link: [arXiv](https://arxiv.org/abs/2505.14460)<br>
 Code link: [github](https://github.com/TianheWu/VisualQuality-R1)
@@ -59,17 +60,16 @@ def score_image(image_path, model, processor):
     PROMPT = (
         "You are doing the image quality assessment task. Here is the question: "
         "What is your overall rating on the quality of this picture? The rating should be a float between 1 and 5, "
-        "rounded to two decimal places, with 1 representing very poor quality and 5 representing excellent quality.
-        "First output the thinking process in <think> </think> tags and then output the final answer with only one score in <answer> </answer> tags."
     )
-
     QUESTION_TEMPLATE = "{Question} Please only output the final answer with only one score in <answer> </answer> tags."
     message = [
         {
             "role": "user",
             "content": [
                 {'type': 'image', 'image': image_path},
-                {"type": "text", "text": PROMPT}
             ],
         }
     ]
@@ -273,8 +273,7 @@ def score_image(image_path, model, processor):
     PROMPT = (
         "You are doing the image quality assessment task. Here is the question: "
         "What is your overall rating on the quality of this picture? The rating should be a float between 1 and 5, "
-        "rounded to two decimal places, with 1 representing very poor quality and 5 representing excellent quality.
-        "First output the thinking process in <think> </think> tags and then output the final answer with only one score in <answer> </answer> tags."
     )
 
     QUESTION_TEMPLATE = "{Question} First output the thinking process in <think> </think> tags and then output the final answer with only one score in <answer> </answer> tags."
@@ -284,7 +283,7 @@ def score_image(image_path, model, processor):
             "role": "user",
             "content": [
                 {'type': 'image', 'image': image_path},
-                {"type": "text", "text": PROMPT}
             ],
         }
     ]
@@ -600,11 +599,40 @@ print("Done!")
 ```
 </details>
 
 
 
-
-
 
 
 ## BibTeX
 ```
---

# VisualQuality-R1-7B
Our paper has been accepted as a **spotlight** at NeurIPS 2025!
This is the latest version of VisualQuality-R1, trained on a diverse combination of synthetic and realistic datasets.<br>
Paper link: [arXiv](https://arxiv.org/abs/2505.14460)<br>
Code link: [github](https://github.com/TianheWu/VisualQuality-R1)
```python
    PROMPT = (
        "You are doing the image quality assessment task. Here is the question: "
        "What is your overall rating on the quality of this picture? The rating should be a float between 1 and 5, "
        "rounded to two decimal places, with 1 representing very poor quality and 5 representing excellent quality."
    )

    QUESTION_TEMPLATE = "{Question} Please only output the final answer with only one score in <answer> </answer> tags."
    message = [
        {
            "role": "user",
            "content": [
                {'type': 'image', 'image': image_path},
                {"type": "text", "text": QUESTION_TEMPLATE.format(Question=PROMPT)}
            ],
        }
    ]
```
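The key change above is that the message's text field now sends `QUESTION_TEMPLATE.format(Question=PROMPT)` rather than the bare `PROMPT`. A quick standalone check of the composed text (the model call itself is omitted):

```python
PROMPT = (
    "You are doing the image quality assessment task. Here is the question: "
    "What is your overall rating on the quality of this picture? The rating should be a float between 1 and 5, "
    "rounded to two decimal places, with 1 representing very poor quality and 5 representing excellent quality."
)
QUESTION_TEMPLATE = "{Question} Please only output the final answer with only one score in <answer> </answer> tags."

# Compose the text exactly as it is placed in the message's "text" field:
# the question first, then the answer-format instruction.
text = QUESTION_TEMPLATE.format(Question=PROMPT)
print(text.endswith("in <answer> </answer> tags."))  # True
```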
```python
    PROMPT = (
        "You are doing the image quality assessment task. Here is the question: "
        "What is your overall rating on the quality of this picture? The rating should be a float between 1 and 5, "
        "rounded to two decimal places, with 1 representing very poor quality and 5 representing excellent quality."
    )

    QUESTION_TEMPLATE = "{Question} First output the thinking process in <think> </think> tags and then output the final answer with only one score in <answer> </answer> tags."
    message = [
        {
            "role": "user",
            "content": [
                {'type': 'image', 'image': image_path},
                {"type": "text", "text": QUESTION_TEMPLATE.format(Question=PROMPT)}
            ],
        }
    ]
```
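Both prompt variants instruct the model to wrap its final score in `<answer> </answer>` tags (the second variant additionally asks for a `<think> </think>` reasoning trace). As an illustrative sketch (this helper is not part of the repository), the numeric score can be recovered from the decoded output like so:

```python
import re

def extract_score(model_output: str) -> float:
    # Pull the final quality score out of the <answer> ... </answer> tags.
    match = re.search(r"<answer>\s*([0-9]+(?:\.[0-9]+)?)\s*</answer>", model_output)
    if match is None:
        raise ValueError("no <answer> score found in model output")
    return float(match.group(1))

# Works for both the direct and the <think>-style outputs:
print(extract_score("<answer>4.00</answer>"))                                           # 4.0
print(extract_score("<think>Mild blur, decent colors.</think> <answer>3.25</answer>"))  # 3.25
```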
```
</details>

## Training

### Preparation
1. To run the training procedure smoothly, first download the IQA images and place them all in a **single folder**.
2. Given an original MOS file (e.g., KADID-10K_mos.txt), run `cd datasets` and then `python make_data.py` (with moderate modifications) to generate a **JSON file** for model training.
3. Download [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) into a local folder.
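The exact input format `make_data.py` expects depends on the dataset, which is why the "moderate modifications" above are needed. As a rough sketch, assuming a hypothetical MOS file with one `image_name,mos` pair per line, the conversion amounts to:

```python
import json

def make_data(mos_file: str, out_json: str) -> None:
    # Hypothetical sketch: assumes each line of the MOS file is "<image_name>,<mos>".
    records = []
    with open(mos_file) as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            name, mos = line.strip().split(",")
            records.append({"image": name, "mos": float(mos)})
    with open(out_json, "w") as f:
        json.dump(records, f, indent=2)
```

The resulting JSON path is what `--data_file_paths` points to in the run scripts.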

### Training within a Single Node
Please modify three elements in `src/open-r1-multimodal/run_scripts/KADID-10K/one_node_run_kadid.sh`:
```
--model_name_or_path [Your Qwen2.5-VL-7B-Instruct path] \
--image_folders [Your dataset images path] \
--data_file_paths [Your JSON file path] \
```
Then, run:
```
bash src/open-r1-multimodal/run_scripts/KADID-10K/one_node_run_kadid.sh
```
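For instance, with hypothetical local paths (substitute your own), the three arguments might read:

```shell
--model_name_or_path /data/models/Qwen2.5-VL-7B-Instruct \
--image_folders /data/iqa/images \
--data_file_paths /data/iqa/kadid10k_train.json \
```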

### Training within Multiple Nodes
After making the necessary modifications, run the following command:
```
bash src/open-r1-multimodal/run_scripts/KADID-10K/multi_run_kadid.sh
```

## Acknowledgement
- [VLM-R1](https://github.com/om-ai-lab/VLM-R1): Our codebase is built on VLM-R1.

I would like to sincerely thank [Zhuoyan Luo](https://scholar.google.com/citations?user=mKQhEsIAAAAJ&hl=en&oi=ao) for the generous support of my project and for the invaluable guidance in the field of AR generation.

## 📧 Contact
If you have any questions, please email `[email protected]` or `[email protected]`.

## BibTeX
```