TianheWu committed
Commit 12852ce · verified · 1 Parent(s): 6e5e7d8

Update README.md

Files changed (1): README.md (+37 −9)
README.md CHANGED
````diff
@@ -16,6 +16,7 @@ tags:
 ---
 
 # VisualQuality-R1-7B
 This is the latest version of VisualQuality-R1, trained on a diverse combination of synthetic and realistic datasets.<br>
 Paper link: [arXiv](https://arxiv.org/abs/2505.14460)<br>
 Code link: [github](https://github.com/TianheWu/VisualQuality-R1)
````
````diff
@@ -59,17 +60,16 @@ def score_image(image_path, model, processor):
     PROMPT = (
         "You are doing the image quality assessment task. Here is the question: "
         "What is your overall rating on the quality of this picture? The rating should be a float between 1 and 5, "
-        "rounded to two decimal places, with 1 representing very poor quality and 5 representing excellent quality. "
-        "First output the thinking process in <think> </think> tags and then output the final answer with only one score in <answer> </answer> tags."
     )
-
     QUESTION_TEMPLATE = "{Question} Please only output the final answer with only one score in <answer> </answer> tags."
     message = [
         {
             "role": "user",
             "content": [
                 {'type': 'image', 'image': image_path},
-                {"type": "text", "text": PROMPT}
             ],
         }
     ]
````
@@ -273,8 +273,7 @@ def score_image(image_path, model, processor):
273
  PROMPT = (
274
  "You are doing the image quality assessment task. Here is the question: "
275
  "What is your overall rating on the quality of this picture? The rating should be a float between 1 and 5, "
276
- "rounded to two decimal places, with 1 representing very poor quality and 5 representing excellent quality. "
277
- "First output the thinking process in <think> </think> tags and then output the final answer with only one score in <answer> </answer> tags."
278
  )
279
 
280
  QUESTION_TEMPLATE = "{Question} First output the thinking process in <think> </think> tags and then output the final answer with only one score in <answer> </answer> tags."
````diff
@@ -284,7 +283,7 @@ def score_image(image_path, model, processor):
             "role": "user",
             "content": [
                 {'type': 'image', 'image': image_path},
-                {"type": "text", "text": PROMPT}
             ],
         }
     ]
````
````diff
@@ -600,11 +599,40 @@ print("Done!")
 ```
 </details>
 
 
 
-## 📧 Contact
-If you have any question, please email `[email protected]` or `tianhewu-c@my.cityu.edu.hk`.
 
 
 ## BibTeX
 ```
````
 
````diff
 ---
 
 # VisualQuality-R1-7B
+Our paper has been accepted as a **spotlight** at NeurIPS 2025!
 This is the latest version of VisualQuality-R1, trained on a diverse combination of synthetic and realistic datasets.<br>
 Paper link: [arXiv](https://arxiv.org/abs/2505.14460)<br>
 Code link: [github](https://github.com/TianheWu/VisualQuality-R1)
````
 
````diff
     PROMPT = (
         "You are doing the image quality assessment task. Here is the question: "
         "What is your overall rating on the quality of this picture? The rating should be a float between 1 and 5, "
+        "rounded to two decimal places, with 1 representing very poor quality and 5 representing excellent quality."
     )
+
     QUESTION_TEMPLATE = "{Question} Please only output the final answer with only one score in <answer> </answer> tags."
     message = [
         {
             "role": "user",
             "content": [
                 {'type': 'image', 'image': image_path},
+                {"type": "text", "text": QUESTION_TEMPLATE.format(Question=PROMPT)}
             ],
         }
     ]
````
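For reference, the text entry in the message above is produced by substituting `PROMPT` into `QUESTION_TEMPLATE`; here is a minimal sketch of the assembled user text, with model loading and image handling omitted:

```python
# Sketch: assemble the user text exactly as in the message above.
PROMPT = (
    "You are doing the image quality assessment task. Here is the question: "
    "What is your overall rating on the quality of this picture? The rating should be a float between 1 and 5, "
    "rounded to two decimal places, with 1 representing very poor quality and 5 representing excellent quality."
)
QUESTION_TEMPLATE = "{Question} Please only output the final answer with only one score in <answer> </answer> tags."

# This string is what goes into {"type": "text", "text": ...} next to the image.
text = QUESTION_TEMPLATE.format(Question=PROMPT)
print(text.endswith("in <answer> </answer> tags."))  # True
```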
 
````diff
     PROMPT = (
         "You are doing the image quality assessment task. Here is the question: "
         "What is your overall rating on the quality of this picture? The rating should be a float between 1 and 5, "
+        "rounded to two decimal places, with 1 representing very poor quality and 5 representing excellent quality."
     )
 
     QUESTION_TEMPLATE = "{Question} First output the thinking process in <think> </think> tags and then output the final answer with only one score in <answer> </answer> tags."
````

````diff
             "role": "user",
             "content": [
                 {'type': 'image', 'image': image_path},
+                {"type": "text", "text": QUESTION_TEMPLATE.format(Question=PROMPT)}
             ],
         }
     ]
````
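The README does not show how the numeric score is pulled out of the model's response; here is a small sketch, assuming the model follows the `<answer> </answer>` format above (the function name is ours, not part of the repository):

```python
import re

def extract_score(output_text):
    # Grab the single float inside <answer> ... </answer>.
    # Assumes the model followed the tag format; returns None otherwise.
    m = re.search(r"<answer>\s*([0-9]+(?:\.[0-9]+)?)\s*</answer>", output_text)
    return float(m.group(1)) if m else None

print(extract_score("<think>mild blur, good exposure</think> <answer>3.50</answer>"))  # 3.5
```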
 
````diff
 ```
 </details>
 
+## Training
+
+### Preparation
+1. To run the training procedure smoothly, first download the IQA images and place them all in a **single folder**.
+2. Given an original MOS file (e.g., KADID-10K_mos.txt), run `cd datasets`, then `python make_data.py` (with minor modifications for your dataset) to generate a **JSON file** for model training.
+3. Download [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) into a folder.
````
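The MOS-to-JSON conversion in step 2 can be sketched as follows. This is an illustrative stand-in for `datasets/make_data.py`: the `name score` line layout and the `image`/`mos` field names are assumptions for the sketch, not the script's actual schema.

```python
import json

def make_training_json(mos_path, json_path):
    # Hypothetical sketch: read "image_name score" lines from a MOS file
    # and write a JSON list for training. The real layout and output schema
    # are defined by datasets/make_data.py in the code repository.
    records = []
    with open(mos_path) as f:
        for line in f:
            if not line.strip():
                continue
            name, score = line.split()
            records.append({"image": name, "mos": float(score)})
    with open(json_path, "w") as f:
        json.dump(records, f, indent=2)
```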
 
````diff
+### Training within a Single Node
+Please modify three arguments in `src/open-r1-multimodal/run_scripts/KADID-10K/one_node_run_kadid.sh`:
+```
+--model_name_or_path [Your Qwen2.5-VL-7B-Instruct path] \
+--image_folders [Your dataset images path] \
+--data_file_paths [Your JSON file path] \
+```
+Then, run:
+```
+bash src/open-r1-multimodal/run_scripts/KADID-10K/one_node_run_kadid.sh
+```
````
 
````diff
+### Training within Multiple Nodes
+After making the necessary modifications, run the following command:
+```
+bash src/open-r1-multimodal/run_scripts/KADID-10K/multi_run_kadid.sh
+```
````
````diff
+
+## Acknowledgement
+- [VLM-R1](https://github.com/om-ai-lab/VLM-R1): Our codebase starts from VLM-R1.
+
+I would like to sincerely thank [Zhuoyan Luo](https://scholar.google.com/citations?user=mKQhEsIAAAAJ&hl=en&oi=ao) for the generous support of my project and for the invaluable guidance in the field of AR generation.
+
+## 📧 Contact
+If you have any questions, please email `[email protected]` or `[email protected]`.
````
````diff
 
 ## BibTeX
 ```
````