Update README.md

README.md (CHANGED)
@@ -16,6 +16,7 @@ tags:
 ---
 
 # VisualQuality-R1-7B
 This is the latest version of VisualQuality-R1, trained on a diverse combination of synthetic and realistic datasets.<br>
 Paper link: [arXiv](https://arxiv.org/abs/2505.14460)<br>
 Code link: [github](https://github.com/TianheWu/VisualQuality-R1)
@@ -59,17 +60,16 @@ def score_image(image_path, model, processor):
     PROMPT = (
         "You are doing the image quality assessment task. Here is the question: "
         "What is your overall rating on the quality of this picture? The rating should be a float between 1 and 5, "
-        "rounded to two decimal places, with 1 representing very poor quality and 5 representing excellent quality.
-        "First output the thinking process in <think> </think> tags and then output the final answer with only one score in <answer> </answer> tags."
     )
-
     QUESTION_TEMPLATE = "{Question} Please only output the final answer with only one score in <answer> </answer> tags."
     message = [
         {
             "role": "user",
             "content": [
                 {'type': 'image', 'image': image_path},
-                {"type": "text", "text": PROMPT}
             ],
         }
     ]
@@ -273,8 +273,7 @@ def score_image(image_path, model, processor):
     PROMPT = (
         "You are doing the image quality assessment task. Here is the question: "
         "What is your overall rating on the quality of this picture? The rating should be a float between 1 and 5, "
-        "rounded to two decimal places, with 1 representing very poor quality and 5 representing excellent quality.
-        "First output the thinking process in <think> </think> tags and then output the final answer with only one score in <answer> </answer> tags."
     )
 
     QUESTION_TEMPLATE = "{Question} First output the thinking process in <think> </think> tags and then output the final answer with only one score in <answer> </answer> tags."
@@ -284,7 +283,7 @@ def score_image(image_path, model, processor):
             "role": "user",
             "content": [
                 {'type': 'image', 'image': image_path},
-                {"type": "text", "text": PROMPT}
             ],
         }
     ]
@@ -600,11 +599,40 @@ print("Done!")
 ```
 </details>
 
 
 
-
-
 
 
 ## BibTeX
 ```
---

# VisualQuality-R1-7B
Our paper has been accepted as a **spotlight** at NeurIPS 2025!
This is the latest version of VisualQuality-R1, trained on a diverse combination of synthetic and realistic datasets.<br>
Paper link: [arXiv](https://arxiv.org/abs/2505.14460)<br>
Code link: [github](https://github.com/TianheWu/VisualQuality-R1)
```python
    PROMPT = (
        "You are doing the image quality assessment task. Here is the question: "
        "What is your overall rating on the quality of this picture? The rating should be a float between 1 and 5, "
        "rounded to two decimal places, with 1 representing very poor quality and 5 representing excellent quality."
    )

    QUESTION_TEMPLATE = "{Question} Please only output the final answer with only one score in <answer> </answer> tags."
    message = [
        {
            "role": "user",
            "content": [
                {'type': 'image', 'image': image_path},
                {"type": "text", "text": QUESTION_TEMPLATE.format(Question=PROMPT)}
            ],
        }
    ]
```
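The key change above is that the message's text field now sends `QUESTION_TEMPLATE.format(Question=PROMPT)` rather than the bare `PROMPT`. A quick standalone check of the composed text (the model call itself is omitted):

```python
PROMPT = (
    "You are doing the image quality assessment task. Here is the question: "
    "What is your overall rating on the quality of this picture? The rating should be a float between 1 and 5, "
    "rounded to two decimal places, with 1 representing very poor quality and 5 representing excellent quality."
)
QUESTION_TEMPLATE = "{Question} Please only output the final answer with only one score in <answer> </answer> tags."

# Compose the text exactly as it is placed in the message's "text" field:
# the question first, then the answer-format instruction.
text = QUESTION_TEMPLATE.format(Question=PROMPT)
print(text.endswith("in <answer> </answer> tags."))  # True
```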
```python
    PROMPT = (
        "You are doing the image quality assessment task. Here is the question: "
        "What is your overall rating on the quality of this picture? The rating should be a float between 1 and 5, "
        "rounded to two decimal places, with 1 representing very poor quality and 5 representing excellent quality."
    )

    QUESTION_TEMPLATE = "{Question} First output the thinking process in <think> </think> tags and then output the final answer with only one score in <answer> </answer> tags."
    message = [
        {
            "role": "user",
            "content": [
                {'type': 'image', 'image': image_path},
                {"type": "text", "text": QUESTION_TEMPLATE.format(Question=PROMPT)}
            ],
        }
    ]
```
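Both prompt variants instruct the model to wrap its final score in `<answer> </answer>` tags (the second variant additionally asks for a `<think> </think>` reasoning trace). As an illustrative sketch (this helper is not part of the repository), the numeric score can be recovered from the decoded output like so:

```python
import re

def extract_score(model_output: str) -> float:
    # Pull the final quality score out of the <answer> ... </answer> tags.
    match = re.search(r"<answer>\s*([0-9]+(?:\.[0-9]+)?)\s*</answer>", model_output)
    if match is None:
        raise ValueError("no <answer> score found in model output")
    return float(match.group(1))

# Works for both the direct and the <think>-style outputs:
print(extract_score("<answer>4.00</answer>"))                                           # 4.0
print(extract_score("<think>Mild blur, decent colors.</think> <answer>3.25</answer>"))  # 3.25
```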
```
</details>

## Training

### Preparation
1. To run the training procedure smoothly, first download the IQA images and place them all in a **single folder**.
2. Given an original MOS file (e.g., KADID-10K_mos.txt), run `cd datasets` and then `python make_data.py` (with moderate modifications) to generate a **JSON file** for model training.
3. Download [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) into a local folder.
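The exact input format `make_data.py` expects depends on the dataset, which is why the "moderate modifications" above are needed. As a rough sketch, assuming a hypothetical MOS file with one `image_name,mos` pair per line, the conversion amounts to:

```python
import json

def make_data(mos_file: str, out_json: str) -> None:
    # Hypothetical sketch: assumes each line of the MOS file is "<image_name>,<mos>".
    records = []
    with open(mos_file) as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            name, mos = line.strip().split(",")
            records.append({"image": name, "mos": float(mos)})
    with open(out_json, "w") as f:
        json.dump(records, f, indent=2)
```

The resulting JSON path is what `--data_file_paths` points to in the run scripts.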

### Training within a Single Node
Please modify three elements in `src/open-r1-multimodal/run_scripts/KADID-10K/one_node_run_kadid.sh`:
```
--model_name_or_path [Your Qwen2.5-VL-7B-Instruct path] \
--image_folders [Your dataset images path] \
--data_file_paths [Your JSON file path] \
```
Then, run:
```
bash src/open-r1-multimodal/run_scripts/KADID-10K/one_node_run_kadid.sh
```
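For instance, with hypothetical local paths (substitute your own), the three arguments might read:

```shell
--model_name_or_path /data/models/Qwen2.5-VL-7B-Instruct \
--image_folders /data/iqa/images \
--data_file_paths /data/iqa/kadid10k_train.json \
```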

### Training within Multiple Nodes
After making the necessary modifications, run the following command:
```
bash src/open-r1-multimodal/run_scripts/KADID-10K/multi_run_kadid.sh
```

## Acknowledgement
- [VLM-R1](https://github.com/om-ai-lab/VLM-R1): Our codebase is built on VLM-R1.

I would like to sincerely thank [Zhuoyan Luo](https://scholar.google.com/citations?user=mKQhEsIAAAAJ&hl=en&oi=ao) for the generous support of my project and for the invaluable guidance in the field of AR generation.

## 📧 Contact
If you have any questions, please email `[email protected]` or `[email protected]`.

## BibTeX
```