ShuaiYang03 committed on
Commit 83d2feb · verified · 1 Parent(s): 6eaeabc

Update README.md

Files changed (1): README.md (+43 -1)
README.md CHANGED
@@ -17,4 +17,46 @@ tags:
  **Evaluation Results**
 
  * `results_step-006000-epoch-01-loss=0.1724_instruct_cot_2-3`: Results on **SimplerEnv-Instruct** with multimodal reasoning.
- * `results_step-006000-epoch-01-loss=0.1724_simpler_1-3`: Results on **SimplerEnv**.
+ * `results_step-006000-epoch-01-loss=0.1724_simpler_1-3`: Results on **SimplerEnv**.
+ * `vlmeval`: Results on general multimodal benchmarks.
+
+ This checkpoint supports dialogue; please check our [Code](https://github.com/InternRobotics/InstructVLA#evaluation). A minimal inference example:
+
+ ```python
+ import torch
+ import numpy as np
+ from PIL import Image
+ from vla.instructvla_eagle_dual_sys_v2_meta_query_v2 import load_vla
+
+ model_path = 'outputs/release_ckpts/instructvla_finetune_v2_xlora_freeze_head_instruction--image_aug/checkpoints/step-013500-epoch-01-loss=0.1093.pt'
+
+ # Load the Stage-2 (Generalist) model in bfloat16 on the GPU.
+ model = load_vla(model_path, stage="stage2").eval().to(torch.bfloat16).cuda()
+
+ # A dialogue with a system prompt and one user turn carrying an image.
+ messages = [
+     {"role": "system", "content": "You are a helpful assistant."},
+     {
+         "role": "user",
+         "content": "Can you describe the main idea of this image?",
+         "image": [{'np_array': np.asarray(Image.open("./asset/teaser.png"))}],
+     },
+ ]
+
+ # Tokenize the dialogue and preprocess the image.
+ inputs = model.processor.prepare_input(dict(prompt=messages))
+
+ # Generate a response under bfloat16 autocast.
+ with torch.autocast("cuda", dtype=torch.bfloat16, enabled=True):
+     output = model.vlm.generate(
+         input_ids=inputs['input_ids'].cuda(),
+         attention_mask=inputs['attention_mask'].cuda(),
+         pixel_values=inputs['pixel_values'].cuda(),
+         max_new_tokens=200,
+         output_hidden_states=False,
+     )
+
+ response = model.processor.tokenizer.decode(output[0])
+ print(response)
+ ```
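+
+ Note that `decode(output[0])` returns the prompt together with the newly generated text. As a minimal sketch (assuming `model.vlm.generate` follows the standard Hugging Face convention of returning the prompt tokens followed by the generated ones, and that the tokenizer exposes the usual `decode` options), you can print only the response:
+
+ ```python
+ # Sketch under the assumptions above, not part of the official example:
+ # drop the echoed prompt tokens and special tokens before decoding.
+ prompt_len = inputs['input_ids'].shape[1]
+ response = model.processor.tokenizer.decode(
+     output[0][prompt_len:],       # keep only the newly generated tokens
+     skip_special_tokens=True,     # strip <eos>/padding markers
+ )
+ print(response)
+ ```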