sensefvg committed on
Commit cbd2695 · verified · 1 Parent(s): 6b3bff7

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -19,7 +19,7 @@ video and directly generate coherent text and speech streams, achieving truly in
 
  This is the schematic diagram for multi-turn audio-visual interaction.
  <p align="center">
- <img src="https://github.com/SenseTime-FVG/InteractiveOmni/master/assets/demo_interaction.png" width="99%"/>
+ <img src="https://raw.github.com/SenseTime-FVG/InteractiveOmni/main/assets/demo_interaction.png" width="99%"/>
  <p>
 
  ### Key Features
@@ -30,7 +30,7 @@ This is the schematic diagram for multi-turn audio-visual interaction.
  * **On-device Model:** the 4B model achieves 97% of the performance with just 50% of the model size compared with 8B model.
  ### Model Architecture
  <p align="center">
- <img src="https://github.com/SenseTime-FVG/InteractiveOmni/master/assets/model_architecture.png" width="80%"/>
+ <img src="https://raw.github.com/SenseTime-FVG/InteractiveOmni/main/assets/model_architecture.png" width="80%"/>
  <p>
 
 
@@ -256,7 +256,7 @@ torchaudio.save("result_custom_speaker.wav", wav_response.cpu(), 24000, format="
  ## Evaluation
  InteractiveOmni achieves state-of-the-art performance across a wide range of multi-modal understanding and speech generation benchmarks.
  <p align="center">
- <img src="https://github.com/SenseTime-FVG/InteractiveOmni/master/assets/radar_chart.png" width="70%"/>
+ <img src="https://raw.github.com/SenseTime-FVG/InteractiveOmni/main/assets/radar_chart.png" width="70%"/>
  <p>
 
  <details>