sensefvg/InteractiveOmni-4B

Update README.md

Commit cbd2695 (verified) · 1 parent: 6b3bff7 · sensefvg committed 23 days ago

Files changed (1)

README.md +3 -3

@@ -19,7 +19,7 @@ video and directly generate coherent text and speech streams, achieving truly in
 This is the schematic diagram for multi-turn audio-visual interaction.
 <p align="center">
-    <img src="https://github.com/SenseTime-FVG/InteractiveOmni/master/assets/demo_interaction.png" width="99%"/>
+    <img src="https://raw.github.com/SenseTime-FVG/InteractiveOmni/main/assets/demo_interaction.png" width="99%"/>
 <p>
 ### Key Features
@@ -30,7 +30,7 @@ This is the schematic diagram for multi-turn audio-visual interaction.
 * **On-device Model:**  the 4B model achieves 97% of the performance with just 50% of the model size compared with 8B model.
 ### Model Architecture
 <p align="center">
-    <img src="https://github.com/SenseTime-FVG/InteractiveOmni/master/assets/model_architecture.png" width="80%"/>
+    <img src="https://raw.github.com/SenseTime-FVG/InteractiveOmni/main/assets/model_architecture.png" width="80%"/>
 <p>
@@ -256,7 +256,7 @@ torchaudio.save("result_custom_speaker.wav", wav_response.cpu(), 24000, format="
 ## Evaluation
 InteractiveOmni achieves state-of-the-art performance across a wide range of multi-modal understanding and speech generation benchmarks.
 <p align="center">
-    <img src="https://github.com/SenseTime-FVG/InteractiveOmni/master/assets/radar_chart.png" width="70%"/>
+    <img src="https://raw.github.com/SenseTime-FVG/InteractiveOmni/main/assets/radar_chart.png" width="70%"/>
 <p>
 <details>