Update README.md
README.md
@@ -19,7 +19,7 @@ video and directly generate coherent text and speech streams, achieving truly in
 
 This is the schematic diagram for multi-turn audio-visual interaction.
 <p align="center">
-<img src="https://github.com/SenseTime-FVG/InteractiveOmni/
+<img src="https://raw.github.com/SenseTime-FVG/InteractiveOmni/main/assets/demo_interaction.png" width="99%"/>
 <p>
 
 ### Key Features
@@ -30,7 +30,7 @@ This is the schematic diagram for multi-turn audio-visual interaction.
 * **On-device Model:** the 4B model achieves 97% of the performance with just 50% of the model size compared with 8B model.
 ### Model Architecture
 <p align="center">
-<img src="https://github.com/SenseTime-FVG/InteractiveOmni/
+<img src="https://raw.github.com/SenseTime-FVG/InteractiveOmni/main/assets/model_architecture.png" width="80%"/>
 <p>
 
 
@@ -256,7 +256,7 @@ torchaudio.save("result_custom_speaker.wav", wav_response.cpu(), 24000, format="
 ## Evaluation
 InteractiveOmni achieves state-of-the-art performance across a wide range of multi-modal understanding and speech generation benchmarks.
 <p align="center">
-<img src="https://github.com/SenseTime-FVG/InteractiveOmni/
+<img src="https://raw.github.com/SenseTime-FVG/InteractiveOmni/main/assets/radar_chart.png" width="70%"/>
 <p>
 
 <details>
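All three added `<img>` lines follow the same pattern: a `raw.github.com` URL under `SenseTime-FVG/InteractiveOmni/main/assets/` plus a `width` attribute. One way to sanity-check a change like this is to parse the added lines and confirm every `src` shares that prefix. The sketch below is only an illustration. It uses the standard-library `html.parser` module, and the names `EXPECTED_PREFIX`, `ImgCollector`, and `collect_images` are hypothetical helpers, not part of the InteractiveOmni repository.

```python
from html.parser import HTMLParser

# The three "+" lines from this commit, copied verbatim.
ADDED_LINES = [
    '<img src="https://raw.github.com/SenseTime-FVG/InteractiveOmni/main/assets/demo_interaction.png" width="99%"/>',
    '<img src="https://raw.github.com/SenseTime-FVG/InteractiveOmni/main/assets/model_architecture.png" width="80%"/>',
    '<img src="https://raw.github.com/SenseTime-FVG/InteractiveOmni/main/assets/radar_chart.png" width="70%"/>',
]

# Hypothetical check: the prefix shared by all new image URLs (repo, branch, assets dir).
EXPECTED_PREFIX = "https://raw.github.com/SenseTime-FVG/InteractiveOmni/main/assets/"


class ImgCollector(HTMLParser):
    """Hypothetical helper: records (src, width) for every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img .../> reach this method too, because the
        # default handle_startendtag calls handle_starttag and then handle_endtag.
        if tag == "img":
            attr_map = dict(attrs)
            self.images.append((attr_map.get("src"), attr_map.get("width")))


def collect_images(lines):
    """Feed each line to one parser and return the collected (src, width) pairs."""
    parser = ImgCollector()
    for line in lines:
        parser.feed(line)
    parser.close()
    return parser.images


images = collect_images(ADDED_LINES)
for src, width in images:
    status = "ok" if src and src.startswith(EXPECTED_PREFIX) else "MISMATCH"
    filename = src.rsplit("/", 1)[-1] if src else "-"
    print(f"{filename:<24} width={width or '-':<4} prefix={status}")
```

Running it prints one line per image with its filename, its width, and whether the URL matched the expected prefix. Parsing the tags with a real HTML parser, rather than a regular expression, keeps the check robust to attribute order and quoting.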