	---
	license: mit
	language:
	- en
	pipeline_tag: automatic-speech-recognition
	---
	# sensevoice.axera
	FunASR SenseVoice on Axera, official repo: https://github.com/FunAudioLLM/SenseVoice

	## TODO

	- [x] Support AX630C
	- [ ] Support C++
	- [x] Support FastAPI

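	## Features
	- Speech recognition
	- Automatic language detection (Chinese, English, Cantonese, Japanese, and Korean)
	- Emotion recognition
	- Automatic punctuation
	- Streaming recognition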

	## Supported platforms

	- [x] AX650N
	- [x] AX630C

	## Installation
	```
	pip3 install -r requirements.txt
	```
	If you run short on disk space, pass `--prefix` to install into a different location.


	## Usage
	```
	# On first run, the models are downloaded automatically from Hugging Face and saved to models
	python3 main.py -i <input audio file>
	```
	Command-line parameters:

	| Parameter | Description | Default |
	| --- | --- | --- |
	| `--input`/`-i` | Input audio file | |
	| `--language`/`-l` | Recognition language; one of auto, zh, en, yue, ja, ko | auto |
	| `--streaming` | Streaming recognition | |
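
As an illustration of how these parameters fit together, here is a minimal `argparse` sketch that mirrors the table. It is not the parser from `main.py`: the function name `build_parser` is hypothetical, and treating `--input` as required is an assumption (the table only shows that it has no default).

```python
import argparse

# Language codes listed in the parameter table.
LANGUAGES = ["auto", "zh", "en", "yue", "ja", "ko"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(description="SenseVoice CLI sketch")
    # Assumption: the input file is mandatory.
    parser.add_argument("--input", "-i", required=True, help="input audio file")
    parser.add_argument(
        "--language", "-l", choices=LANGUAGES, default="auto",
        help="recognition language (default: auto)",
    )
    # A boolean switch: absent means offline recognition, present means streaming.
    parser.add_argument("--streaming", action="store_true", help="streaming recognition")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["-i", "example/yue.mp3", "-l", "yue"])
print(args.input, args.language, args.streaming)
```

Restricting `--language` with `choices` makes an unsupported code fail at parse time rather than during inference.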


	### Examples:
	Test audio files are provided in the `example` directory.

	For example, to test Cantonese:
	```
	python3 main.py -i example/yue.mp3
	```
	Output:
	```
	RTF: 0.03026517820946964 Latency: 0.15689468383789062s Total length: 5.184s
	['呢几个字。', '都表达唔到，我想讲嘅意。', '思。']
	```
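
The first output line can be read as follows: the real-time factor (RTF) is conventionally the processing time divided by the audio duration, and the sample numbers above are consistent with that definition. The sketch below recomputes it and joins the recognized segments. The function name `real_time_factor` is hypothetical, and that `main.py` computes RTF this way is an assumption.

```python
def real_time_factor(latency_s: float, audio_length_s: float) -> float:
    """Return processing time divided by audio duration (lower is faster)."""
    if audio_length_s <= 0:
        raise ValueError("audio length must be positive")
    return latency_s / audio_length_s


# Values copied from the sample output above.
latency = 0.15689468383789062   # seconds spent on recognition
total_length = 5.184            # seconds of audio

rtf = real_time_factor(latency, total_length)
print(f"RTF: {rtf:.4f}")  # prints "RTF: 0.0303"

# The result is a list of segments; joining them gives the full transcript.
segments = ['呢几个字。', '都表达唔到，我想讲嘅意。', '思。']
print("".join(segments))
```

An RTF below 1 means the audio is processed faster than real time; here roughly 5.2 s of audio takes about 0.16 s.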

	Streaming recognition:

	```
	python3 main.py -i example/zh.mp3 --streaming
	```
	Output:
	```
	{'timestamps': [540], 'text': '开'}
	{'timestamps': [540, 780, 1080], 'text': '开放时'}
	{'timestamps': [540, 780, 1080, 1260, 1740], 'text': '开放时间早'}
	{'timestamps': [540, 780, 1080, 1260, 1740, 1920, 2340], 'text': '开放时间早上9'}
	{'timestamps': [540, 780, 1080, 1260, 1740, 1920, 2340, 2640], 'text': '开放时间早上9点'}
	{'timestamps': [540, 780, 1080, 1260, 1740, 1920, 2340, 2640, 3060], 'text': '开放时间早上9点至'}
	{'timestamps': [540, 780, 1080, 1260, 1740, 1920, 2340, 2640, 3060, 3780, 4020], 'text': '开放时间早上9点至下午'}
	{'timestamps': [540, 780, 1080, 1260, 1740, 1920, 2340, 2640, 3060, 3780, 4020, 4440, 4620], 'text': '开放时间早上9点至下午五点'}
	RTF: 0.03678379235444246

	```
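
In the sample above, each streaming update repeats the full text recognized so far, with what appears to be one timestamp per character. A consumer that only wants the newly recognized part can diff successive updates, as in this minimal sketch. The helper `new_text` is hypothetical, and it assumes updates are append-only (earlier characters are never revised) and that timestamps are in milliseconds; neither is confirmed by the source.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def new_text(updates: Iterable[Dict]) -> Iterator[Tuple[str, List[int]]]:
    """Yield only the text and timestamps added by each cumulative update."""
    seen_chars = 0
    seen_stamps = 0
    for update in updates:
        text, stamps = update["text"], update["timestamps"]
        # Assumption: updates only append, so slicing off the prefix is safe.
        yield text[seen_chars:], stamps[seen_stamps:]
        seen_chars, seen_stamps = len(text), len(stamps)


# First three updates copied from the sample output above.
updates = [
    {'timestamps': [540], 'text': '开'},
    {'timestamps': [540, 780, 1080], 'text': '开放时'},
    {'timestamps': [540, 780, 1080, 1260, 1740], 'text': '开放时间早'},
]

for delta, stamps in new_text(updates):
    print(delta, stamps)
```

If the recognizer can revise earlier characters, a prefix comparison would be needed instead of plain slicing.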

	## Accuracy

	Word error rate (WER) is used as the evaluation metric.

	WER = 0.0389
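
For reference, WER is the edit distance (substitutions, insertions, deletions) between the reference and the hypothesis, divided by the number of reference tokens. The sketch below is a generic implementation, not the code in `test_wer.py`. Treating each Chinese character as one token is an assumption, and the reference sentence is a made-up example rather than dataset content.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def wer(refs: List[Sequence[str]], hyps: List[Sequence[str]]) -> float:
    """Corpus-level WER: total edit errors over total reference tokens."""
    total = sum(len(r) for r in refs)
    if total == 0:
        raise ValueError("reference is empty")
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / total


# Hypothetical pair: one substituted character out of eight.
reference = list("开放时间早上九点")
hypothesis = list("开放时间早上9点")
print(wer([reference], [hypothesis]))  # prints 0.125
```

Text normalization (for example mapping "9" to "九", or stripping punctuation) before scoring changes the result noticeably, so scores are only comparable under the same normalization.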

	### Reproducing the test results

	```
	./download_datasets.sh
	python test_wer.py -d datasets -l zh
	```

	## Technical discussion

	- GitHub issues
	- QQ group: 139953715