FireRedASR-AED

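Deployment of the Xiaohongshu ASR model (AED-L version) on AX650N. The original project is at: https://github.com/FireRedTeam/FireRedASR

The converted models are placed in the axmodel directory. Chinese and English are currently supported, and the maximum input audio length is 10 seconds.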
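
Because input audio is limited to 10 seconds, it can be useful to check clip lengths before running inference. The sketch below uses only Python's standard `wave` module and assumes uncompressed PCM WAV files. The helper names `wav_duration_seconds` and `exceeds_limit` are hypothetical and not part of this repository.

```python
import wave

MAX_SECONDS = 10.0  # maximum input length stated above


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def exceeds_limit(path, max_seconds=MAX_SECONDS):
    """Return True if the clip is longer than the maximum input length."""
    return wav_duration_seconds(path) > max_seconds
```

For example, `exceeds_limit("wav/IT0011W0001.wav")` would report whether that clip is over the limit, assuming the file exists at that path.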

Model conversion

See GitHub.

Supported platforms

  • AX650N

Installing dependencies

Python

The test environment is Python 3.12. Miniconda is recommended; for installation instructions, see the Miniconda documentation.

conda create -n fireredasr python=3.12
conda activate fireredasr
pip install -r requirements.txt

Installing pyaxengine

wget https://github.com/AXERA-TECH/pyaxengine/releases/download/0.1.3.rc1/axengine-0.1.3-py3-none-any.whl
pip install axengine-0.1.3-py3-none-any.whl
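
To confirm that the wheel was installed into the active environment, one option is to query the package metadata from Python. This is a minimal sketch using the standard `importlib.metadata` module. The distribution name `axengine` is an assumption taken from the wheel file name above, and `installed_version` is a hypothetical helper.

```python
from importlib import metadata


def installed_version(dist_name="axengine"):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print("axengine version:", version if version else "not installed")
```

This check does not call any pyaxengine API, so it only confirms that the package is present, not that the NPU runtime is usable.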

Usage

conda activate fireredasr
python test_ax_model.py

The output looks like this:

[INFO] Available providers:  ['AxEngineExecutionProvider']
Namespace(encoder='axmodel/encoder.axmodel', decoder='axmodel/decoder_main.axmodel', cmvn='axmodel/cmvn.ark', dict='axmodel/dict.txt', spm_model='axmodel/train_bpe1000.model', wavlist='wavlist.txt', hypo='hypo_axmodel.txt', beam_size=3, nbest=1, max_len=128)
[WARNING] Selected provider(s): ['AXCLRTExecutionProvider'] is(are) not available.
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Chip type: ChipType.MC50
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Engine version: 2.12.0s
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 4.2 9555977e
load encoder cost 2.764460325241089 seconds
[WARNING] Selected provider(s): ['AXCLRTExecutionProvider'] is(are) not available.
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 4.2 9555977e
load decoder_main cost 16.36833119392395 seconds
[WARNING] Selected provider(s): ['AXCLRTExecutionProvider'] is(are) not available.
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 4.2 9555977e
load decoder_loop cost 16.194183826446533 seconds
run encoder take 196.9749927520752ms
run decoder_main take 130.2931308746338ms
run decoder_loop take 165.5733585357666ms
run decoder_loop take 109.67779159545898ms
run decoder_loop take 101.15742683410645ms
run decoder_loop take 110.09836196899414ms
run decoder_loop take 100.29029846191406ms
run decoder_loop take 109.33351516723633ms
run decoder_loop take 100.37779808044434ms
run decoder_loop take 109.72428321838379ms
run decoder_loop take 100.42023658752441ms
run decoder_loop take 101.71890258789062ms
run decoder_loop take 100.09407997131348ms
run decoder_loop take 110.25619506835938ms
run decoder_loop take 100.54206848144531ms
run decoder_loop take 101.93896293640137ms
['wav/TEST_NET_Y0000000000_-KTKHdZ2fb8_S00000.wav']
Durations: 1.8
Transcribe Durations: 2.5527637004852295
(Real time factor) RTF: 1.4182020558251274
wav: wav/TEST_NET_Y0000000000_-KTKHdZ2fb8_S00000.wav
text: 我有的时候说不清楚你们知道吗
score: -0.9156361222267151

run encoder take 180.2656650543213ms
run decoder_main take 91.42565727233887ms
run decoder_loop take 105.18240928649902ms
run decoder_loop take 100.56614875793457ms
run decoder_loop take 100.9066104888916ms
run decoder_loop take 100.9068489074707ms
run decoder_loop take 102.90265083312988ms
run decoder_loop take 100.50129890441895ms
run decoder_loop take 110.12482643127441ms
run decoder_loop take 100.65031051635742ms
run decoder_loop take 110.09883880615234ms
run decoder_loop take 105.48877716064453ms
run decoder_loop take 100.32439231872559ms
run decoder_loop take 106.08601570129395ms
run decoder_loop take 100.79813003540039ms
run decoder_loop take 100.4643440246582ms
run decoder_loop take 100.30460357666016ms
['wav/TEST_MEETING_T0000000001_S00000.wav']
Durations: 12.369
Transcribe Durations: 2.464834690093994
(Real time factor) RTF: 0.19927517908432324
wav: wav/TEST_MEETING_T0000000001_S00000.wav
text: 好首先说一下刚才这个
score: -0.5064160823822021

run encoder take 172.59907722473145ms
run decoder_main take 91.79949760437012ms
run decoder_loop take 105.04364967346191ms
run decoder_loop take 100.62885284423828ms
run decoder_loop take 101.89318656921387ms
run decoder_loop take 100.42643547058105ms
run decoder_loop take 109.7562313079834ms
['wav/IT0011W0001.wav']
Durations: 1.992
Transcribe Durations: 1.0302071571350098
(Real time factor) RTF: 0.5171722676380571
wav: wav/IT0011W0001.wav
text: 换一首歌
score: -0.016501454636454582

run encoder take 173.07257652282715ms
run decoder_main take 91.48693084716797ms
run decoder_loop take 105.42607307434082ms
run decoder_loop take 100.10981559753418ms
run decoder_loop take 100.4478931427002ms
run decoder_loop take 100.23713111877441ms
run decoder_loop take 100.10337829589844ms
run decoder_loop take 100.29196739196777ms
run decoder_loop take 101.7463207244873ms
run decoder_loop take 100.8148193359375ms
run decoder_loop take 109.99274253845215ms
run decoder_loop take 105.45015335083008ms
run decoder_loop take 100.59380531311035ms
run decoder_loop take 100.73733329772949ms
run decoder_loop take 100.4335880279541ms
run decoder_loop take 109.68661308288574ms
['wav/BAC009S0764W0121.wav']
Durations: 4.2039375
Transcribe Durations: 2.3024709224700928
(Real time factor) RTF: 0.5476938994621334
wav: wav/BAC009S0764W0121.wav
text: 甚至出现交易几乎停滞的情况
score: -0.11461181938648224

total wav durations: 20.364937500000003
total transcribe durations: 8.350276470184326
AVG RTF: 0.4100320204854213
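
The real time factor (RTF) figures in the log can be reproduced by dividing transcription time by audio duration. The sketch below recomputes them from the four pairs of values printed above. The function name `rtf` and the variable names are illustrative only and are not taken from test_ax_model.py.

```python
def rtf(transcribe_seconds, audio_seconds):
    """Real time factor: processing time divided by audio duration."""
    return transcribe_seconds / audio_seconds


# (audio duration, transcribe duration) in seconds, copied from the log above
runs = [
    (1.8, 2.5527637004852295),
    (12.369, 2.464834690093994),
    (1.992, 1.0302071571350098),
    (4.2039375, 2.3024709224700928),
]

per_file = [rtf(t, a) for a, t in runs]

total_audio = sum(a for a, _ in runs)
total_transcribe = sum(t for _, t in runs)
overall = rtf(total_transcribe, total_audio)

print([round(x, 3) for x in per_file])
print(round(overall, 3))
```

The AVG RTF reported in the log matches total transcription time divided by total audio duration. It is not the arithmetic mean of the per-file RTF values, which would weight short clips more heavily.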

The recognition results are written to hypo_axmodel.txt.
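
If the console output is saved to a text file, the per-stage latencies can be summarized with a short script. The sketch below assumes the "run <stage> take <N>ms" line format shown in the output above. `stage_timings` and `mean_ms` are hypothetical helpers, not part of this repository.

```python
import re
from collections import defaultdict

# Matches lines such as "run decoder_loop take 100.62885284423828ms"
LINE_RE = re.compile(r"^run (\w+) take ([0-9.]+)ms$")


def stage_timings(log_text):
    """Group the reported latencies (in ms) by stage name."""
    timings = defaultdict(list)
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            timings[match.group(1)].append(float(match.group(2)))
    return dict(timings)


def mean_ms(values):
    """Arithmetic mean of a non-empty list of latencies."""
    return sum(values) / len(values)


# Lines copied from the third file (wav/IT0011W0001.wav) in the output above
sample = """\
run encoder take 172.59907722473145ms
run decoder_main take 91.79949760437012ms
run decoder_loop take 105.04364967346191ms
run decoder_loop take 100.62885284423828ms
run decoder_loop take 101.89318656921387ms
run decoder_loop take 100.42643547058105ms
run decoder_loop take 109.7562313079834ms
"""

for stage, values in stage_timings(sample).items():
    print(f"{stage}: n={len(values)}, mean={mean_ms(values):.1f} ms")
```

Lines that do not match the pattern, such as the [INFO] and [WARNING] messages, are ignored.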
