hoangus0303
/

paraformer-large-clone-from-funasr

@@ -1,77 +1,56 @@
----
-license: apache-2.0
-language:
-- zh
-metrics:
-- accuracy
-- cer
-pipeline_tag: automatic-speech-recognition
-tags:
-- Paraformer
-- FunASR
-- ASR
----
-## Introduce
-[Paraformer](https://arxiv.org/abs/2206.08317) is an innovative non-autoregressive end-to-end speech recognition model that offers significant advantages over traditional autoregressive models. Unlike its counterparts, Paraformer can generate the target text for an entire sentence in parallel, making it ideal for parallel inference using GPUs. This capability leads to significant improvements in inference efficiency, which can reduce machine costs for speech recognition cloud services by almost 10 times. Furthermore, Paraformer can achieve the same performance as autoregressive models on industrial-scale data.
-This repository demonstrates how to leverage Paraformer in conjunction with the funasr_onnx runtime. The underlying model is derived from [FunASR](https://github.com/alibaba-damo-academy/FunASR), which was trained on a massive 60,000-hour Mandarin dataset. Notably, Paraformer's performance secured the top spot on the [SpeechIO leaderboard](https://github.com/SpeechColab/Leaderboard), highlighting its exceptional capabilities in speech recognition.
-We have relesed numerous industrial-grade models, including speech recognition, voice activity detection, punctuation restoration, speaker verification, speaker diarization, and timestamp prediction (force alignment). To learn more about these models, kindly refer to the [documentation](https://alibaba-damo-academy.github.io/FunASR/en/index.html) available on FunASR. If you are interested in leveraging advanced AI technology for your speech-related projects, we invite you to explore the possibilities offered by [FunASR](https://github.com/alibaba-damo-academy/FunASR).
-## Install funasr_onnx
-```shell
-pip install -U funasr_onnx
-# For the users in China, you could install with the command:
-# pip install -U funasr_onnx -i https://mirror.sjtu.edu.cn/pypi/web/simple
-```
-## Download the model
-```shell
-git clone https://huggingface.co/funasr/paraformer-large
-```
-## Inference with runtime
-### Speech Recognition
-#### Paraformer
- ```python
- from funasr_onnx import Paraformer
- model_dir = "./paraformer-large"
- model = Paraformer(model_dir, batch_size=1, quantize=True)
- wav_path = ['./funasr/paraformer-large/asr_example.wav']
- result = model(wav_path)
- print(result)
- ```
-- `model_dir`: the model path, which contains `model.onnx`, `config.yaml`, `am.mvn`
-- `batch_size`: `1` (Default), the batch size duration inference
-- `device_id`: `-1` (Default), infer on CPU. If you want to infer with GPU, set it to gpu_id (Please make sure that you have install the onnxruntime-gpu)
-- `quantize`: `False` (Default), load the model of `model.onnx` in `model_dir`. If set `True`, load the model of `model_quant.onnx` in `model_dir`
-- `intra_op_num_threads`: `4` (Default), sets the number of threads used for intraop parallelism on CPU
-Input: wav formt file, support formats: `str, np.ndarray, List[str]`
-Output: `List[str]`: recognition result
-## Performance benchmark
-Please ref to [benchmark](https://alibaba-damo-academy.github.io/FunASR/en/benchmark/benchmark_onnx_cpp.html)
-## Citations
-``` bibtex
-@inproceedings{gao2022paraformer,
-  title={Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition},
-  author={Gao, Zhifu and Zhang, Shiliang and McLoughlin, Ian and Yan, Zhijie},
-  booktitle={INTERSPEECH},
-  year={2022}
-}
-```

+---
+license: apache-2.0
+language:
+- zh
+metrics:
+- accuracy
+- cer
+pipeline_tag: automatic-speech-recognition
+tags:
+- Paraformer
+- FunASR
+- ASR
+---
+## Introduce
+This repo cloned from https://huggingface.co/funasr/Paraformer-large
+## Install funasr_onnx
+```shell
+pip install -U funasr_onnx
+# For the users in China, you could install with the command:
+# pip install -U funasr_onnx -i https://mirror.sjtu.edu.cn/pypi/web/simple
+```
+## Download the model
+```shell
+git clone https://huggingface.co/hoangus0303/paraformer-large-clone-from-funasr
+```
+## Inference with runtime
+### Speech Recognition
+#### Paraformer
+ ```python
+ from funasr_onnx import Paraformer
+ model_dir = "./paraformer-large"
+ model = Paraformer(model_dir, batch_size=1, quantize=True)
+ wav_path = ['./funasr/paraformer-large/asr_example.wav']
+ result = model(wav_path)
+ print(result)
+ ```
+- `model_dir`: the model path, which contains `model.onnx`, `config.yaml`, `am.mvn`
+- `batch_size`: `1` (Default), the batch size duration inference
+- `device_id`: `-1` (Default), infer on CPU. If you want to infer with GPU, set it to gpu_id (Please make sure that you have install the onnxruntime-gpu)
+- `quantize`: `False` (Default), load the model of `model.onnx` in `model_dir`. If set `True`, load the model of `model_quant.onnx` in `model_dir`
+- `intra_op_num_threads`: `4` (Default), sets the number of threads used for intraop parallelism on CPU
+Input: wav formt file, support formats: `str, np.ndarray, List[str]`
+Output: `List[str]`: recognition result