Update README.md
README.md CHANGED
@@ -16,25 +16,25 @@ Deployment Geography: Global <br>
 Use Case: Developers, speech processing engineers, and AI researchers will use it as the first step for other speech processing models. <br>


-##
+## References:
 [1] Jia, Fei, Somshubra Majumdar, and Boris Ginsburg. "MarbleNet: Deep 1D Time-Channel Separable Convolutional Neural Network for Voice Activity Detection." ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021. <br>
 [2] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
 <br>

-## Model Architecture
+## Model Architecture:

 **Architecture Type:** Convolutional Neural Network (CNN) <br>
 **Network Architecture:** MarbleNet <br>

 **This model has 91.5K of model parameters** <br>

-
+## Input: <br>
 **Input Type(s):** Audio <br>
 **Input Format:** .wav files <br>
 **Input Parameters:** 1D <br>
 **Other Properties Related to Input:** 16000 Hz Mono-channel Audio, Pre-Processing Not Needed <br>

-
+## Output: <br>
 **Output Type(s):** Sequence of speech probabilities for each 20 millisecond frame <br>
 **Output Format:** Float Array <br>
 **Output Parameters:** 1D <br>

@@ -52,7 +52,6 @@ TODO
 **Runtime Engine(s):**
 * NeMo-2.0.0 <br>

-
 **Supported Hardware Microarchitecture Compatibility:** <br>
 * [NVIDIA Ampere] <br>
 * [NVIDIA Blackwell] <br>
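The Input and Output fields above define the model's inference contract: 16 kHz mono-channel .wav audio in, one speech probability per 20 ms frame out. Below is a minimal sketch of exercising that contract through the NeMo toolkit [2]; the `EncDecFrameClassificationModel` class, the `(input_signal, input_signal_length)` forward signature, and the placeholder model name are assumptions made for illustration, not details taken from this card.

```python
# Minimal frame-VAD inference sketch (assumptions flagged in comments).
import torch
import soundfile as sf
import nemo.collections.asr as nemo_asr

# Assumed class and placeholder name: substitute the card's actual pretrained model.
vad_model = nemo_asr.models.EncDecFrameClassificationModel.from_pretrained(
    model_name="<pretrained_frame_vad_model>"
)
vad_model.eval()

# Input contract from the card: 16000 Hz mono-channel audio, no extra pre-processing.
audio, sr = sf.read("speech.wav", dtype="float32")
assert sr == 16000 and audio.ndim == 1, "expects 16 kHz mono-channel audio"

with torch.no_grad():
    input_signal = torch.tensor(audio).unsqueeze(0)    # shape [1, num_samples]
    input_length = torch.tensor([audio.shape[0]])      # shape [1]
    # Assumed forward signature; returns per-frame logits of shape [1, num_frames, 2].
    logits = vad_model(input_signal=input_signal, input_signal_length=input_length)
    # Output contract from the card: a 1D float array of speech probabilities,
    # one value per 20 ms frame.
    speech_probs = torch.softmax(logits, dim=-1)[0, :, 1]

print(speech_probs.shape)
```

Under these assumptions, index `i` of `speech_probs` covers the 20 ms window starting at `i * 0.02` seconds of the input audio, which is how downstream speech processing steps would consume the output.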