Update README.md
README.md
@@ -67,13 +67,17 @@ These libraries are merged into Kotoba-Whisper-v1.1 via pipeline and will be app
 The pipeline has been developed through the collaboration between [Asahi Ushio](https://asahiushio.com) and [Kotoba Technologies](https://twitter.com/kotoba_tech)
 
 
-Following table presents the raw CER (unlike usual CER where the punctuations are removed before computing the metrics)
+Following table presents the raw CER (unlike usual CER where the punctuations are removed before computing the metrics, see the evaluation script [here](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.1/blob/main/run_short_form_eval.py))
+along with the.
 
-
-
-
-| kotoba-tech/kotoba-whisper-v1.
-
+
+| model | CommonVoice 8.0 (Japanese) | JSUT Basic 5000 | ReazonSpeech Test |
+|:---------------------------------------------------------|---------------------------------------:|-------------------------------------:|----------------------------------------:|
+| kotoba-tech/kotoba-whisper-v1.0 | 17.8 | 15.2 | **17.8** |
+| kotoba-tech/kotoba-whisper-v1.1 (punctuator + stable-ts) | 16.0 | **11.7** | 18.5 |
+| kotoba-tech/kotoba-whisper-v1.1 (punctuator) | 16.0 | **11.7** | 18.5 |
+| kotoba-tech/kotoba-whisper-v1.1 (stable-ts) | 17.8 | 15.2 | **17.8** |
+| openai/whisper-large-v3 | **15.2** | 13.4 | 20.6 |
 
 
 ## Transformers Usage
@@ -111,7 +115,9 @@ pipe = pipeline(
     model_kwargs=model_kwargs,
     chunk_length_s=15,
     batch_size=16,
-    trust_remote_code=True
+    trust_remote_code=True,
+    stable_ts=True,
+    punctuator=True
 )
 
 # load sample audio
@@ -129,6 +135,18 @@ print(result)
 + result = pipe("audio.mp3", return_timestamps=True, generate_kwargs=generate_kwargs)
 ```
 
+- To deactivate stable-ts:
+```diff
+- stable_ts=True,
++ stable_ts=False,
+```
+
+- To deactivate punctuator:
+```diff
+- punctuator=True,
++ punctuator=False,
+```
+
 ### Transcription with Prompt
 Kotoba-whisper can generate transcription with prompting as below:
 
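Note on the "raw CER" wording introduced by this change: the README contrasts it with the usual CER, where punctuation is removed before scoring. The sketch below illustrates that distinction only. It is not taken from `run_short_form_eval.py`, whose normalization may differ; the helper names (`edit_distance`, `strip_punctuation`, `cer`) and the sample sentences are hypothetical, and "punctuation" is assumed here to mean any character in a Unicode `P*` category.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            # deletion, insertion, substitution (or match)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def strip_punctuation(text: str) -> str:
    """Drop characters whose Unicode category starts with 'P' (assumed definition)."""
    return "".join(c for c in text if not unicodedata.category(c).startswith("P"))


def cer(ref: str, hyp: str, keep_punctuation: bool = True) -> float:
    """CER in percent; keep_punctuation=True corresponds to the 'raw' variant."""
    if not keep_punctuation:
        ref, hyp = strip_punctuation(ref), strip_punctuation(hyp)
    if not ref:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


# Hypothetical example: the hypothesis matches the words but has no punctuation.
reference = "今日は、晴れです。"
hypothesis = "今日は晴れです"
raw = cer(reference, hypothesis, keep_punctuation=True)
normalized = cer(reference, hypothesis, keep_punctuation=False)
print(f"raw CER: {raw:.1f}, punctuation-stripped CER: {normalized:.1f}")
```

Under these assumptions, a transcript that gets every word right but omits punctuation is still penalized by the raw metric, which is presumably why the punctuator rows in the table can differ from the rows without it.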