Automatic Speech Recognition
Transformers
Safetensors
Japanese
whisper
audio
hf-asr-leaderboard
asahi417 commited on
Commit
e8e6e06
·
verified ·
1 Parent(s): ba3af7b

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +2 -15
README.md CHANGED
@@ -26,7 +26,7 @@ datasets:
26
  # Kotoba-Whisper-v1.1
27
  _Kotoba-Whisper-v1.1_ is a Japanese ASR model based on [kotoba-tech/kotoba-whisper-v1.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0), with
28
  additional postprocessing stacks integrated as [`pipeline`](https://huggingface.co/docs/transformers/en/main_classes/pipelines). The new features includes
29
- (i) improved timestamp achieved by [stable-ts](https://github.com/jianfch/stable-ts) and (ii) adding punctuation with [punctuators](https://github.com/1-800-BAD-CODE/punctuators/tree/main).
30
  These libraries are merged into Kotoba-Whisper-v1.1 via pipeline and will be applied seamlessly to the predicted transcription from [kotoba-tech/kotoba-whisper-v1.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0).
31
  The pipeline has been developed through the collaboration between [Asahi Ushio](https://asahiushio.com) and [Kotoba Technologies](https://twitter.com/kotoba_tech)
32
 
@@ -38,15 +38,9 @@ along with the.
38
  | model | [CommonVoice 8 (Japanese test set)](https://huggingface.co/datasets/japanese-asr/ja_asr.common_voice_8_0) | [JSUT Basic 5000](https://huggingface.co/datasets/japanese-asr/ja_asr.jsut_basic5000) | [ReazonSpeech (held out test set)](https://huggingface.co/datasets/japanese-asr/ja_asr.reazonspeech_test) |
39
  |:--------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------:|----------------------------------------------------------------------------------------:|------------------------------------------------------------------------------------------------------------:|
40
  | [kotoba-tech/kotoba-whisper-v2.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.0) | 17.6 | 15.4 | 17.4 |
41
- | [kotoba-tech/kotoba-whisper-v2.1](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.1) | 17.7 | 15.4 | 17 |
42
- | [kotoba-tech/kotoba-whisper-v2.1](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.1) (punctuator + stable-ts) | 17.7 | 15.4 | 17 |
43
- | [kotoba-tech/kotoba-whisper-v2.1](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.1) (punctuator) | 17.7 | 15.4 | 17 |
44
- | [kotoba-tech/kotoba-whisper-v2.1](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.1) (stable-ts) | 17.7 | 15.4 | 17 |
45
  | [kotoba-tech/kotoba-whisper-v1.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0) | 17.8 | 15.2 | 17.8 |
46
  | [kotoba-tech/kotoba-whisper-v1.1](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.1) | 17.9 | 15 | 17.8 |
47
- | [kotoba-tech/kotoba-whisper-v1.1](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.1) (punctuator + stable-ts) | 17.9 | 15 | 17.8 |
48
- | [kotoba-tech/kotoba-whisper-v1.1](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.1) (punctuator) | 17.9 | 15 | 17.8 |
49
- | [kotoba-tech/kotoba-whisper-v1.1](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.1) (stable-ts) | 17.9 | 15 | 17.8 |
50
  | [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) | 15.3 | 13.4 | 20.5 |
51
  | [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) | 15.9 | 10.6 | 34.6 |
52
  | [openai/whisper-large](https://huggingface.co/openai/whisper-large) | 16.6 | 11.3 | 40.7 |
@@ -111,7 +105,6 @@ pipe = pipeline(
111
  chunk_length_s=15,
112
  batch_size=16,
113
  trust_remote_code=True,
114
- stable_ts=True,
115
  punctuator=True
116
  )
117
 
@@ -130,12 +123,6 @@ print(result)
130
  + result = pipe("audio.mp3", return_timestamps=True, generate_kwargs=generate_kwargs)
131
  ```
132
 
133
- - To deactivate stable-ts:
134
- ```diff
135
- - stable_ts=True,
136
- + stable_ts=False,
137
- ```
138
-
139
  - To deactivate punctuator:
140
  ```diff
141
  - punctuator=True,
 
26
  # Kotoba-Whisper-v1.1
27
  _Kotoba-Whisper-v1.1_ is a Japanese ASR model based on [kotoba-tech/kotoba-whisper-v1.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0), with
28
  additional postprocessing stacks integrated as [`pipeline`](https://huggingface.co/docs/transformers/en/main_classes/pipelines). The new features includes
29
+ adding punctuation with [punctuators](https://github.com/1-800-BAD-CODE/punctuators/tree/main).
30
  These libraries are merged into Kotoba-Whisper-v1.1 via pipeline and will be applied seamlessly to the predicted transcription from [kotoba-tech/kotoba-whisper-v1.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0).
31
  The pipeline has been developed through the collaboration between [Asahi Ushio](https://asahiushio.com) and [Kotoba Technologies](https://twitter.com/kotoba_tech)
32
 
 
38
  | model | [CommonVoice 8 (Japanese test set)](https://huggingface.co/datasets/japanese-asr/ja_asr.common_voice_8_0) | [JSUT Basic 5000](https://huggingface.co/datasets/japanese-asr/ja_asr.jsut_basic5000) | [ReazonSpeech (held out test set)](https://huggingface.co/datasets/japanese-asr/ja_asr.reazonspeech_test) |
39
  |:--------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------:|----------------------------------------------------------------------------------------:|------------------------------------------------------------------------------------------------------------:|
40
  | [kotoba-tech/kotoba-whisper-v2.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.0) | 17.6 | 15.4 | 17.4 |
41
+ | [kotoba-tech/kotoba-whisper-v2.1](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.1) | 17.7 | 15.4 | 17 |
 
 
 
42
  | [kotoba-tech/kotoba-whisper-v1.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0) | 17.8 | 15.2 | 17.8 |
43
  | [kotoba-tech/kotoba-whisper-v1.1](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.1) | 17.9 | 15 | 17.8 |
 
 
 
44
  | [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) | 15.3 | 13.4 | 20.5 |
45
  | [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) | 15.9 | 10.6 | 34.6 |
46
  | [openai/whisper-large](https://huggingface.co/openai/whisper-large) | 16.6 | 11.3 | 40.7 |
 
105
  chunk_length_s=15,
106
  batch_size=16,
107
  trust_remote_code=True,
 
108
  punctuator=True
109
  )
110
 
 
123
  + result = pipe("audio.mp3", return_timestamps=True, generate_kwargs=generate_kwargs)
124
  ```
125
 
 
 
 
 
 
 
126
  - To deactivate punctuator:
127
  ```diff
128
  - punctuator=True,