asahi417 committed · verified
Commit 0774a64 · 1 Parent(s): 7ebe210

Update README.md

Files changed (1): README.md +20 -1

README.md CHANGED
@@ -41,7 +41,26 @@ ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav
  ```
 
  ### Benchmark
- Please refer to the [kotoba-tech/kotoba-whisper-v1.0-ggml](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0-ggml) for the detail of speed up [here](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0-ggml#benchmark).
+ We measure the inference speed of different kotoba-whisper-v2.0 implementations on four Japanese speech audio files, using a MacBook Pro with the following specs:
+ - Apple M2 Pro
+ - 32GB memory
+ - 14-inch, 2023
+ - OS Sonoma Version 14.4.1 (23E224)
+
+ | audio file | audio duration (min) | [whisper.cpp](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.0-ggml) (sec) | [faster-whisper](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.0-faster) (sec) | [hf pipeline](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.0) (sec) |
+ |---------|------|-----|------|-----|
+ | audio 1 | 50.3 | 581 | 2601 | 807 |
+ | audio 2 | 5.6  | 41  | 73   | 61  |
+ | audio 3 | 4.9  | 30  | 141  | 54  |
+ | audio 4 | 5.6  | 35  | 126  | 69  |
+
+ Scripts to re-run the experiment can be found below:
+ * [whisper.cpp](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.0-ggml/blob/main/benchmark.sh)
+ * [faster-whisper](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.0-faster/blob/main/benchmark.sh)
+ * [hf pipeline](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.0/blob/main/benchmark.sh)
+
+ Note that whisper.cpp and faster-whisper currently support only [sequential long-form decoding](https://huggingface.co/distil-whisper/distil-large-v3#sequential-long-form), while the Hugging Face pipeline alone supports [chunked long-form decoding](https://huggingface.co/distil-whisper/distil-large-v3#chunked-long-form), which we empirically found to perform better than sequential long-form decoding.
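
As a rough sanity check on the table, the real-time factor (RTF, processing time divided by audio duration) can be derived directly from these numbers; an RTF below 1 means faster than real time. The snippet below is an illustrative sketch, not one of the benchmark scripts linked above; it simply recomputes the RTF for each row with `awk`:

```shell
# Illustrative only: recompute the real-time factor (RTF = processing seconds /
# audio seconds) for each row of the benchmark table. Lower is faster.
# Columns: label, duration (min), whisper.cpp (s), faster-whisper (s), hf pipeline (s)
awk '{ dur = $2 * 60; printf "%-7s whisper.cpp=%.2f faster-whisper=%.2f hf-pipeline=%.2f\n", $1, $3/dur, $4/dur, $5/dur }' <<'EOF'
audio1 50.3 581 2601 807
audio2 5.6 41 73 61
audio3 4.9 30 141 54
audio4 5.6 35 126 69
EOF
```

For example, whisper.cpp transcribes audio 1 at an RTF of roughly 0.19, i.e. about 5x faster than real time.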
 
  ### Quantized Model
  To use the quantized model, download the quantized GGML weights: