junnei/Phi-4-multimodal-instruct-ko-asr

Automatic Speech Recognition

text-generation


junnei committed on Mar 13

Commit 3aae26f · verified · 1 parent: aef7611

Update README.md

Files changed (1)

README.md +3 -3


@@ -25,7 +25,7 @@ model-index:
       value: 94.837
     - type: cer
       name: zeroth-test-CER
-      value: 1.429
+      value: 1.316
     - type: wer
       name: zeroth-test-WER
       value: 2.951
@@ -53,7 +53,7 @@ This model is fine-tuned from [microsoft/Phi-4-multimodal-instruct](https://hugg
 This model is trained 960 steps on datasets for Korean Audio Speech Recognition on H100.
-After that, we will check if it can perform scalable work through additional training with synthetic data from CoVoST2 Dataset into Korean.
+After that, we continue training with [CoVoST2 Dataset](https://huggingface.co/datasets/junnei/covost2) / [Only for Korean](https://huggingface.co/datasets/junnei/covost2-ko) for AST.
 ## Evaluation
@@ -70,7 +70,7 @@ Compared to [Phi-4-mm-inst-zeroth-kor](https://huggingface.co/seastar105/Phi-4-m
 | original             |  198.32     |    -      | 5.63         | 2.42             | 6.86         | 4.17             |
 | daekeun-ml/Phi-4-multimodal-finetune-ko-speech|  1.61       |     3.54     | 7.67         | 8.38             | 12.31        | 9.69             |
 | seastar105/Phi-4-mm-inst-zeroth-kor |  7.02       |     -      |7.07         | 9.19             | 13.08        | 9.35             |
-| **ASR finetune (this model)**|  **1.31**       |      -     |7.46         | 6.24             | 12.15        | 8.91             |
+| **ASR finetune (this model)**|  **1.31**       |      2.95     |7.46         | 6.24             | 12.15        | 8.91             |
 | + 1 epoch finetune with [Covost-Ko](https://huggingface.co/datasets/junnei/covost2-ko)|  3.88       |   -       | **8.07**         | **10.09**            | **18.82**        | **15.41**             |
 | [**AST finetuned model**](https://huggingface.co/junnei/Phi-4-multimodal-instruct-ko-speech/tree/main)|  **1.77**       | **2.99**         | **8.01**             | **9.09**        | **17.09**             | **11.82** |
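
The `zeroth-test-CER` and `zeroth-test-WER` values touched by this commit are character and word error rates. The README does not say how they were computed, so the following is only a minimal sketch of the standard definition (edit distance divided by reference length, reported as a percentage). The function names are hypothetical, and stripping spaces before computing CER is an assumption; the author's evaluation script may normalize text differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of chars or words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate in percent. Assumption: spaces are ignored."""
    ref_chars = list(reference.replace(" ", ""))
    hyp_chars = list(hypothesis.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return 100.0 * edit_distance(ref_chars, hyp_chars) / len(ref_chars)


def wer(reference, hypothesis):
    """Word error rate in percent, splitting on whitespace."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hyp_words) / len(ref_words)
```

Under this definition, a single wrong character in a four-character reference gives a CER of 25.0, and a single wrong word in a four-word reference gives a WER of 25.0. Corpus-level scores are usually obtained by summing edit distances and reference lengths over all utterances before dividing, rather than averaging per-utterance rates.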