my suggestions after evaluation
#41
by
Hansen-Wu
- opened
I used vLLM to run this model. After some struggle, I got it to work.
A couple of things I didn't enjoy:
- `consolidated.safetensors` is mandatory (many other models work without it).
- It transcribes mono audio only, so I have to convert stereo audio files to mono first. The phi-4-multimodal-instruct model transcribes both stereo and mono.
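For anyone hitting the same mono-only limitation, here is a minimal sketch of the downmix step using only the Python standard library. It assumes the input is a 16-bit PCM stereo WAV file; this is my own workaround, not something shipped with the model:

```python
import struct
import wave

def stereo_to_mono(src_path: str, dst_path: str) -> None:
    """Downmix a 16-bit PCM stereo WAV to mono by averaging the two channels.

    Assumes 2 channels and 2-byte samples; raises otherwise.
    """
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo PCM WAV")
        framerate = src.getframerate()
        nframes = src.getnframes()
        raw = src.readframes(nframes)

    # Interleaved samples: L0, R0, L1, R1, ...
    samples = struct.unpack("<%dh" % (nframes * 2), raw)
    mono = [(samples[i] + samples[i + 1]) // 2
            for i in range(0, len(samples), 2)]

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(framerate)
        dst.writeframes(struct.pack("<%dh" % len(mono), *mono))
```

In practice I'd use ffmpeg (`ffmpeg -i in.wav -ac 1 out.wav`) for batch conversion, but the snippet avoids any external dependency.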
Has the model's training code been open-sourced?