my suggestions after evaluation

#41
by Hansen-Wu - opened

I used vLLM to run this model. After some struggle, I got it to work.
A couple of things I didn't enjoy:

  1. consolidated.safetensors is mandatory (many other models work without it).
  2. It transcribes mono audio only, so I have to convert stereo audio files to mono first; the phi-4-multimodal-instruct model transcribes both stereo and mono. A downmix sketch is included after this list.
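For anyone hitting the same limitation, here is a minimal stereo-to-mono sketch. It assumes WAV input and the soundfile/numpy packages; the file names and the helper itself are placeholders, not part of the model's tooling.

```python
import numpy as np
import soundfile as sf

def stereo_to_mono(src_path: str, dst_path: str) -> None:
    # data shape is (frames, channels) for multi-channel files
    data, sample_rate = sf.read(src_path)
    if data.ndim > 1:
        # average the channels into a single mono track
        data = data.mean(axis=1)
    sf.write(dst_path, data, sample_rate)

# example usage with placeholder file names
stereo_to_mono("input_stereo.wav", "input_mono.wav")
```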

Has the model's training code been open-sourced?