my suggestions after evaluation
#41
by
Hansen-Wu
- opened
I used vLLM to run this model. After some struggle, I got it to work.
A couple of things I didn't enjoy:
- `consolidated.safetensors` is mandatory (many other models work without it).
- It transcribes mono audio only, so I have to convert stereo audio files to mono first. The phi-4-multimodal-instruct model transcribes both stereo and mono.
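For anyone hitting the same mono-only limitation, here is a minimal sketch of the downmix step using only the Python standard library. It assumes the input is a 16-bit PCM stereo WAV file; this is my own workaround, not something shipped with the model:

```python
import struct
import wave

def stereo_to_mono(src_path: str, dst_path: str) -> None:
    """Downmix a 16-bit PCM stereo WAV to mono by averaging the two channels.

    Assumes 2 channels and 2-byte samples; raises otherwise.
    """
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo PCM WAV")
        framerate = src.getframerate()
        nframes = src.getnframes()
        raw = src.readframes(nframes)

    # Interleaved samples: L0, R0, L1, R1, ...
    samples = struct.unpack("<%dh" % (nframes * 2), raw)
    mono = [(samples[i] + samples[i + 1]) // 2
            for i in range(0, len(samples), 2)]

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(framerate)
        dst.writeframes(struct.pack("<%dh" % len(mono), *mono))
```

In practice I'd use ffmpeg (`ffmpeg -i in.wav -ac 1 out.wav`) for batch conversion, but the snippet avoids any external dependency.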
Has the model's training code been open-sourced?