---
base_model:
- openai/whisper-small
---
Note: The state dict of this classifier also contains the fine-tuned whisper-small weights. My model wrapper loads them correctly.
Results of the classifier on Rob's human-annotated dataset (data/voicemail_human_eval.csv):
Results for chunk size 1 second:
- Accuracy: 0.8080
- Precision: 0.9353
- Recall: 0.7692
- F1 Score: 0.8442
 
Results for chunk size 2 seconds:
- Accuracy: 0.8560
- Precision: 0.9650
- Recall: 0.8166
- F1 Score: 0.8846
 
Results for chunk size 5 seconds:
- Accuracy: 0.8640
- Precision: 0.9856
- Recall: 0.8107
- F1 Score: 0.8896
 
Results for chunk size 10 seconds:
- Accuracy: 0.8760
- Precision: 1.0000
- Recall: 0.8166
- F1 Score: 0.8990
 
Results for full audio samples:
- Accuracy: 0.8760
- Precision: 1.0000
- Recall: 0.8166
- F1 Score: 0.8990
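
For readers who want to reproduce or sanity-check numbers like these, here is a minimal sketch of how the four metrics relate to each other. It assumes binary labels where 1 is the positive class; the function names `f1_from` and `binary_metrics` and the toy labels are hypothetical and are not taken from the evaluation script or from data/voicemail_human_eval.csv.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


# Toy labels for illustration only (not the real dataset).
toy_true = [1, 1, 1, 1, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0]
toy = binary_metrics(toy_true, toy_pred)

# Consistency check using the precision and recall reported above
# for the 10-second chunks (1.0000 and 0.8166).
f1_10s = f1_from(1.0000, 0.8166)
print(toy, round(f1_10s, 4))
```

The reported F1 scores agree with the harmonic mean of the reported precision and recall in every configuration. A precision of 1.0000 means no false positives were produced at that chunk size, so the remaining errors there are all missed positives.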