Update README.md

README.md

* Explainability: Visualize disease evidence in both image and text

## Model Performance

### Classification

The model was evaluated on a held-out **evaluation set** and a **separate test set** across 22 disease labels, reporting macro-averaged **Precision**, **Recall**, **F1-score**, and **AUROC**.

| Metric    | Eval Set (Macro Avg) | Test Set (Macro Avg) |
|-----------|----------------------|----------------------|
| Precision | 0.826                | 0.825                |
| Recall    | 0.829                | 0.812                |
| F1-score  | 0.825                | 0.800                |
| AUROC     | 0.924                | 0.943                |

*The model achieves strong label-level performance, particularly on common findings such as COPD, Cardiomegaly, and Musculoskeletal degenerative diseases. Rare conditions such as Air Leak Syndromes show lower F1 scores, reflecting data imbalance.*
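
For reference, here is a minimal sketch of how macro-averaged multi-label metrics like those above can be computed with scikit-learn. The random arrays and the 0.5 decision threshold are illustrative assumptions, not the repository's actual evaluation pipeline.

```python
# Minimal sketch of the macro-averaged metrics above using scikit-learn.
# The random arrays and the 0.5 decision threshold are illustrative
# assumptions, not this repository's actual evaluation code.
import numpy as np
from sklearn.metrics import precision_recall_fscore_support, roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(500, 22))  # multi-hot labels, 22 diseases
y_prob = rng.random(size=(500, 22))          # per-label sigmoid outputs
y_pred = (y_prob >= 0.5).astype(int)         # assumed decision threshold

prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
auroc = roc_auc_score(y_true, y_prob, average="macro")
print(f"Prec {prec:.3f} | Rec {rec:.3f} | F1 {f1:.3f} | AUROC {auroc:.3f}")
```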

---

### Retrieval Performance

Retrieval was evaluated under two protocols: **Generalization**, where test queries search the test set itself, and **Historical**, where test queries search the training corpus.

| Protocol | P@5 | mAP | MRR | Avg Time (ms) |
|----------|-----|-----|-----|---------------|
| Generalization (test → test) | 0.776 | 0.0058 | 0.848 | 0.99 |
| Historical (test → train) | 0.794 | 0.0008 | 0.881 | 2.19 |
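
A minimal sketch of the ranking metrics in the table (P@5 and MRR) follows. The ranked IDs and the notion of relevance (label overlap with the query) are assumptions; the project's actual joint-embedding search is not shown, and mAP over the full ranked gallery is omitted for brevity.

```python
# Minimal sketch of P@5 and MRR as reported above. Relevance here means
# sharing a label with the query, which is an assumption.
def precision_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of the top-k retrieved cases that are relevant to the query."""
    return sum(1 for i in ranked_ids[:k] if i in relevant_ids) / k

def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant case, or 0.0 if none is retrieved."""
    for rank, i in enumerate(ranked_ids, start=1):
        if i in relevant_ids:
            return 1.0 / rank
    return 0.0

# Toy example: case IDs ranked by similarity in the shared joint space.
ranked = [14, 3, 99, 7, 42, 8]
relevant = {3, 7, 8}
print(precision_at_k(ranked, relevant))   # 0.4
print(reciprocal_rank(ranked, relevant))  # 0.5
```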

#### Retrieval Diversity

| Metric | Mean | Std. Dev. | Median |
|--------|------|-----------|--------|
| Retrieval Diversity Score | 0.217 | 0.041 | 0.222 |
| Retrieval Overlap IoU@5 | 0.783 | 0.041 | 0.778 |

*The model retrieves diverse and relevant cases, enabling multimodal explanation and case-based reasoning for clinical education.*
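
A minimal sketch of the diversity metrics above: the table's means sum to 1.0, which suggests the diversity score is computed as 1 - IoU@5 between two top-5 result sets. That relationship, and the pairing of runs being compared (e.g., image query vs. text query for the same case), are assumptions on this sketch's part.

```python
# Minimal sketch of the diversity metrics above. Diversity = 1 - IoU@5
# and the image/text run pairing are assumptions, not confirmed internals.
def iou_at_k(ids_a, ids_b, k=5):
    """Jaccard overlap between two top-k retrieval result sets."""
    a, b = set(ids_a[:k]), set(ids_b[:k])
    return len(a & b) / len(a | b)

image_top5 = [3, 7, 8, 14, 42]
text_top5 = [3, 7, 8, 21, 42]
overlap = iou_at_k(image_top5, text_top5)  # 4 shared / 6 unique = 0.667
print(f"IoU@5 = {overlap:.3f}, diversity = {1 - overlap:.3f}")
```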

---

### Notes

- Retrieval and diversity metrics highlight the model’s ability to surface multiple relevant cases per query.
- Lower performance on some rare labels may reflect dataset imbalance in Open-i.

---

## Limitations & Risks

* Trained on a public dataset (Open-i) — may not generalize to other hospitals
* Not for diagnostic use in real-world settings

---

## Acknowledgments

* [NIH Open-i Dataset](https://openi.nlm.nih.gov/faq#collection)
* Swin Transformer (Timm)