---
license: mit
tags:
- chest-xray
- medical
- multimodal
- retrieval
- explanation
- clinicalbert
- swin-transformer
- deep-learning
- image-text
datasets:
- openi
language:
- en
---
# Multimodal Chest X-ray Retrieval & Diagnosis (ClinicalBERT + Swin)
This model jointly encodes chest X-rays (DICOM) and radiology reports (XML) to:
- Predict medical conditions from multimodal input (image + text)
- Retrieve similar cases using shared disease-aware embeddings
- Provide visual explanations using attention and Integrated Gradients (IG)
> Developed as a final project at HCMUS.
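As a sketch of the IG explanation path, the snippet below runs Captum's `IntegratedGradients` over a stand-in classifier. The placeholder model, input shape, and target index are illustrative assumptions, not the repository's actual API:

```python
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

# Stand-in classifier: flatten pixels -> 22 disease logits. The real fused
# model (Swin + ClinicalBERT) lives in the linked repo; this is a placeholder.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 22))
model.eval()

ig = IntegratedGradients(model)
image = torch.randn(1, 3, 224, 224, requires_grad=True)  # dummy chest X-ray
baseline = torch.zeros_like(image)                        # all-black reference

# Attribute the logit of one label (index 3 here) back to input pixels.
attributions = ig.attribute(image, baselines=baseline, target=3, n_steps=50)
heatmap = attributions.abs().sum(dim=1)                   # (1, 224, 224) saliency map
```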
---
## Model Architecture
- **Image Encoder:** Swin Transformer (pretrained, fine-tuned)
- **Text Encoder:** ClinicalBERT
- **Fusion Module:** Cross-modal attention with optional hybrid FFN layers
- **Losses:** BCE + Focal Loss for multi-label classification
Embeddings from both modalities are projected into a **shared joint space**, enabling retrieval and explanation.
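A minimal sketch of this design, assuming the `timm` checkpoint `swin_base_patch4_window7_224` and the `emilyalsentzer/Bio_ClinicalBERT` checkpoint; the projection width, head count, and focal-loss formulation are illustrative, not the exact training configuration:

```python
import torch
import torch.nn as nn
import timm
from transformers import AutoModel

class MultimodalNet(nn.Module):
    """Illustrative dual encoder; layer sizes and fusion details are assumptions."""
    def __init__(self, num_labels=22, dim=256):
        super().__init__()
        self.image_enc = timm.create_model(
            "swin_base_patch4_window7_224", pretrained=True, num_classes=0)
        self.text_enc = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
        # Project both modalities into the shared joint space.
        self.img_proj = nn.Linear(self.image_enc.num_features, dim)
        self.txt_proj = nn.Linear(self.text_enc.config.hidden_size, dim)
        # Cross-modal attention: the image embedding attends over report tokens.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(dim, num_labels)

    def forward(self, pixels, input_ids, attention_mask):
        img = self.img_proj(self.image_enc(pixels)).unsqueeze(1)            # (B, 1, D)
        txt = self.txt_proj(
            self.text_enc(input_ids=input_ids,
                          attention_mask=attention_mask).last_hidden_state)  # (B, T, D)
        fused, _ = self.cross_attn(img, txt, txt)
        joint = fused.squeeze(1)   # shared embedding used for retrieval
        return self.classifier(joint), joint

def focal_bce(logits, targets, gamma=2.0):
    """BCE with logits modulated by a focal term (one common formulation)."""
    bce = nn.functional.binary_cross_entropy_with_logits(
        logits, targets.float(), reduction="none")
    p_t = torch.exp(-bce)          # model's probability for the true class
    return ((1 - p_t) ** gamma * bce).mean()
```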
---
## Training Data
- **Dataset:** [NIH Open-i Chest X-ray Dataset](https://openi.nlm.nih.gov/)
- **Input Modalities:**
- Chest X-ray DICOMs
- Associated XML radiology reports
- **Labels:** MeSH-derived disease categories (multi-label)
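A loading sketch under common assumptions: `pydicom` for the DICOM pixel data, and `AbstractText` elements carrying `FINDINGS`/`IMPRESSION` sections in the report XML. Verify the tag layout against your own copy of Open-i:

```python
import pydicom
import xml.etree.ElementTree as ET

def load_dicom(path):
    """Read a chest X-ray DICOM and return its pixel array (windowing omitted)."""
    ds = pydicom.dcmread(path)
    return ds.pixel_array

def load_report(path):
    """Pull FINDINGS / IMPRESSION text out of an Open-i report XML.
    The AbstractText layout below is an assumption about the export format."""
    root = ET.parse(path).getroot()
    sections = {}
    for node in root.iter("AbstractText"):
        label = node.get("Label", "")
        if label in ("FINDINGS", "IMPRESSION") and node.text:
            sections[label] = node.text.strip()
    return " ".join(sections.values())
```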
---
## Intended Uses
* Clinical Education: Case similarity search for radiology students
* Research: Baseline for multimodal medical retrieval
* Explainability: Visualize disease evidence in both image and text
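For the case-similarity use, retrieval reduces to nearest-neighbor search in the joint space. A minimal sketch with random placeholder embeddings standing in for real per-case joint embeddings:

```python
import torch
import torch.nn.functional as F

def retrieve_top_k(query_emb, index_embs, k=5):
    """Rank stored cases by cosine similarity in the shared joint space.
    query_emb: (D,) embedding of the query case; index_embs: (N, D) bank."""
    sims = F.cosine_similarity(query_emb.unsqueeze(0), index_embs, dim=1)
    scores, idx = sims.topk(k)
    return idx.tolist(), scores.tolist()

# Placeholder bank of 1,000 case embeddings and one query:
bank = F.normalize(torch.randn(1000, 256), dim=1)
query = F.normalize(torch.randn(256), dim=0)
print(retrieve_top_k(query, bank, k=5))
```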
## Model Performance
### Classification
The model was evaluated on a held-out **evaluation set** and a **separate test set** across 22 disease labels, using macro-averaged **Precision**, **Recall**, **F1-score**, and **AUROC** (a computation sketch follows the table).
| Metric | Eval Set (Macro Avg) | Test Set (Macro Avg) |
|--------|--------------------|--------------------|
| Precision | 0.826 | 0.825 |
| Recall | 0.829 | 0.812 |
| F1-score | 0.825 | 0.800 |
| AUROC | 0.924 | 0.943 |
*The model achieves strong label-level performance, particularly on common findings such as COPD, Cardiomegaly, and Musculoskeletal degenerative diseases. Rare conditions such as Air Leak Syndromes show lower F1 scores, reflecting data imbalance.*
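A sketch of how such macro-averaged figures can be reproduced with scikit-learn; the 0.5 decision threshold and the random placeholder arrays are assumptions:

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support, roc_auc_score

# y_true: (N, 22) binary label matrix; y_prob: (N, 22) sigmoid scores.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(100, 22))
y_prob = rng.random((100, 22))
y_pred = (y_prob >= 0.5).astype(int)   # assumed 0.5 threshold

prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
auroc = roc_auc_score(y_true, y_prob, average="macro")
print(f"Prec {prec:.3f}  Rec {rec:.3f}  F1 {f1:.3f}  AUROC {auroc:.3f}")
```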
---
### Retrieval Performance
Retrieval was evaluated under two protocols:
| Protocol | P@5 | mAP | MRR | Avg Time (ms) |
|----------|-----|-----|-----|---------------|
| Generalization (test → test) | 0.776 | 0.0058 | 0.848 | 0.99 |
| Historical (test → train) | 0.794 | 0.0008 | 0.881 | 2.19 |
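For reference, the standard formulations of these metrics; the project's definition of a "relevant" case (e.g. shared MeSH labels) is assumed, and mAP/MRR average the per-query values below:

```python
def precision_at_k(relevant, ranked, k=5):
    """Fraction of the top-k retrieved cases that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k

def average_precision(relevant, ranked):
    """AP over the full ranking: mean of precision at each relevant hit."""
    hits, score = 0, 0.0
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            score += hits / i
    return score / max(len(relevant), 1)

def reciprocal_rank(relevant, ranked):
    """1 / rank of the first relevant case (0 if none is retrieved)."""
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / i
    return 0.0

# Toy query where cases 7 and 42 are relevant:
print(precision_at_k({7, 42}, [3, 7, 19, 42, 8]),   # 0.4
      average_precision({7, 42}, [3, 7, 19, 42, 8]),  # 0.5
      reciprocal_rank({7, 42}, [3, 7, 19, 42, 8]))    # 0.5
```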
#### Retrieval Diversity
| Metric | Mean | Std. Dev | Median |
|--------|------|----------|--------|
| Retrieval Diversity Score | 0.217 | 0.041 | 0.222 |
| Retrieval Overlap IoU@5 | 0.783 | 0.041 | 0.778 |
*The model retrieves diverse and relevant cases, enabling multimodal explanation and case-based reasoning for clinical education.*
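`IoU@5` measures the overlap between two top-5 retrieval sets; judging by the table, the diversity score appears to be its complement, `1 − IoU@5`, though the exact project definition is assumed here. A minimal sketch:

```python
def iou_at_k(a, b, k=5):
    """Overlap between two top-k retrieval lists as |A ∩ B| / |A ∪ B|."""
    sa, sb = set(a[:k]), set(b[:k])
    return len(sa & sb) / len(sa | sb)

# Two retrieval runs for the same query sharing 4 of 5 cases:
print(iou_at_k([1, 2, 3, 4, 5], [1, 2, 3, 4, 9]))  # -> 4/6 ≈ 0.667
```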
---
### Notes
- Retrieval and diversity metrics highlight the model’s ability to surface multiple relevant cases per query.
- Lower performance on some rare labels may reflect dataset imbalance in Open-i.
---
## Limitations & Risks
* Trained on a single public dataset (Open-i); performance may not generalize to other hospitals, scanners, or patient populations
* Explanations are not clinically validated
* Not for diagnostic use in real-world settings
---
## Acknowledgments
* [NIH Open-i Dataset](https://openi.nlm.nih.gov/faq#collection)
* Swin Transformer (timm)
* ClinicalBERT (Alsentzer et al.)
* Captum (for IG explanations)
* Grad-CAM
## Code
Source code: [GitHub](https://github.com/ppddddpp/multi-modal-retrieval-predict-project)