---
language:
- en
license: apache-2.0
tags:
- medical
- pathology
- vision-language
- histopathology
- multimodal
- report-generation
- computational-pathology
- melanoma
- dermatopathology
library_name: mosaic
base_model: microsoft/biogpt
arxiv: 2502.19293
---
# MOSAIC Model Checkpoints
**MOSAIC** (Multimodal Optical Slide Analysis Including Comparisons) is a framework for training and running inference with vision-language models in computational pathology. The pre-trained models in this repository accompany the paper "Pathology Report Generation and Multimodal Representation Learning for Cutaneous Melanocytic Lesions" by Lucassen et al. (2025), accepted at MICCAI 2025.
## Model Variants
This repository contains three pre-trained model checkpoints:
- **`mosaic-perceiver-biogpt-lora.pt`** - LoRA fine-tuned model (recommended for most use cases)
- **`mosaic-perceiver-biogpt-frozen.pt`** - Model trained with a frozen BioGPT backbone
- **`mosaic-perceiver-biogpt-unfrozen.pt`** - Fully fine-tuned model
## Quick Start
### Installation
First, install the MOSAIC framework from the source repository:
```bash
git clone https://github.com/SanderMoon/MOSAIC.git
cd MOSAIC
pip install -e .
pip install git+https://github.com/salaniz/pycocoevalcap.git
```
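To confirm the installation, a quick import check can be run. This is a minimal smoke test using only the imports that appear in the inference example below:

```python
# These imports should succeed after `pip install -e .` completes.
from mosaic.model_factory import create_model, load_pretrained

print("MOSAIC installed correctly")
```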
### Download Model Checkpoints
```bash
# Set your Hugging Face token (required for access)
export HF_TOKEN=your_huggingface_token_here
# Install the huggingface_hub CLI (quotes prevent the shell from globbing the brackets)
pip install "huggingface_hub[cli]"
# Download the LoRA model (change filename for other models)
huggingface-cli download SaltySander/MOSAIC checkpoints/mosaic-perceiver-biogpt-lora.pt --local-dir . --local-dir-use-symlinks False
```
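Alternatively, checkpoints can be fetched from Python with `huggingface_hub`. A minimal sketch; `hf_hub_download` downloads the file into the local Hugging Face cache and returns its path:

```python
import os

from huggingface_hub import hf_hub_download

# Download the LoRA checkpoint (change `filename` for the other variants).
checkpoint_path = hf_hub_download(
    repo_id="SaltySander/MOSAIC",
    filename="checkpoints/mosaic-perceiver-biogpt-lora.pt",
    token=os.environ.get("HF_TOKEN"),  # required for gated access
)
print(checkpoint_path)
```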
### Inference Example
```python
from mosaic.model_factory import create_model, load_pretrained
import torch

# Model configuration
model_name = "coca_stage_2_perceiver_lora_uni"  # Use the matching config for your checkpoint
pretrained_path = "checkpoints/mosaic-perceiver-biogpt-lora.pt"
device = "cpu"  # or "cuda" if available

# Create model and tokenizer
model, tokenizer, amp, input_dtype = create_model(
    model_name=model_name,
    pretrained=None,
    precision="bf16",
    device=device,
    init_tokenizer=True,
)

# Load pretrained weights
load_pretrained(model, pretrained=pretrained_path, device=device)


def load_features_from_pth(file_path: str) -> torch.Tensor:
    """Load features from a .pth file with nested dictionary structure."""
    data = torch.load(file_path, map_location=device)
    features_list = []
    for level_key in data.keys():
        level_data = data[level_key]
        for patch_id in sorted(level_data.keys()):
            if "feature" in level_data[patch_id]:
                feature = level_data[patch_id]["feature"]
                if not isinstance(feature, torch.Tensor):
                    feature = torch.tensor(feature)
                features_list.append(feature.to(device))
    if features_list:
        # Stack patch features and add a batch dimension: [1, num_patches, feature_dim]
        stacked_features = torch.stack(features_list, dim=0)
        return stacked_features.unsqueeze(0)
    else:
        raise ValueError(f"No features found in {file_path}")


# Generation parameters
generation_params = {
    "seq_len": 128,
    "max_seq_len": 128,
    "temperature": 1.0,
    "generation_type": "top_k",
    "top_k": 1,
    "min_seq_len": 5,
    "repetition_penalty": 1.1,
}

# Process slide features (example)
slide_path = "path/to/your/slide_features.pth"
visual_features = load_features_from_pth(slide_path)

model.eval()
with torch.no_grad():
    # Generate pathology report
    generated_ids = model.generate(
        image=visual_features,
        sot_token_id=tokenizer.all_special_ids[0],
        eos_token_id=tokenizer.all_special_ids[1],
        pad_token_id=tokenizer.all_special_ids[3],
        **generation_params,
    )

# Decode generated text
generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(f"Generated Report: {generated_text.strip()}")
```
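As the loader above implies, the expected `.pth` layout is a nested dictionary of magnification levels and patches, where each patch entry holds a `"feature"` tensor. A minimal synthetic file for smoke-testing the pipeline can be created as follows (the 1024-dimensional feature size is an assumption based on a UNI-style feature extractor; adjust it to match yours):

```python
import torch

# Hypothetical example: 4 patch features at a single magnification level.
# The feature dimension (1024) is an assumption; match your feature extractor.
dummy_features = {
    "level_0": {
        f"patch_{i:03d}": {"feature": torch.randn(1024)} for i in range(4)
    }
}
torch.save(dummy_features, "dummy_slide_features.pth")

# load_features_from_pth("dummy_slide_features.pth") then yields a [1, 4, 1024] tensor.
```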
## Model Configuration Mapping
Use the appropriate model configuration for each checkpoint:
- **LoRA model**: `coca_stage_2_perceiver_lora_uni`
- **Frozen model**: `coca_stage_2_perceiver_frozen_uni`
- **Unfrozen model**: `coca_stage_2_perceiver_unfrozen_uni`
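For convenience, this mapping can be kept in a small dictionary so the checkpoint filename and config name never drift apart. A sketch using only the names from the list above:

```python
# Checkpoint file -> model configuration name (from the list above).
CHECKPOINT_CONFIGS = {
    "checkpoints/mosaic-perceiver-biogpt-lora.pt": "coca_stage_2_perceiver_lora_uni",
    "checkpoints/mosaic-perceiver-biogpt-frozen.pt": "coca_stage_2_perceiver_frozen_uni",
    "checkpoints/mosaic-perceiver-biogpt-unfrozen.pt": "coca_stage_2_perceiver_unfrozen_uni",
}
```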
## Requirements
- Python >= 3.10
- PyTorch >= 2.0
- transformers
- CUDA-compatible GPU (recommended, but CPU is supported)
## Source Code
The complete source code, training scripts, and documentation are available at:
**https://github.com/SanderMoon/MOSAIC**
## Citation
If you use these models in your research, please cite our paper:
```bibtex
@misc{lucassen2025pathologyreportgenerationmultimodal,
title={Pathology Report Generation and Multimodal Representation Learning for Cutaneous Melanocytic Lesions},
author={Ruben T. Lucassen and Sander P. J. Moonemans and Tijn van de Luijtgaarden and Gerben E. Breimer and Willeke A. M. Blokx and Mitko Veta},
year={2025},
eprint={2502.19293},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2502.19293},
}
```
## License
This project is licensed under the Apache License 2.0. See the [LICENSE](https://github.com/SanderMoon/MOSAIC/blob/main/LICENSE) file in the source repository for details.
## Contact
For questions or support, please contact:
- Sander Moonemans: <[email protected]>
---
*This work was developed as part of research into computational pathology and vision-language models for medical image analysis.*