---
language:
- en
license: apache-2.0
tags:
- medical
- pathology
- vision-language
- histopathology
- multimodal
- report-generation
- computational-pathology
- melanoma
- dermatopathology
library_name: mosaic
base_model: microsoft/biogpt
arxiv: 2502.19293
---

# MOSAIC Model Checkpoints

**MOSAIC** (Multimodal Optical Slide Analysis Including Comparisons) is a framework for training vision-language models for computational pathology and running inference with them. The pre-trained checkpoints in this repository accompany the paper "Pathology Report Generation and Multimodal Representation Learning for Cutaneous Melanocytic Lesions" by Lucassen et al. (2025), accepted at MICCAI 2025.

## Model Variants

This repository contains three pre-trained model checkpoints:

- **`mosaic-perceiver-biogpt-lora.pt`** - LoRA fine-tuned model (recommended for most use cases)
- **`mosaic-perceiver-biogpt-frozen.pt`** - Frozen backbone model
- **`mosaic-perceiver-biogpt-unfrozen.pt`** - Fully fine-tuned model

## Quick Start

### Installation

First, install the MOSAIC framework from the source repository:

```bash
git clone https://github.com/SanderMoon/MOSAIC.git
cd MOSAIC
pip install -e .
pip install git+https://github.com/salaniz/pycocoevalcap.git
```
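
As a quick, optional sanity check (purely illustrative), the model factory used in the inference example below should import cleanly once the install has finished:

```python
# Optional sanity check: the MOSAIC model factory should import cleanly after installation
from mosaic.model_factory import create_model, load_pretrained

print("MOSAIC import OK")
```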

### Download Model Checkpoints

```bash
# Set your Hugging Face token (required for access)
export HF_TOKEN=your_huggingface_token_here

# Install the huggingface_hub CLI (quoted so the shell does not glob-expand the brackets)
pip install "huggingface_hub[cli]"

# Download the LoRA model (change the filename for the other checkpoints)
huggingface-cli download SaltySander/MOSAIC checkpoints/mosaic-perceiver-biogpt-lora.pt --local-dir . --local-dir-use-symlinks False
```
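
If you prefer to stay in Python, the same checkpoint can be fetched with the standard `hf_hub_download` helper from `huggingface_hub`. A minimal sketch, using the repository and filename from the CLI command above and reading the token from `HF_TOKEN`:

```python
import os

from huggingface_hub import hf_hub_download

# Download the LoRA checkpoint into the current directory,
# mirroring the path layout produced by the CLI command above.
checkpoint_path = hf_hub_download(
    repo_id="SaltySander/MOSAIC",
    filename="checkpoints/mosaic-perceiver-biogpt-lora.pt",
    local_dir=".",
    token=os.environ.get("HF_TOKEN"),
)
print(checkpoint_path)
```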

### Inference Example

```python
import torch

from mosaic.model_factory import create_model, load_pretrained

# Model configuration
model_name = "coca_stage_2_perceiver_lora_uni"  # use the config that matches your checkpoint (see mapping below)
pretrained_path = "checkpoints/mosaic-perceiver-biogpt-lora.pt"
device = "cpu"  # or "cuda" if available

# Create model and tokenizer
model, tokenizer, amp, input_dtype = create_model(
    model_name=model_name,
    pretrained=None,
    precision="bf16",
    device=device,
    init_tokenizer=True,
)

# Load pretrained weights
load_pretrained(model, pretrained=pretrained_path, device=device)


def load_features_from_pth(file_path: str) -> torch.Tensor:
    """Load features from a .pth file with a nested dictionary structure."""
    data = torch.load(file_path, map_location=device)
    features_list = []

    # Collect the patch-level feature vectors from every level in the file
    for level_key in data.keys():
        level_data = data[level_key]
        for patch_id in sorted(level_data.keys()):
            if "feature" in level_data[patch_id]:
                feature = level_data[patch_id]["feature"]
                if not isinstance(feature, torch.Tensor):
                    feature = torch.tensor(feature)
                features_list.append(feature.to(device))

    if features_list:
        # Stack into (num_patches, feature_dim) and add a batch dimension
        stacked_features = torch.stack(features_list, dim=0)
        return stacked_features.unsqueeze(0)
    else:
        raise ValueError(f"No features found in {file_path}")


# Generation parameters
generation_params = {
    "seq_len": 128,
    "max_seq_len": 128,
    "temperature": 1.0,
    "generation_type": "top_k",
    "top_k": 1,
    "min_seq_len": 5,
    "repetition_penalty": 1.1,
}

# Process slide features (example)
slide_path = "path/to/your/slide_features.pth"
visual_features = load_features_from_pth(slide_path)

model.eval()
with torch.no_grad():
    # Generate a pathology report from the slide features
    generated_ids = model.generate(
        image=visual_features,
        sot_token_id=tokenizer.all_special_ids[0],
        eos_token_id=tokenizer.all_special_ids[1],
        pad_token_id=tokenizer.all_special_ids[3],
        **generation_params,
    )

# Decode the generated token ids into text
generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(f"Generated Report: {generated_text.strip()}")
```
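
The loader above expects a nested dictionary of the form `{level: {patch_id: {"feature": tensor}}}`. As a minimal sketch for smoke-testing the pipeline without real slide features, you can write such a file yourself; the 1024-dimensional random vectors below are an assumption (UNI-style patch embeddings) and should be replaced with whatever dimension your feature extractor actually produces:

```python
import torch

# Hypothetical dummy feature file matching the structure expected by load_features_from_pth:
# {level_key: {patch_id: {"feature": 1D tensor}}}
# NOTE: 1024 is an assumed feature dimension, not something fixed by MOSAIC.
dummy_features = {
    "level_0": {f"patch_{i:04d}": {"feature": torch.randn(1024)} for i in range(16)}
}
torch.save(dummy_features, "dummy_slide_features.pth")  # pass this path as slide_path above
```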

## Model Configuration Mapping

Use the appropriate model configuration for each checkpoint:

- **LoRA model**: `coca_stage_2_perceiver_lora_uni`
- **Frozen model**: `coca_stage_2_perceiver_frozen_uni`
- **Unfrozen model**: `coca_stage_2_perceiver_unfrozen_uni`
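
In code, it can be convenient to keep this mapping as a small lookup table so each checkpoint is always paired with its matching configuration. A minimal sketch reusing `create_model` and `load_pretrained` from the example above (the dictionary itself is just a convenience, not part of the MOSAIC API):

```python
from mosaic.model_factory import create_model, load_pretrained

# Checkpoint file -> model configuration name
CHECKPOINT_TO_CONFIG = {
    "checkpoints/mosaic-perceiver-biogpt-lora.pt": "coca_stage_2_perceiver_lora_uni",
    "checkpoints/mosaic-perceiver-biogpt-frozen.pt": "coca_stage_2_perceiver_frozen_uni",
    "checkpoints/mosaic-perceiver-biogpt-unfrozen.pt": "coca_stage_2_perceiver_unfrozen_uni",
}

checkpoint = "checkpoints/mosaic-perceiver-biogpt-frozen.pt"
model, tokenizer, amp, input_dtype = create_model(
    model_name=CHECKPOINT_TO_CONFIG[checkpoint],
    pretrained=None,
    precision="bf16",
    device="cpu",
    init_tokenizer=True,
)
load_pretrained(model, pretrained=checkpoint, device="cpu")
```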

## Requirements

- Python >= 3.10
- PyTorch >= 2.0
- transformers
- CUDA-compatible GPU (recommended, but CPU is supported)

## Source Code

The complete source code, training scripts, and documentation are available at:
**https://github.com/SanderMoon/MOSAIC**

## Citation

If you use these models in your research, please cite our paper:

```bibtex
@misc{lucassen2025pathologyreportgenerationmultimodal,
      title={Pathology Report Generation and Multimodal Representation Learning for Cutaneous Melanocytic Lesions},
      author={Ruben T. Lucassen and Sander P. J. Moonemans and Tijn van de Luijtgaarden and Gerben E. Breimer and Willeke A. M. Blokx and Mitko Veta},
      year={2025},
      eprint={2502.19293},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2502.19293},
}
```

## License

This project is licensed under the Apache License 2.0. See the [LICENSE](https://github.com/SanderMoon/MOSAIC/blob/main/LICENSE) file in the source repository for details.

## Contact

For questions or support, please contact:

- Sander Moonemans: <[email protected]>

---

*This work was developed as part of research into computational pathology and vision-language models for medical image analysis.*