---
language:
- en
license: apache-2.0
tags:
- medical
- pathology
- vision-language
- histopathology
- multimodal
- report-generation
- computational-pathology
- melanoma
- dermatopathology
library_name: mosaic
base_model: microsoft/biogpt
arxiv: 2502.19293
---
# MOSAIC Model Checkpoints
**MOSAIC** (Multimodal Optical Slide Analysis Including Comparisons) is a framework for training and running inference with vision-language models in computational pathology. The pre-trained models in this repository accompany the paper "Pathology Report Generation and Multimodal Representation Learning for Cutaneous Melanocytic Lesions" by Lucassen et al. (2025), accepted at MICCAI 2025.
## Model Variants
This repository contains three pre-trained model checkpoints:
- **`mosaic-perceiver-biogpt-lora.pt`** - LoRA fine-tuned model (recommended for most use cases)
- **`mosaic-perceiver-biogpt-frozen.pt`** - Model trained with a frozen BioGPT backbone
- **`mosaic-perceiver-biogpt-unfrozen.pt`** - Fully fine-tuned model
## Quick Start
### Installation
First, install the MOSAIC framework from the source repository:
```bash
git clone https://github.com/SanderMoon/MOSAIC.git
cd MOSAIC
pip install -e .
pip install git+https://github.com/salaniz/pycocoevalcap.git
```
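To confirm the installation, a quick import check can be run. This is a minimal smoke test using only the imports that appear in the inference example below:

```python
# These imports should succeed after `pip install -e .` completes.
from mosaic.model_factory import create_model, load_pretrained

print("MOSAIC installed correctly")
```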
### Download Model Checkpoints
```bash
# Set your Hugging Face token (required for access)
export HF_TOKEN=your_huggingface_token_here
# Install the huggingface_hub CLI (quotes prevent the shell from globbing the brackets)
pip install "huggingface_hub[cli]"
# Download the LoRA model (change filename for other models)
huggingface-cli download SaltySander/MOSAIC checkpoints/mosaic-perceiver-biogpt-lora.pt --local-dir . --local-dir-use-symlinks False
```
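Alternatively, checkpoints can be fetched from Python with `huggingface_hub`. A minimal sketch; `hf_hub_download` downloads the file into the local Hugging Face cache and returns its path:

```python
import os

from huggingface_hub import hf_hub_download

# Download the LoRA checkpoint (change `filename` for the other variants).
checkpoint_path = hf_hub_download(
    repo_id="SaltySander/MOSAIC",
    filename="checkpoints/mosaic-perceiver-biogpt-lora.pt",
    token=os.environ.get("HF_TOKEN"),  # required for gated access
)
print(checkpoint_path)
```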
### Inference Example
```python
from mosaic.model_factory import create_model, load_pretrained
import torch

# Model configuration
model_name = "coca_stage_2_perceiver_lora_uni"  # Use the matching config for your checkpoint
pretrained_path = "checkpoints/mosaic-perceiver-biogpt-lora.pt"
device = "cpu"  # or "cuda" if available

# Create model and tokenizer
model, tokenizer, amp, input_dtype = create_model(
    model_name=model_name,
    pretrained=None,
    precision="bf16",
    device=device,
    init_tokenizer=True,
)

# Load pretrained weights
load_pretrained(model, pretrained=pretrained_path, device=device)


def load_features_from_pth(file_path: str) -> torch.Tensor:
    """Load features from a .pth file with nested dictionary structure."""
    data = torch.load(file_path, map_location=device)
    features_list = []
    for level_key in data.keys():
        level_data = data[level_key]
        for patch_id in sorted(level_data.keys()):
            if "feature" in level_data[patch_id]:
                feature = level_data[patch_id]["feature"]
                if not isinstance(feature, torch.Tensor):
                    feature = torch.tensor(feature)
                features_list.append(feature.to(device))
    if features_list:
        # Stack patch features and add a batch dimension: [1, num_patches, feature_dim]
        stacked_features = torch.stack(features_list, dim=0)
        return stacked_features.unsqueeze(0)
    else:
        raise ValueError(f"No features found in {file_path}")


# Generation parameters
generation_params = {
    "seq_len": 128,
    "max_seq_len": 128,
    "temperature": 1.0,
    "generation_type": "top_k",
    "top_k": 1,
    "min_seq_len": 5,
    "repetition_penalty": 1.1,
}

# Process slide features (example)
slide_path = "path/to/your/slide_features.pth"
visual_features = load_features_from_pth(slide_path)

model.eval()
with torch.no_grad():
    # Generate pathology report
    generated_ids = model.generate(
        image=visual_features,
        sot_token_id=tokenizer.all_special_ids[0],
        eos_token_id=tokenizer.all_special_ids[1],
        pad_token_id=tokenizer.all_special_ids[3],
        **generation_params,
    )

# Decode generated text
generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(f"Generated Report: {generated_text.strip()}")
```
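As the loader above implies, the expected `.pth` layout is a nested dictionary of magnification levels and patches, where each patch entry holds a `"feature"` tensor. A minimal synthetic file for smoke-testing the pipeline can be created as follows (the 1024-dimensional feature size is an assumption based on a UNI-style feature extractor; adjust it to match yours):

```python
import torch

# Hypothetical example: 4 patch features at a single magnification level.
# The feature dimension (1024) is an assumption; match your feature extractor.
dummy_features = {
    "level_0": {
        f"patch_{i:03d}": {"feature": torch.randn(1024)} for i in range(4)
    }
}
torch.save(dummy_features, "dummy_slide_features.pth")

# load_features_from_pth("dummy_slide_features.pth") then yields a [1, 4, 1024] tensor.
```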
## Model Configuration Mapping
Use the appropriate model configuration for each checkpoint:
- **LoRA model**: `coca_stage_2_perceiver_lora_uni`
- **Frozen model**: `coca_stage_2_perceiver_frozen_uni`
- **Unfrozen model**: `coca_stage_2_perceiver_unfrozen_uni`
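For convenience, this mapping can be kept in a small dictionary so the checkpoint filename and config name never drift apart. A sketch using only the names from the list above:

```python
# Checkpoint file -> model configuration name (from the list above).
CHECKPOINT_CONFIGS = {
    "checkpoints/mosaic-perceiver-biogpt-lora.pt": "coca_stage_2_perceiver_lora_uni",
    "checkpoints/mosaic-perceiver-biogpt-frozen.pt": "coca_stage_2_perceiver_frozen_uni",
    "checkpoints/mosaic-perceiver-biogpt-unfrozen.pt": "coca_stage_2_perceiver_unfrozen_uni",
}
```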
## Requirements
- Python >= 3.10
- PyTorch >= 2.0
- transformers
- CUDA-compatible GPU (recommended, but CPU is supported)
## Source Code
The complete source code, training scripts, and documentation are available at:
**https://github.com/SanderMoon/MOSAIC**
## Citation
If you use these models in your research, please cite our paper:
```bibtex
@misc{lucassen2025pathologyreportgenerationmultimodal,
title={Pathology Report Generation and Multimodal Representation Learning for Cutaneous Melanocytic Lesions},
author={Ruben T. Lucassen and Sander P. J. Moonemans and Tijn van de Luijtgaarden and Gerben E. Breimer and Willeke A. M. Blokx and Mitko Veta},
year={2025},
eprint={2502.19293},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2502.19293},
}
```
## License
This project is licensed under the Apache License 2.0. See the [LICENSE](https://github.com/SanderMoon/MOSAIC/blob/main/LICENSE) file in the source repository for details.
## Contact
For questions or support, please contact:
- Sander Moonemans: <[email protected]>
---
*This work was developed as part of research into computational pathology and vision-language models for medical image analysis.*