---
library_name: transformers
license: apache-2.0
tags:
- vision
- image-captioning
- blip
- multimodal
- fashion
datasets:
- Marqo/fashion200k
base_model:
- Salesforce/blip-image-captioning-large
---
# Fine-Tuned BLIP Model for Fashion Image Captioning
This is a fine-tuned BLIP (Bootstrapping Language-Image Pre-training) model specialized for fashion image captioning. It was fine-tuned on the Marqo Fashion200k dataset to generate descriptive and contextually relevant captions for fashion-related images.
## Model Details
- Model Type: BLIP (Vision-Language Pretraining)
- Architecture: BLIP uses a multimodal transformer architecture to jointly model visual and textual information.
- Fine-Tuning Dataset: Marqo Fashion200k (fashion images paired with descriptive captions)
- Task: Fashion Image Captioning
- License: Apache 2.0
## Usage
You can use this model with the Hugging Face `transformers` library for fashion image captioning tasks.
### Installation
First, install the required libraries:
```bash
pip install transformers torch
```
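
### Example

A minimal captioning sketch is shown below. The repo id and image URL are placeholders: substitute this model's actual Hugging Face id and your own fashion image. Pillow and requests are assumed to be installed.

```python
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Placeholder repo id: replace with the actual id of this fine-tuned model.
model_id = "your-username/blip-fashion-captioning"

processor = BlipProcessor.from_pretrained(model_id)
model = BlipForConditionalGeneration.from_pretrained(model_id)

# Placeholder URL: any RGB fashion image works here.
url = "https://example.com/fashion-image.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Preprocess the image and generate a caption.
inputs = processor(images=image, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=50)
caption = processor.decode(output_ids[0], skip_special_tokens=True)
print(caption)
```

For deterministic output, `generate` uses greedy decoding by default; you can pass `num_beams` to trade speed for caption quality.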