An image captioning model fine-tuned from BLIP-base that generates captions in the style of Yoda's speech, e.g. "Sitting in a car, a man is".
Try the web app here: https://yodacaptioner.up.railway.app/
Model Details
Model Description
An image-to-text model fine-tuned from BLIP-base using the transformers package.
- Developed by: vkao8264
- Model type: Image-to-text
- Language(s) (NLP): English
- License: bsd-3-clause
- Finetuned from model: Salesforce/blip-image-captioning-base
Uses
import torch
from PIL import Image
from transformers import AutoProcessor, BlipForConditionalGeneration

# Load the base processor and the fine-tuned captioning model
processor = AutoProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("vkao8264/blip-yoda-captioning")

# Run on GPU if available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

filepath = "path-to-your-image"
raw_image = Image.open(filepath).convert("RGB")

inputs = processor(raw_image, return_tensors="pt").to(device)
output_tokens = model.generate(**inputs)
caption = processor.decode(output_tokens[0], skip_special_tokens=True)
print(caption)
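The decoded caption comes out in Yoda-style word order, e.g. "Sitting in a car, a man is".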
Training Details
Training Data
The model was fine-tuned on 30,000 image-caption pairs from the COCO Captions dataset (specifically, captions_train2014).
Before training, the captions were rewritten into Yoda-style captions using Phi-3 with few-shot prompting, as sketched below.
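The exact Phi-3 variant and prompt are in the scripts linked below; the following is only a rough sketch of that kind of few-shot rewrite, assuming the microsoft/Phi-3-mini-4k-instruct checkpoint and made-up example pairs:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the mini instruct variant; the actual scripts may use a different Phi-3 checkpoint.
# Older transformers versions may require trust_remote_code=True for Phi-3.
model_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
lm = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

# Hypothetical few-shot examples showing the plain -> Yoda-style rewrite
few_shot = (
    "Rewrite each caption in Yoda's speaking style.\n"
    "Caption: A man is sitting in a car.\nYoda: Sitting in a car, a man is.\n"
    "Caption: Two dogs are playing in the snow.\nYoda: Playing in the snow, two dogs are.\n"
)

def to_yoda(caption):
    prompt = few_shot + "Caption: " + caption + "\nYoda:"
    inputs = tokenizer(prompt, return_tensors="pt").to(lm.device)
    output = lm.generate(**inputs, max_new_tokens=40, do_sample=False)
    # Decode only the newly generated tokens after the prompt
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

print(to_yoda("A woman is riding a horse on the beach."))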
The full scripts can be found at https://github.com/vincent8264/yoda_captioning
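The linked repository contains the actual training code; purely as an illustration, a BLIP fine-tune on such (image, Yoda-style caption) pairs with transformers might look like the sketch below (the dataset wrapper, file names, and hyperparameters here are hypothetical):

import torch
from PIL import Image
from torch.utils.data import DataLoader, Dataset
from transformers import AutoProcessor, BlipForConditionalGeneration

processor = AutoProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

class YodaCaptionDataset(Dataset):
    # Hypothetical wrapper over (image path, Yoda-style caption) pairs
    def __init__(self, pairs):
        self.pairs = pairs
    def __len__(self):
        return len(self.pairs)
    def __getitem__(self, idx):
        path, caption = self.pairs[idx]
        image = Image.open(path).convert("RGB")
        enc = processor(images=image, text=caption, padding="max_length",
                        max_length=64, truncation=True, return_tensors="pt")
        return {k: v.squeeze(0) for k, v in enc.items()}

pairs = [("example.jpg", "Sitting in a car, a man is.")]  # placeholder data
loader = DataLoader(YodaCaptionDataset(pairs), batch_size=8, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.train()
for epoch in range(3):
    for batch in loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        # BLIP returns a language-modeling loss when labels are supplied
        outputs = model(pixel_values=batch["pixel_values"],
                        input_ids=batch["input_ids"],
                        attention_mask=batch["attention_mask"],
                        labels=batch["input_ids"])
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()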