tifa-benchmark/promptcap-coco-vqa

yushihu committed on Jan 25, 2023

Commit 68336b4 · 1 Parent(s): 502c77d

Update README.md

Files changed (1)

README.md +81 -1

@@ -15,4 +15,84 @@ datasets:
 language:
 - en
----
+---
+# QuickStart
+## Installation
+```
+pip install promptcap
+```
+## Captioning Pipeline
+Generate a prompt-guided caption as follows:
+```
+import torch
+from promptcap import PromptCap
+model = PromptCap("vqascore/promptcap-coco-vqa")  # also supports OFA checkpoints, e.g. "OFA-Sys/ofa-base"
+if torch.cuda.is_available():
+  model.cuda()
+prompt = "please describe this image according to the given question: what piece of clothing is this boy putting on?"
+image = "glove_boy.jpeg"
+print(model.caption(prompt, image))
+```
+To try generic captioning, use the prompt "please describe this image according to the given question: what does the image describe?"
+PromptCap also supports OCR inputs:
+```
+prompt = "please describe this image according to the given question: what year was this taken?"
+image = "dvds.jpg"
+ocr = "yip AE Mht juor 02/14/2012"
+print(model.caption(prompt, image, ocr))
+```
+## Visual Question Answering Pipeline
+Unlike typical VQA models, which treat VQAv2 as a classification task, PromptCap is open-domain and can be paired with arbitrary text-QA models.
+The pipeline below combines PromptCap with UnifiedQA.
+```
+import torch
+from promptcap import PromptCap_VQA
+# The QA model can be any UnifiedQA variant, e.g. "allenai/unifiedqa-v2-t5-large-1251000"
+vqa_model = PromptCap_VQA(promptcap_model="vqascore/promptcap-coco-vqa", vqa_model="allenai/unifiedqa-t5-base")
+if torch.cuda.is_available():
+  vqa_model.cuda()
+question = "what piece of clothing is this boy putting on?"
+image = "glove_boy.jpeg"
+print(vqa_model.vqa(question, image))
+```
+Similarly, the VQA pipeline supports OCR inputs:
+```
+question = "what year was this taken?"
+image = "dvds.jpg"
+ocr = "yip AE Mht juor 02/14/2012"
+print(vqa_model.vqa(question, image, ocr=ocr))
+```
+Because UnifiedQA is flexible, PromptCap also supports multiple-choice VQA:
+```
+question = "what piece of clothing is this boy putting on?"
+image = "glove_boy.jpeg"
+choices = ["gloves", "socks", "shoes", "coats"]
+print(vqa_model.vqa_multiple_choice(question, image, choices))
+```
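
Note on the captioning prompts in the diff above: every example prepends the same instruction prefix to a question, and the generic-captioning prompt is that prefix plus a default question. The sketch below makes the pattern explicit. It is only an illustration: `PROMPT_PREFIX`, `GENERIC_QUESTION`, and `build_prompt` are hypothetical names that are not part of the `promptcap` package, and the strings are copied from the README examples.

```python
# Hypothetical helper; NOT part of the promptcap package.
# Both strings are copied verbatim from the README examples in this diff.
PROMPT_PREFIX = "please describe this image according to the given question: "
GENERIC_QUESTION = "what does the image describe?"


def build_prompt(question: str = GENERIC_QUESTION) -> str:
    """Return a captioning prompt for `question` (generic captioning by default)."""
    # Strip stray whitespace so the prefix and question join cleanly.
    return PROMPT_PREFIX + question.strip()


if __name__ == "__main__":
    print(build_prompt("what year was this taken?"))
    print(build_prompt())  # generic captioning prompt
```

Assuming the API shown in the README, the returned string would be passed as the first argument, for example `model.caption(build_prompt(question), image)`.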