tifa-benchmark/promptcap-coco-vqa


yushihu committed on Jan 25, 2023

Commit 5eb3f77 · 1 Parent(s): 7cef707

Update README.md

Files changed (1)

README.md +7 -7


@@ -26,13 +26,13 @@ pip install promptcap
 ## Captioning Pipeline
-Generate a prompt-guided caption by following:
+Please follow the prompt format, which will give the best performance.
+Generate a prompt-guided caption by following
 ```python
 import torch
 from promptcap import PromptCap
-model = PromptCap("vqascore/promptcap-coco-vqa")  # also support OFA checkpoints. e.g. "OFA-Sys/ofa-base"
+model = PromptCap("vqascore/promptcap-coco-vqa")  # also support OFA checkpoints. e.g. "OFA-Sys/ofa-large"
 if torch.cuda.is_available():
   model.cuda()
@@ -47,7 +47,7 @@ To try generic captioning, just use "please describe this image according to the
 PromptCap also support taking OCR inputs:
-```
+```python
 prompt = "please describe this image according to the given question: what year was this taken?"
 image = "dvds.jpg"
 ocr = "yip AE Mht juor 02/14/2012"
@@ -62,7 +62,7 @@ print(model.caption(prompt, image, ocr))
 Different from typical VQA models, which are doing classification on VQAv2, PromptCap is open-domain and can be paired with arbitrary text-QA models.
 Here we provide a pipeline for combining PromptCap with UnifiedQA.
-```
+```python
 import torch
 from promptcap import PromptCap_VQA
@@ -80,7 +80,7 @@ print(vqa_model.vqa(question, image))
 Similarly, PromptCap supports OCR inputs
-```
+```python
 question = "what year was this taken?"
 image = "dvds.jpg"
 ocr = "yip AE Mht juor 02/14/2012"
@@ -90,7 +90,7 @@ print(vqa_model.vqa(prompt, image, ocr=ocr))
 Because of the flexibility of Unifiedqa, PromptCap also supports multiple-choice VQA
-```
+```python
 question = "what piece of clothing is this boy putting on?"
 image = "glove_boy.jpeg"
 choices = ["gloves", "socks", "shoes", "coats"]
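
The line added in this commit asks readers to follow the prompt format. As a minimal sketch of what that format looks like in the README's own examples, the helper below prepends the instruction prefix to a question. The names `PROMPT_PREFIX` and `build_prompt` are hypothetical and are not part of the `promptcap` package; the prefix string is copied from the example prompt in the diff.

```python
# Hypothetical helper, not part of the promptcap package.
# The prefix is taken verbatim from the example prompt in the README diff.
PROMPT_PREFIX = "please describe this image according to the given question: "


def build_prompt(question: str) -> str:
    """Return a caption prompt in the format used by the README examples."""
    return PROMPT_PREFIX + question.strip()


prompt = build_prompt("what year was this taken?")
print(prompt)
```

The resulting string would presumably be passed as the first argument in the call shown in the hunk context, `model.caption(prompt, image, ocr)`.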