update readme.md
README.md
CHANGED

VisualHeist is an object detection model finetuned to extract tables and figures.
- visualheist-base [[HF]](https://huggingface.co/shixuanleong/visualheist-base) (0.23B)
- visualheist-large [[HF]](https://huggingface.co/shixuanleong/visualheist-large) (0.77B)

The models were finetuned with 3435 figures and 1716 tables from 110 PDF articles across various publishers. All bounding boxes were manually annotated using [CoCo Annotator](https://github.com/jsbroks/coco-annotator).
VisualHeist models take an image of a single paper page as input and return image files for all figures, schemes, and tables on the given page.

**The base model is recommended if you are running it on low-RAM systems**

The models are finetuned from [microsoft/Florence-2](https://huggingface.co/microsoft/Florence-2-large-ft) checkpoints. VisualHeist is inspired by and adapted from [yifeihu/TF-ID](https://huggingface.co/yifeihu/TF-ID-large).

Object detection results are returned in the format `{'<OD>': {'bboxes': [[x1, y1, x2, y2], ...], 'labels': ['label1', 'label2', ...]}}`.
## Training Code and Dataset
- Dataset: [Zenodo repository](https://doi.org/10.5281/zenodo.14917752)
- Code: [github.com/aspuru-guzik-group/mermaid](https://github.com/aspuru-guzik-group/mermaid)

These PDFs, published between 1949 and 2025, cover areas including science education.
We additionally curated another collection of 98 literature articles (MERMaid-100) reporting novel reaction methodologies that span three distinct chemical domains: organic electrosynthesis, photocatalysis, and organic synthesis.

Additional performance discussion can be found in our [preprint article](XXXXXXX).

The full DOI lists can be downloaded from our [Zenodo repository](https://doi.org/10.5281/zenodo.14917752).

The evaluation results for visualheist-large are:

|  | Total Images | F1 score |
| --- | --- | --- |
| MERMaid-100 | 100 | 99% |

## Running the Model

Refer to our [github repository](https://github.com/aspuru-guzik-group/mermaid) for detailed instructions on how to run the model.
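
For quick experimentation, the snippet below is a minimal sketch of calling a VisualHeist checkpoint directly through the `transformers` library. It is adapted from the upstream TF-ID / Florence-2 usage example and assumes the visualheist checkpoints expose the same `<OD>` task interface; the sample image URL and generation settings are carried over from that example rather than from the MERMaid pipeline.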
```python
import requests
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

# "visualheist-base" is used here; swap in "shixuanleong/visualheist-large" for the larger model
model = AutoModelForCausalLM.from_pretrained("shixuanleong/visualheist-base", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("shixuanleong/visualheist-base", trust_remote_code=True)

# "<OD>" is the Florence-2 object detection task prompt
prompt = "<OD>"

# sample page image taken from the upstream TF-ID repository
url = "https://huggingface.co/yifeihu/TF-ID-base/resolve/main/arxiv_2305_10853_5.png?download=true"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text=prompt, images=image, return_tensors="pt")
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
    do_sample=False,
    num_beams=3,
)

# decode and convert the raw output into pixel-space bounding boxes and labels
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
parsed_answer = processor.post_process_generation(generated_text, task="<OD>", image_size=(image.width, image.height))

print(parsed_answer)
```
To visualize the results, see [this tutorial notebook](https://colab.research.google.com/github/roboflow-ai/notebooks/blob/main/notebooks/how-to-finetune-florence-2-on-detection-dataset.ipynb) for more details.
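
VisualHeist is described above as returning image files for the detected figures, schemes, and tables. The short follow-up sketch below (hypothetical helper code, not part of the MERMaid repository) crops the parsed bounding boxes out of the page image with PIL, reusing `image` and `parsed_answer` from the previous example:

```python
import os

# save each detected region as its own image file
os.makedirs("extracted", exist_ok=True)
detections = parsed_answer["<OD>"]
for i, (bbox, label) in enumerate(zip(detections["bboxes"], detections["labels"])):
    x1, y1, x2, y2 = map(int, bbox)  # bboxes are [x1, y1, x2, y2] in pixel coordinates
    crop = image.crop((x1, y1, x2, y2))
    crop.save(os.path.join("extracted", f"{label}_{i}.png"))
```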
## BibTex and citation info

```
<To be updated with our archive citation>
```

VisualHeist is adapted from [TF-ID](https://github.com/ai8hyf/TF-ID); the upstream citation is:

```
@misc{TF-ID,
  author = {Yifei Hu},
  title = {TF-ID: Table/Figure IDentifier for academic papers},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/ai8hyf/TF-ID}},
}
```