shixuanleong committed · Commit b8a5c60 · verified · 1 Parent(s): 4d70135

update readme.md

Files changed (1)
  1. README.md +9 -49
README.md CHANGED
@@ -15,21 +15,14 @@ VisualHeist is an object detection model finetuned to extract tables and figures
15
  - visualheist-base[[HF]](https://huggingface.co/shixuanleong/visualheist-base) (0.23B)
16
  - visualheist-large[[HF]](https://huggingface.co/shixuanleong/visualheist-large) (0.77B)
17
 
18
- The models are finetuned from [microsoft/Florence-2](https://huggingface.co/microsoft/Florence-2-large-ft) checkpoints.
19
- VisualHeist is inspired by and adapted from [yifeihu/TF-ID](https://huggingface.co/yifeihu/TF-ID-large)
 
20
 
21
  - The models were finetuned with 3435 figures and 1716 tables from 110 PDF articles across various publishers. All bounding boxes are manually annotated using [CoCo Annotator](https://github.com/jsbroks/coco-annotator).
22
  - TF-ID models take an image of a single paper page as the input, and return image files for all figures, schemes and tables in the given page.
23
 
24
 
25
- **The base model is recommended if you are running it on low-RAM systems**
26
-
27
- ![image/png](https://huggingface.co/yifeihu/TF-ID-base/resolve/main/td-id-caption.png)
28
-
29
- Object Detection results format:
30
- {'\<OD>': {'bboxes': [[x1, y1, x2, y2], ...],
31
- 'labels': ['label1', 'label2', ...]} }
32
-
33
  ## Training Code and Dataset
34
  - Dataset: [Zenodo repository](https://doi.org/10.5281/zenodo.14917752)
35
  - Code: [github.com/aspuru-guzik-group/mermaid](https://github.com/aspuru-guzik-group/mermaid)
@@ -43,7 +36,9 @@ and science education. These PDFs, published between 1949 and 2025, include both
43
  We also additionally curated another collection of 98 literature articles (MERMaid-100) reporting novel reaction methodologies that spans
44
  three distinct chemical domains: organic electrosynthesis, photocatalysis, and organic synthesis.
45
 
46
- The full DOI lists can be downloaded from our [Zenodo repository](https://doi.org/10.5281/zenodo.14917752).
 
 
47
 
48
  The evaluation results for visualheist-large are:
49
  | | Total Images | F1 score |
@@ -55,48 +50,13 @@ The evaluation results for visualheist-large are:
55
  | MERMaid-100 | 100 | 99% |
56
 
57
 
58
- ## How to Get Started with the Model
59
 
60
- Use the code below to get started with the model.
61
-
62
- ```python
63
- import requests
64
- from PIL import Image
65
- from transformers import AutoProcessor, AutoModelForCausalLM
66
-
67
- model = AutoModelForCausalLM.from_pretrained("yifeihu/TF-ID-base", trust_remote_code=True)
68
- processor = AutoProcessor.from_pretrained("yifeihu/TF-ID-base", trust_remote_code=True)
69
-
70
- prompt = "<OD>"
71
- url = "https://huggingface.co/yifeihu/TF-ID-base/resolve/main/arxiv_2305_10853_5.png?download=true"
72
- image = Image.open(requests.get(url, stream=True).raw)
73
-
74
- inputs = processor(text=prompt, images=image, return_tensors="pt")
75
- generated_ids = model.generate(
76
- input_ids=inputs["input_ids"],
77
- pixel_values=inputs["pixel_values"],
78
- max_new_tokens=1024,
79
- do_sample=False,
80
- num_beams=3
81
- )
82
-
83
- generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
84
- parsed_answer = processor.post_process_generation(generated_text, task="<OD>", image_size=(image.width, image.height))
85
-
86
- print(parsed_answer)
87
- ```
88
 
89
- To visualize the results, see [this tutorial notebook](https://colab.research.google.com/github/roboflow-ai/notebooks/blob/main/notebooks/how-to-finetune-florence-2-on-detection-dataset.ipynb) for more details.
90
 
91
  ## BibTex and citation info
92
 
93
  ```
94
- @misc{TF-ID,
95
- author = {Yifei Hu},
96
- title = {TF-ID: Table/Figure IDentifier for academic papers},
97
- year = {2024},
98
- publisher = {GitHub},
99
- journal = {GitHub repository},
100
- howpublished = {\url{https://github.com/ai8hyf/TF-ID}},
101
- }
102
  ```
 
15
  - visualheist-base[[HF]](https://huggingface.co/shixuanleong/visualheist-base) (0.23B)
16
  - visualheist-large[[HF]](https://huggingface.co/shixuanleong/visualheist-large) (0.77B)
17
 
18
+ **The base model is recommended if you are running it on low-RAM systems**
19
+
20
+ The models are finetuned from [microsoft/Florence-2](https://huggingface.co/microsoft/Florence-2-large-ft) checkpoints. VisualHeist is inspired by and adapted from [yifeihu/TF-ID](https://huggingface.co/yifeihu/TF-ID-large)
21
 
22
  - The models were finetuned with 3435 figures and 1716 tables from 110 PDF articles across various publishers. All bounding boxes are manually annotated using [CoCo Annotator](https://github.com/jsbroks/coco-annotator).
23
  - TF-ID models take an image of a single paper page as the input, and return image files for all figures, schemes and tables in the given page.
24
 
25
 
26
  ## Training Code and Dataset
27
  - Dataset: [Zenodo repository](https://doi.org/10.5281/zenodo.14917752)
28
  - Code: [github.com/aspuru-guzik-group/mermaid](https://github.com/aspuru-guzik-group/mermaid)
 
36
  We also additionally curated another collection of 98 literature articles (MERMaid-100) reporting novel reaction methodologies that spans
37
  three distinct chemical domains: organic electrosynthesis, photocatalysis, and organic synthesis.
38
 
39
+ Additional performance discussion can be found from our [preprint article](XXXXXXX)
40
+
41
+ The full DOI lists can be downloaded from our[Zenodo repository](https://doi.org/10.5281/zenodo.14917752).
42
 
43
  The evaluation results for visualheist-large are:
44
  | | Total Images | F1 score |
 
50
  | MERMaid-100 | 100 | 99% |
51
 
52
 
53
+ ## Running the Model
54
 
55
+ Refer to our [github repository](https://github.com/aspuru-guzik-group/mermaid) for detailed instructions on how to run the model
56
 
 
57
 
58
  ## BibTex and citation info
59
 
60
  ```
61
+ <To be updated with our archive citation>
62
  ```
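
Note on the removed "Object Detection results format" lines: the old README described the output as `{'<OD>': {'bboxes': [[x1, y1, x2, y2], ...], 'labels': [...]}}`. Below is a minimal sketch of how such a result could be turned into crop rectangles for a page image. The helper name `boxes_to_crops` and the label strings `"figure"` and `"table"` are illustrative assumptions, not taken from the VisualHeist code; the repository linked in the new README remains the reference for running the model.

```python
import math
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]


def boxes_to_crops(
    parsed_answer: Dict[str, dict],
    image_size: Tuple[int, int],
    task: str = "<OD>",
) -> List[Tuple[str, Box]]:
    """Convert an OD-style result into (label, crop box) pairs.

    Hypothetical helper: boxes are rounded outward to whole pixels,
    clamped to the page bounds, and dropped if they end up empty.
    """
    width, height = image_size
    result = parsed_answer.get(task, {})
    crops: List[Tuple[str, Box]] = []
    for (x1, y1, x2, y2), label in zip(result.get("bboxes", []), result.get("labels", [])):
        # Round outward so the crop never cuts into the detected region.
        left = max(0, min(math.floor(x1), width))
        upper = max(0, min(math.floor(y1), height))
        right = max(0, min(math.ceil(x2), width))
        lower = max(0, min(math.ceil(y2), height))
        # Skip boxes with zero or negative area after clamping.
        if right > left and lower > upper:
            crops.append((label, (left, upper, right, lower)))
    return crops


# Example input in the format quoted from the old README (values are made up).
example = {
    "<OD>": {
        "bboxes": [[10.2, 20.7, 110.5, 220.1], [-5, 0, 50, 5000], [30, 30, 30, 40]],
        "labels": ["figure", "table", "figure"],
    }
}
crops = boxes_to_crops(example, image_size=(100, 300))
```

Each returned box uses the `(left, upper, right, lower)` order, so it can be passed directly to Pillow's `Image.crop` to save the detected region as its own image.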