numind/NuExtract-2-4B-experimental

liamcripwell committed on Mar 5

Commit a9f328c · verified · 1 Parent(s): d82590b

Update README.md

Files changed (1)

README.md +80 -1

@@ -536,4 +536,83 @@ for y in result:
 # {"store_name": "Trader Joe's"}
 # {"names": ["John", "Mary", "James"]}
 # {"names": ["JOHN", "MARY", "JAMES"], "female_names": ["MARY"]}
-```
+```
+## Template Generation
+If you want to convert existing schema files from other formats (e.g. XML, YAML) or start from an example, NuExtract 2 models can automatically generate a template for you.
+For example, to convert an XML schema into a NuExtract template:
+```python
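+def generate_template(description):
+    """Generate a NuExtract template from a schema in another format or a plain-text description."""
+    # Relies on `prepare_inputs`, `nuextract_generate`, `model`, and `tokenizer`,
+    # which are expected to be defined earlier in this README.
+    input_messages = [description]
+    input_content = prepare_inputs(
+        messages=input_messages,
+        image_paths=[],  # text-only request, so no images are passed
+        tokenizer=tokenizer,
+    )
+    # Low-temperature sampling; output is capped at 256 new tokens.
+    generation_config = {"do_sample": True, "temperature": 0.4, "max_new_tokens": 256}
+    with torch.no_grad():
+        result = nuextract_generate(
+            model=model,
+            tokenizer=tokenizer,
+            prompts=input_content['prompts'],
+            pixel_values_list=input_content['pixel_values_list'],
+            num_patches_list=input_content['num_patches_list'],
+            generation_config=generation_config
+        )
+    # A single prompt was submitted, so return the single generated template.
+    return result[0]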
+xml_template = """<SportResult>
+    <Date></Date>
+    <Sport></Sport>
+    <Venue></Venue>
+    <HomeTeam></HomeTeam>
+    <AwayTeam></AwayTeam>
+    <HomeScore></HomeScore>
+    <AwayScore></AwayScore>
+    <TopScorer></TopScorer>
+</SportResult>"""
+result = generate_template(xml_template)
+print(result)
+# {
+#     "SportResult": {
+#         "Date": "date-time",
+#         "Sport": "verbatim-string",
+#         "Venue": "verbatim-string",
+#         "HomeTeam": "verbatim-string",
+#         "AwayTeam": "verbatim-string",
+#         "HomeScore": "integer",
+#         "AwayScore": "integer",
+#         "TopScorer": "verbatim-string"
+#     }
+# }
+```
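Because `generate_template` samples (`do_sample=True`) and caps generation at 256 new tokens, a returned template could occasionally be cut off or malformed. The following is a minimal sketch of a sanity check, assuming the template is meant to be a JSON object like the example output above. `parse_template` and `sample_output` are hypothetical names used for illustration and are not part of NuExtract.

```python
import json


def parse_template(raw):
    """Return the generated template as a dict, or None if it is not a valid JSON object.

    Hypothetical helper: it only checks that the text parses as JSON,
    not that the type labels are ones NuExtract accepts.
    """
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


# Stand-in for the string returned by generate_template (shortened example).
sample_output = """{
    "SportResult": {
        "Date": "date-time",
        "HomeScore": "integer"
    }
}"""

template = parse_template(sample_output)

# Simulate an output that was cut off before the closing braces.
truncated = parse_template(sample_output[:40])

if template is None:
    print("Template is not valid JSON; consider regenerating it.")
else:
    print(json.dumps(template, indent=4))
```

If the check fails, regenerating the template or raising `max_new_tokens` for large schemas are two options to try.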
+For example, to generate a template from a natural-language description:
+```python
+text = """Give me relevant info about startup companies mentioned."""
+result = generate_template(text)
+print(result)
+# {
+#     "Startup_Companies": [
+#         {
+#             "Name": "verbatim-string",
+#             "Products": [
+#                 "string"
+#             ],
+#             "Location": "verbatim-string",
+#             "Company_Type": [
+#                 "Technology",
+#                 "Finance",
+#                 "Health",
+#                 "Education",
+#                 "Other"
+#             ]
+#         }
+#     ]
+# }
+```
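A generated template can be easier to review as a flat list of fields. The sketch below walks a template like the one above and prints each field path with its type label. It rests on an assumption drawn only from the example output: a one-element list is read as "array of that type", and a list of several strings is read as a set of allowed choices. `describe_fields` is a hypothetical helper, not part of NuExtract.

```python
def describe_fields(node, path=""):
    """Flatten a template into (field_path, type_label) pairs.

    Assumed reading of lists, based on the example output only:
      - [ {...} ]         -> array of objects
      - ["string"]        -> array of that type
      - ["A", "B", ...]   -> one of the listed choices
    """
    rows = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            rows.extend(describe_fields(value, child))
    elif isinstance(node, list):
        if len(node) == 1 and isinstance(node[0], dict):
            rows.extend(describe_fields(node[0], path + "[]"))
        elif len(node) == 1:
            rows.append((path + "[]", str(node[0])))
        else:
            rows.append((path, "one of: " + ", ".join(map(str, node))))
    else:
        rows.append((path, str(node)))
    return rows


# The template from the natural-language example, written as a Python dict.
template = {
    "Startup_Companies": [
        {
            "Name": "verbatim-string",
            "Products": ["string"],
            "Location": "verbatim-string",
            "Company_Type": ["Technology", "Finance", "Health", "Education", "Other"],
        }
    ]
}

rows = describe_fields(template)
for field_path, label in rows:
    print(f"{field_path}: {label}")
```

A listing like this makes it quick to spot fields where the generated type is not what you intended, so you can edit the template by hand before using it for extraction.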