SorenNumind committed
Commit 0707869 · verified · 1 Parent(s): 1191b09

Upload README.md

Files changed (1): README.md ADDED (+588, -0)
---
library_name: transformers
license: mit
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
pipeline_tag: image-text-to-text
---

<p align="center">
  <a href="https://nuextract.ai/">
    <img src="logo_nuextract.svg" width="200"/>
  </a>
</p>
<p align="center">
🖥️ <a href="https://nuextract.ai/">API / Platform</a>&nbsp;&nbsp; | &nbsp;&nbsp;📑 <a href="https://numind.ai/blog">Blog</a>&nbsp;&nbsp; | &nbsp;&nbsp;🗣️ <a href="https://discord.gg/3tsEtJNCDe">Discord</a>
</p>

# NuExtract 2.0 8B by NuMind 🔥

NuExtract 2.0 is a family of models trained specifically for structured information extraction tasks. It supports multimodal inputs and is multilingual.

We provide several versions of different sizes, all based on pre-trained models from the QwenVL family.

| Model Size | Model Name | Base Model | License | Huggingface Link |
|------------|------------|------------|---------|------------------|
| 2B | NuExtract-2.0-2B | [Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct) | MIT | 🤗 [NuExtract-2.0-2B](https://huggingface.co/numind/NuExtract-2.0-2B) |
| 4B | NuExtract-2.0-4B | [Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) | Qwen Research License | 🤗 [NuExtract-2.0-4B](https://huggingface.co/numind/NuExtract-2.0-4B) |
| 8B | NuExtract-2.0-8B | [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) | MIT | 🤗 [NuExtract-2.0-8B](https://huggingface.co/numind/NuExtract-2.0-8B) |

❗️Note: `NuExtract-2.0-2B` is based on Qwen2-VL rather than Qwen2.5-VL because the smallest Qwen2.5-VL model (3B) has a more restrictive, non-commercial license. We therefore include `NuExtract-2.0-2B` as a small model option that can be used commercially.

## Benchmark
Performance on a collection of ~1,000 diverse extraction examples containing both text and image inputs.
<a href="https://nuextract.ai/">
  <img src="nuextract2_bench.png" width="500"/>
</a>

## Overview

To use the model, provide an input text/image and a JSON template describing the information you need to extract. The template should be a JSON object specifying field names and their expected types.

Supported types include:
* `verbatim-string` - instructs the model to extract text that is present verbatim in the input.
* `string` - a generic string field that can incorporate paraphrasing/abstraction.
* `integer` - a whole number.
* `number` - a whole or decimal number.
* `date-time` - an ISO-formatted date.
* Array of any of the above types (e.g. `["string"]`)
* `enum` - a choice from a set of possible answers (represented in the template as an array of options, e.g. `["yes", "no", "maybe"]`).
* `multi-label` - an enum that can have multiple possible answers (represented in the template as a double-wrapped array, e.g. `[["A", "B", "C"]]`).

If the model does not identify relevant information for a field, it will return `null` or `[]` (for arrays and multi-labels).

The following is an example template:
```json
{
  "first_name": "verbatim-string",
  "last_name": "verbatim-string",
  "description": "string",
  "age": "integer",
  "gpa": "number",
  "birth_date": "date-time",
  "nationality": ["France", "England", "Japan", "USA", "China"],
  "languages_spoken": [["English", "French", "Japanese", "Mandarin", "Spanish"]]
}
```
An example output:
```json
{
  "first_name": "Susan",
  "last_name": "Smith",
  "description": "A student studying computer science.",
  "age": 20,
  "gpa": 3.7,
  "birth_date": "2005-03-01",
  "nationality": "England",
  "languages_spoken": ["English", "French"]
}
```
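
For illustration, if the document did not mention the age, GPA, birth date, nationality, or spoken languages, a well-formed output would instead look like this (a hypothetical example of the `null`/`[]` behaviour described above):
```json
{
  "first_name": "Susan",
  "last_name": "Smith",
  "description": "A student studying computer science.",
  "age": null,
  "gpa": null,
  "birth_date": null,
  "nationality": null,
  "languages_spoken": []
}
```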

⚠️ We recommend using NuExtract with a temperature at or very close to 0. Some inference frameworks, such as Ollama, use a default of 0.7, which is not well suited to many extraction tasks.
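
For example, when serving the model through Ollama's chat API, the temperature can be overridden per request via the `options` field. This is only a minimal sketch: the endpoint and payload follow Ollama's standard API, and the model tag `nuextract` is a placeholder for whatever local tag you have created.
```python
import requests

# Minimal sketch (assumes an Ollama server on localhost:11434 and a local model tagged "nuextract").
# The prompt mirrors the "# Template: / # Context:" format shown later in this README.
response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "nuextract",  # placeholder model tag
        "messages": [{
            "role": "user",
            "content": "# Template:\n{\"names\": [\"string\"]}\n# Context:\nJohn met Mary at the station.",
        }],
        "stream": False,
        "options": {"temperature": 0},  # override Ollama's 0.7 default for extraction
    },
)
print(response.json()["message"]["content"])
```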

## Using NuExtract with 🤗 Transformers

```python
import torch
from transformers import AutoModelForVision2Seq, AutoProcessor

model_name = "numind/NuExtract-2.0-8B"
# model_name = "numind/NuExtract-2.0-4B"

model = AutoModelForVision2Seq.from_pretrained(model_name,
                                               trust_remote_code=True,
                                               torch_dtype=torch.bfloat16,
                                               device_map="auto")
processor = AutoProcessor.from_pretrained(model_name,
                                          trust_remote_code=True,
                                          padding_side='left',
                                          use_fast=True)

# You can set min_pixels and max_pixels according to your needs, such as a token range of 256-1280, to balance performance and cost.
# min_pixels = 256*28*28
# max_pixels = 1280*28*28
# processor = AutoProcessor.from_pretrained(model_name, min_pixels=min_pixels, max_pixels=max_pixels)
```

You will need the following function to handle loading of image input data:
```python
def process_all_vision_info(messages, examples=None):
    """
    Process vision information from both messages and in-context examples, supporting batch processing.

    Args:
        messages: List of message dictionaries (single input) OR list of message lists (batch input)
        examples: Optional list of example dictionaries (single input) OR list of example lists (batch)

    Returns:
        A flat list of all images in the correct order:
        - For single input: example images followed by message images
        - For batch input: interleaved as (item1 examples, item1 input, item2 examples, item2 input, etc.)
        - Returns None if no images were found
    """
    from qwen_vl_utils import process_vision_info, fetch_image

    # Helper function to extract images from examples
    def extract_example_images(example_item):
        if not example_item:
            return []

        # Handle both list of examples and single example
        examples_to_process = example_item if isinstance(example_item, list) else [example_item]
        images = []

        for example in examples_to_process:
            if isinstance(example.get('input'), dict) and example['input'].get('type') == 'image':
                images.append(fetch_image(example['input']))

        return images

    # Normalize inputs to always be batched format
    is_batch = messages and isinstance(messages[0], list)
    messages_batch = messages if is_batch else [messages]
    is_batch_examples = examples and isinstance(examples, list) and (isinstance(examples[0], list) or examples[0] is None)
    examples_batch = examples if is_batch_examples else ([examples] if examples is not None else None)

    # Ensure examples batch matches messages batch if provided
    if examples and len(examples_batch) != len(messages_batch):
        if not is_batch and len(examples_batch) == 1:
            # Single example set for a single input is fine
            pass
        else:
            raise ValueError("Examples batch length must match messages batch length")

    # Process all inputs, maintaining correct order
    all_images = []
    for i, message_group in enumerate(messages_batch):
        # Get example images for this input
        if examples and i < len(examples_batch):
            input_example_images = extract_example_images(examples_batch[i])
            all_images.extend(input_example_images)

        # Get message images for this input
        input_message_images = process_vision_info(message_group)[0] or []
        all_images.extend(input_message_images)

    return all_images if all_images else None
```
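
As a quick sanity check (not part of the original snippet), the helper returns `None` for a text-only conversation and a flat list of images otherwise:
```python
# Text-only input: no images are found, so the helper returns None.
text_only = [{"role": "user", "content": "John went to the restaurant with Mary."}]
print(process_all_vision_info(text_only))  # None
```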

E.g. to perform a basic extraction of names from a text document:
```python
template = """{"names": ["string"]}"""
document = "John went to the restaurant with Mary. James went to the cinema."

# prepare the user message content
messages = [{"role": "user", "content": document}]
text = processor.tokenizer.apply_chat_template(
    messages,
    template=template,  # template is specified here
    tokenize=False,
    add_generation_prompt=True,
)

print(text)
"""<|im_start|>user
# Template:
{"names": ["string"]}
# Context:
John went to the restaurant with Mary. James went to the cinema.<|im_end|>
<|im_start|>assistant"""

image_inputs = process_all_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    padding=True,
    return_tensors="pt",
).to("cuda")

# we choose greedy sampling here, which works well for most information extraction tasks
generation_config = {"do_sample": False, "num_beams": 1, "max_new_tokens": 2048}

# Inference: Generation of the output
generated_ids = model.generate(
    **inputs,
    **generation_config
)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)

print(output_text)
# ['{"names": ["John", "Mary", "James"]}']
```
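
Because the model returns a JSON string, the result can be parsed straight into a Python object (a small convenience addition, not part of the original example):
```python
import json

# output_text is a list with one generated string per input in the batch
result = json.loads(output_text[0])
print(result["names"])  # ['John', 'Mary', 'James']
```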

<details>
<summary>In-Context Examples</summary>
Sometimes the model might not perform as well as we want because our task is challenging or involves some degree of ambiguity. Alternatively, we may want the model to follow some specific formatting, or just give it a bit more help. In cases like this it can be valuable to provide "in-context examples" to help NuExtract better understand the task.

To do so, we can provide a list of examples (dictionaries of input/output pairs). In the example below, we show the model that we want the extracted names to be in capital letters with `-` on either side (for the sake of illustration). Usually, providing multiple examples will lead to better results.
```python
template = """{"names": ["string"]}"""
document = "John went to the restaurant with Mary. James went to the cinema."
examples = [
    {
        "input": "Stephen is the manager at Susan's store.",
        "output": """{"names": ["-STEPHEN-", "-SUSAN-"]}"""
    }
]

messages = [{"role": "user", "content": document}]
text = processor.tokenizer.apply_chat_template(
    messages,
    template=template,
    examples=examples,  # examples provided here
    tokenize=False,
    add_generation_prompt=True,
)

image_inputs = process_all_vision_info(messages, examples)
inputs = processor(
    text=[text],
    images=image_inputs,
    padding=True,
    return_tensors="pt",
).to("cuda")

# we choose greedy sampling here, which works well for most information extraction tasks
generation_config = {"do_sample": False, "num_beams": 1, "max_new_tokens": 2048}

# Inference: Generation of the output
generated_ids = model.generate(
    **inputs,
    **generation_config
)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
# ['{"names": ["-JOHN-", "-MARY-", "-JAMES-"]}']
```
</details>

<details>
<summary>Image Inputs</summary>
To give NuExtract image inputs instead of text, simply provide a dictionary specifying the desired image file as the message content, instead of a string (e.g. `{"type": "image", "image": "file://image.jpg"}`).

You can also specify an image URL (e.g. `{"type": "image", "image": "http://path/to/your/image.jpg"}`) or base64 encoding (e.g. `{"type": "image", "image": "data:image;base64,/9j/..."}`).
```python
template = """{"store": "verbatim-string"}"""
document = {"type": "image", "image": "file://1.jpg"}

messages = [{"role": "user", "content": [document]}]
text = processor.tokenizer.apply_chat_template(
    messages,
    template=template,
    tokenize=False,
    add_generation_prompt=True,
)

image_inputs = process_all_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    padding=True,
    return_tensors="pt",
).to("cuda")

generation_config = {"do_sample": False, "num_beams": 1, "max_new_tokens": 2048}

# Inference: Generation of the output
generated_ids = model.generate(
    **inputs,
    **generation_config
)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
# ['{"store": "Trader Joe\'s"}']
```
</details>

<details>
<summary>Batch Inference</summary>

```python
inputs = [
    # image input with no ICL examples
    {
        "document": {"type": "image", "image": "file://0.jpg"},
        "template": """{"store_name": "verbatim-string"}""",
    },
    # image input with 1 ICL example
    {
        "document": {"type": "image", "image": "file://0.jpg"},
        "template": """{"store_name": "verbatim-string"}""",
        "examples": [
            {
                "input": {"type": "image", "image": "file://1.jpg"},
                "output": """{"store_name": "Trader Joe's"}""",
            }
        ],
    },
    # text input with no ICL examples
    {
        "document": {"type": "text", "text": "John went to the restaurant with Mary. James went to the cinema."},
        "template": """{"names": ["string"]}""",
    },
    # text input with ICL example
    {
        "document": {"type": "text", "text": "John went to the restaurant with Mary. James went to the cinema."},
        "template": """{"names": ["string"]}""",
        "examples": [
            {
                "input": "Stephen is the manager at Susan's store.",
                "output": """{"names": ["STEPHEN", "SUSAN"]}"""
            }
        ],
    },
]

# messages should be a list of lists for batch processing
messages = [
    [
        {
            "role": "user",
            "content": [x['document']],
        }
    ]
    for x in inputs
]

# apply chat template to each example individually
texts = [
    processor.tokenizer.apply_chat_template(
        messages[i],  # Now this is a list containing one message
        template=x['template'],
        examples=x.get('examples', None),
        tokenize=False,
        add_generation_prompt=True)
    for i, x in enumerate(inputs)
]

image_inputs = process_all_vision_info(messages, [x.get('examples') for x in inputs])
inputs = processor(
    text=texts,
    images=image_inputs,
    padding=True,
    return_tensors="pt",
).to("cuda")

generation_config = {"do_sample": False, "num_beams": 1, "max_new_tokens": 2048}

# Batch Inference
generated_ids = model.generate(**inputs, **generation_config)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_texts = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
for y in output_texts:
    print(y)
# {"store_name": "WAL-MART"}
# {"store_name": "Walmart"}
# {"names": ["John", "Mary", "James"]}
# {"names": ["JOHN", "MARY", "JAMES"]}
```
</details>

<details>
<summary>Template Generation</summary>
If you want to convert existing schema files you have in other formats (e.g. XML, YAML, etc.), or start from an example, NuExtract 2.0 models can automatically generate the template for you.

E.g. convert XML into a NuExtract template:
```python
xml_template = """<SportResult>
    <Date></Date>
    <Sport></Sport>
    <Venue></Venue>
    <HomeTeam></HomeTeam>
    <AwayTeam></AwayTeam>
    <HomeScore></HomeScore>
    <AwayScore></AwayScore>
    <TopScorer></TopScorer>
</SportResult>"""

messages = [
    {
        "role": "user",
        "content": [{"type": "text", "text": xml_template}],
    }
]

text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True,
)

image_inputs = process_all_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    padding=True,
    return_tensors="pt",
).to("cuda")

generation_config = {"do_sample": False, "num_beams": 1, "max_new_tokens": 2048}

generated_ids = model.generate(
    **inputs,
    **generation_config
)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)

print(output_text[0])
# {
#     "Date": "date-time",
#     "Sport": "verbatim-string",
#     "Venue": "verbatim-string",
#     "HomeTeam": "verbatim-string",
#     "AwayTeam": "verbatim-string",
#     "HomeScore": "integer",
#     "AwayScore": "integer",
#     "TopScorer": "verbatim-string"
# }
```

E.g. generate a template from a natural language description:
```python
description = "I would like to extract important details from the contract."

messages = [
    {
        "role": "user",
        "content": [{"type": "text", "text": description}],
    }
]

text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True,
)

image_inputs = process_all_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    padding=True,
    return_tensors="pt",
).to("cuda")

generation_config = {"do_sample": False, "num_beams": 1, "max_new_tokens": 2048}

generated_ids = model.generate(
    **inputs,
    **generation_config
)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)

print(output_text[0])
# {
#     "Contract": {
#         "Title": "verbatim-string",
#         "Description": "verbatim-string",
#         "Terms": [
#             {
#                 "Term": "verbatim-string",
#                 "Description": "verbatim-string"
#             }
#         ],
#         "Date": "date-time",
#         "Signatory": "verbatim-string"
#     }
# }
```
</details>

## Fine-Tuning
You can find a fine-tuning tutorial notebook in the [cookbooks](https://github.com/numindai/nuextract/tree/main/cookbooks) folder of the [GitHub repo](https://github.com/numindai/nuextract/tree/main).

## vLLM Deployment
Run the command below to serve an OpenAI-compatible API:
```bash
vllm serve numind/NuExtract-2.0-8B --trust_remote_code --limit-mm-per-prompt image=6 --chat-template-content-format openai
```
If you encounter memory issues, set `--max-model-len` accordingly.

Send requests to the model as follows:
```python
import json
from openai import OpenAI

openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

chat_response = client.chat.completions.create(
    model="numind/NuExtract-2.0-8B",
    temperature=0,
    messages=[
        {
            "role": "user",
            "content": [{"type": "text", "text": "Yesterday I went shopping at Bunnings"}],
        },
    ],
    extra_body={
        "chat_template_kwargs": {
            "template": json.dumps(json.loads("""{\"store\": \"verbatim-string\"}"""), indent=4)
        },
    }
)
print("Chat response:", chat_response)
```
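
To print only the extracted JSON rather than the full response object, read the message content of the first choice (standard OpenAI client usage; the commented result is illustrative):
```python
# The extraction itself is the assistant message content of the first choice.
print(chat_response.choices[0].message.content)
# e.g. {"store": "Bunnings"}
```
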
For image inputs, structure requests as shown below. Make sure to order the images in `"content"` as they appear in the prompt (i.e. any in-context examples before the main input).
```python
import base64

def encode_image(image_path):
    """
    Encode the image file to base64 string
    """
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

base64_image = encode_image("0.jpg")
base64_image2 = encode_image("1.jpg")

chat_response = client.chat.completions.create(
    model="numind/NuExtract-2.0-8B",
    temperature=0,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}},  # first ICL example image
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image2}"}},  # real input image
            ],
        },
    ],
    extra_body={
        "chat_template_kwargs": {
            "template": json.dumps(json.loads("""{\"store\": \"verbatim-string\"}"""), indent=4),
            "examples": [
                {
                    "input": "<image>",
                    "output": """{\"store\": \"Walmart\"}"""
                }
            ]
        },
    }
)
print("Chat response:", chat_response)
```