Isotonic committed on
Commit d8aafa8 · verified · 1 Parent(s): 27e1502

Update README.md

Files changed (1)
  1. README.md +79 -3
README.md CHANGED
@@ -40,9 +40,85 @@ This is the model card of a 🤗 transformers model that has been pushed on the
40
 
41
  ### Direct Use
42
 
43
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
44
-
45
- [More Information Needed]
46
 
47
  ### Downstream Use [optional]
48
 
 
40
 
41
  ### Direct Use
42
 
43
+ ```python
44
+ TEXT = """
45
+
46
+ """
47
+
48
+ SCHEMA = """
49
+
50
+ """
51
+
52
+ SYSTEM_PROMPT = """
53
+ ### Role:
54
+ You are an expert data extractor specializing in mapping hierarchical text data into a given JSON Schema.
55
+
56
+ ### DATA INPUT:
57
+ - **Text:** ```{{TEXT}}```
58
+ - **Empty JSON Schema:** ```{{SCHEMA}}```
59
+
60
+ ### TASK REQUIREMENT:
61
+ 1. Analyze the given text and map all relevant information strictly into the provided JSON Schema.
62
+ 2. Provide your output in **two mandatory sections**:
63
+ - **`<answer>`:** The filled JSON object
64
+ - **`<think>`:** Reasoning for the mapping decisions
65
+
66
+ ### OUTPUT STRUCTURE:
67
+
68
+ `<think> /* Explanation of mapping logic */ </think>`
69
+ `<answer> /* Completed JSON Object */ </answer>`
70
+
71
+ ### STRICT RULES FOR GENERATING OUTPUT:
72
+ 1. **Both Tags Required:**
73
+ - Always provide both the `<think>` and the `<answer>` sections.
74
+ - If reasoning is minimal, state: "Direct mapping from text to schema."
75
+ 2. **JSON Schema Mapping:**
76
+ - Strictly map the text data to the given JSON Schema without modification or omissions.
77
+ 3. **Hierarchy Preservation:**
78
+ - Maintain proper parent-child relationships and follow the schema's hierarchical structure.
79
+ 4. **Correct Mapping of Attributes:**
80
+ - Map key attributes, including `displayName`, `description`, `type`, `component`, and `source`, to define the structure, metadata, and data sources for each field within the schema.
81
+ 5. **JSON Format Compliance:**
82
+ - Escape quotes (`\"`), replace newlines with `\\n`, avoid trailing commas, and use double quotes exclusively.
83
+ 6. **Step-by-Step Reasoning:**
84
+ - Explain your reasoning within the `<think>` tag.
85
+
86
+ ### IMPORTANT:
87
+ If either the `<think>` or the `<answer>` tag is missing, the response will be considered incomplete.
88
+ """
89
+ from jinja2 import Template
90
+ system_prompt_template = Template(SYSTEM_PROMPT)
91
+ system_prompt_str = system_prompt_template.render(TEXT=TEXT, SCHEMA=SCHEMA)  # fills {{TEXT}} and {{SCHEMA}}
92
+ ```
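The snippet above leaves `TEXT` and `SCHEMA` empty. As a minimal sketch of what filled-in inputs might look like, the example below uses made-up values and a shortened stand-in for `SYSTEM_PROMPT`. The invoice text, the schema keys, `PROMPT_SKELETON`, and the `fill_prompt` helper are all hypothetical and not taken from the model card. Plain string replacement is used here only to keep the sketch dependency-free; for these two placeholders it should yield the same text as the Jinja render.

```python
import json

# Hypothetical inputs, for illustration only; the model card leaves both empty.
TEXT = "Invoice 1042 was issued to Acme Corp for a total of 250.00 USD."
SCHEMA = json.dumps({"invoice_id": "", "customer": "", "total": ""}, indent=2)

# Shortened stand-in for SYSTEM_PROMPT that keeps the same two placeholders.
PROMPT_SKELETON = (
    "- **Text:** ```{{TEXT}}```\n"
    "- **Empty JSON Schema:** ```{{SCHEMA}}```"
)


def fill_prompt(template: str, text: str, schema: str) -> str:
    """Substitute the {{TEXT}} and {{SCHEMA}} placeholders in the template."""
    return template.replace("{{TEXT}}", text).replace("{{SCHEMA}}", schema)


prompt = fill_prompt(PROMPT_SKELETON, TEXT, SCHEMA)
print(prompt)
```

Serializing the schema with `json.dumps` keeps it valid JSON with double quotes, which matches the formatting rules stated in the prompt.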
93
+
94
+ ```python
95
+ from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer
96
+ import torch
97
+
98
+ model_name = "Isotonic/DR1-1.5b-JSON_extraction"
99
+
100
+ # Initialize tokenizer and model
101
+ device = "mps"  # Apple Silicon GPU; set to "cuda" or "cpu" on other hardware
102
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
103
+ model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map=device)
104
+
105
+ inputs = tokenizer([system_prompt_str], return_tensors="pt").to(device)
106
+ text_streamer = TextStreamer(tokenizer)
107
+
108
+ with torch.no_grad():
109
+     output_ids = model.generate(
110
+         input_ids=inputs["input_ids"],
111
+         attention_mask=inputs["attention_mask"],
112
+         max_new_tokens=4096,
113
+         do_sample=True, temperature=0.6,  # temperature/top_p take effect only when sampling
114
+         top_p=0.92,
115
+         repetition_penalty=1.1,
116
+         streamer=text_streamer,
117
+         pad_token_id=tokenizer.pad_token_id,
118
+     )
119
+
120
+ print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
121
+ ```
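The prompt asks the model to return a `<think>` section and an `<answer>` section, so the decoded text still has to be split and the JSON parsed. The following is a minimal sketch of that step, assuming the response contains exactly one pair of each tag. The `parse_response` helper and the sample response are hypothetical and not part of the model card.

```python
import json
import re


def parse_response(response: str) -> tuple[str, dict]:
    """Split a model response into its reasoning text and parsed JSON answer."""
    think = re.search(r"<think>(.*?)</think>", response, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    if think is None or answer is None:
        raise ValueError("response is missing a <think> or <answer> section")
    # json.loads raises json.JSONDecodeError if the answer is not valid JSON.
    return think.group(1).strip(), json.loads(answer.group(1))


# Hypothetical response, for illustration only.
sample = (
    "<think>Direct mapping from text to schema.</think>\n"
    '<answer>{"invoice_id": "1042"}</answer>'
)
reasoning, data = parse_response(sample)
print(data["invoice_id"])
```

Note that the prompt itself contains placeholder `<think>` and `<answer>` tags, and the full decoded output includes the prompt. It is therefore safer to pass only the newly generated tokens, for example by decoding `output_ids[0][inputs["input_ids"].shape[1]:]`, so the regular expressions do not match the prompt's placeholders.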
122
 
123
  ### Downstream Use [optional]
124