zeeshaan-ai committed
Commit 831ecbd · verified · 1 Parent(s): 9b9b617

Update README.md

Files changed (1)
  1. README.md +187 -1
README.md CHANGED
@@ -13,4 +13,190 @@ tags:
  - medical
  - summary
  - endocronology
- ---
+ ---
+
+ # Llama3.2-Medical-Notes-1B-ONNX
+
+ This is the quantized ONNX export of the [Llama3.2-Medical-Notes-1B](https://huggingface.co/GetSoloTech/Llama3.2-Medical-Notes-1B) model, intended for efficient inference and deployment.
+
+ ## Model Details
+
+ - **Base Model:** [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)
+ - **Fine-tuning Method:** PEFT (Parameter-Efficient Fine-Tuning) using LoRA
+ - **Training Framework:** Unsloth library for accelerated fine-tuning and merging
+ - **Quantization:** Quantized export in ONNX format for optimized inference
+ - **Task:** Text Generation (specifically, generating structured SOAP notes)
+
+ ## Paper
+
+ - [arXiv: 2507.03033](https://arxiv.org/abs/2507.03033)
+ - [medRxiv: 10.1101/2025.07.01.25330679v1](https://www.medrxiv.org/content/10.1101/2025.07.01.25330679v1)
+
+ ## Intended Use
+
+ **Input:** Free-text medical transcripts (doctor-patient conversations or dictated notes).
+
+ **Output:** Structured medical notes with clearly defined sections (Demographics, Presenting Illness, History, etc.).
+
+ ## Usage with ONNX Runtime
+
+ ```python
+ import onnxruntime as ort
+ from transformers import AutoTokenizer
+ import numpy as np
+
+ # The ONNX export lives in model_name; the tokenizer comes from the original model
+ model_name = "GetSoloTech/Llama3.2-Medical-Notes-1B-ONNX"
+ tokenizer = AutoTokenizer.from_pretrained("GetSoloTech/Llama3.2-Medical-Notes-1B")
+
+ # Initialize ONNX Runtime session
+ # Assumption: placeholder path; point it at the .onnx file downloaded from model_name
+ onnx_file_path = "model.onnx"
+ session = ort.InferenceSession(onnx_file_path)
+
+ SYSTEM_PROMPT = """Convert the following medical transcript to a structured medical note.
+
+ Use these sections in this order:
+
+ 1. Demographics
+ - Name, Age, Sex, DOB
+
+ 2. Presenting Illness
+ - Bullet point statements of the main problem and duration.
+
+ 3. History of Presenting Illness
+ - Chronological narrative: symptom onset, progression, modifiers, associated factors.
+
+ 4. Past Medical History
+ - List chronic illnesses and past medical diagnoses mentioned in the transcript. Do not include surgeries.
+
+ 5. Surgical History
+ - List prior surgeries with year if known, as mentioned in the transcript.
+
+ 6. Family History
+ - Relevant family history mentioned in the transcript.
+
+ 7. Social History
+ - Occupation, tobacco/alcohol/drug use, exercise, living situation if mentioned in the transcript.
+
+ 8. Allergy History
+ - Drug, food, or environmental allergies and reactions, if mentioned in the transcript.
+
+ 9. Medication History
+ - List medications the patient is already taking. Do not include any new or proposed drugs in this section.
+
+ 10. Dietary History
+ - If unrelated, write "Not applicable"; otherwise, summarize the diet pattern.
+
+ 11. Review of Systems
+ - Head-to-toe, alphabetically ordered bullet points; include both positives and pertinent negatives as mentioned in the transcript.
+
+ 12. Physical Exam Findings
+ - Vital Signs (BP, HR, RR, Temp, SpO₂, HT, WT, BMI) if mentioned in the transcript.
+ - Structured by system: General, HEENT, Cardiovascular, Respiratory, Abdomen, Neurological, Musculoskeletal, Skin, Psychiatric—as mentioned in the transcript.
+
+ 13. Labs and Imaging
+ - Summarize labs and imaging results.
+
+ 14. ASSESSMENT
+ - Provide a brief summary of the clinical assessment or diagnosis based on the information in the transcript.
+
+ 15. PLAN
+ - Outline the proposed management plan, including treatments, medications, follow-up, and patient instructions as discussed.
+
+ Please use only the information present in the transcript. If an information is not mentioned or not applicable, state "Not applicable." Format each section clearly with its heading.
+ """
+
+ def generate_structured_note_onnx(transcript):
+     message = [
+         {"role": "system", "content": SYSTEM_PROMPT},
+         {"role": "user", "content": f"<START_TRANSCRIPT>\n{transcript}\n<END_TRANSCRIPT>\n"},
+     ]
+
+     # Apply chat template and get NumPy arrays directly (no PyTorch needed)
+     inputs = tokenizer.apply_chat_template(
+         message,
+         tokenize=True,
+         add_generation_prompt=True,
+         return_dict=True,
+         return_tensors="np",
+     )
+
+     # ONNX Runtime expects int64 token ids
+     input_ids = inputs["input_ids"].astype(np.int64)
+
+     # Run a single forward pass with ONNX Runtime.
+     # Depending on how the model was exported, the graph may require extra inputs
+     # (e.g. attention_mask, position_ids, past key/values); session.get_inputs()
+     # lists the exact names.
+     outputs = session.run(
+         None,
+         {"input_ids": input_ids}
+     )
+
+     # Process outputs and generate text
+     # Note: This is a simplified example. One forward pass only yields logits;
+     # you need a token-by-token generation loop to produce the actual note.
+
+     return "Generated structured medical note..."
+
+ # Example usage
+ transcript = "Patient is a 45-year-old male presenting with chest pain for the past 2 days..."
+ note = generate_structured_note_onnx(transcript)
+ print("\n--- Generated Response ---")
+ print(note)
+ print("---------------------------")
+ ```
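The example above stops after one `session.run` call and returns a placeholder. A minimal sketch of the missing generation loop follows, using greedy selection rather than the sampling settings of the Transformers example. It is written against a hypothetical `next_token_logits` callable so that it runs without the model; with ONNX Runtime, that callable would wrap `session.run` on the token ids so far and return the logits for the last position (assuming the first graph output holds logits of shape batch × sequence × vocabulary, which depends on the export). Without a key/value cache, every step re-runs the full sequence, so this is slow for long notes.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    next_token_logits: Callable[[List[int]], Sequence[float]],
    prompt_ids: List[int],
    eos_token_id: int,
    max_new_tokens: int = 2048,
) -> List[int]:
    """Return only the newly generated token ids (prompt excluded)."""
    token_ids = list(prompt_ids)
    generated: List[int] = []
    for _ in range(max_new_tokens):
        # Hypothetical hook: logits over the vocabulary for the next token
        logits = next_token_logits(token_ids)
        # Greedy choice: index of the largest logit
        next_id = max(range(len(logits)), key=lambda i: logits[i])
        if next_id == eos_token_id:
            break
        generated.append(next_id)
        token_ids.append(next_id)
    return generated


# Toy stand-in for the model: over a vocabulary of 5 tokens,
# it always prefers (last token + 1) modulo 5.
def toy_logits(token_ids: List[int]) -> List[float]:
    target = (token_ids[-1] + 1) % 5
    return [1.0 if i == target else 0.0 for i in range(5)]


print(greedy_decode(toy_logits, [0], eos_token_id=4))  # [1, 2, 3]
```

The returned ids can then be passed to `tokenizer.decode(..., skip_special_tokens=True)` to obtain the note text.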
+
+ ## Alternative Usage with Transformers (Original Model)
+
+ If you prefer to use the original model instead of the ONNX version (this reuses `SYSTEM_PROMPT` from the example above):
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_name = "GetSoloTech/Llama3.2-Medical-Notes-1B"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ # device_map="auto" requires the accelerate package
+ model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
+
+ def generate_structured_note(transcript):
+     message = [
+         {"role": "system", "content": SYSTEM_PROMPT},
+         {"role": "user", "content": f"<START_TRANSCRIPT>\n{transcript}\n<END_TRANSCRIPT>\n"},
+     ]
+
+     inputs = tokenizer.apply_chat_template(
+         message,
+         tokenize=True,
+         add_generation_prompt=True,
+         return_dict=True,
+         return_tensors="pt",
+     ).to(model.device)
+
+     outputs = model.generate(
+         **inputs,
+         max_new_tokens=2048,
+         temperature=0.2,
+         top_p=0.85,
+         min_p=0.1,
+         top_k=20,
+         do_sample=True,
+         eos_token_id=tokenizer.eos_token_id,
+         use_cache=True,
+     )
+
+     # Keep only the tokens generated after the prompt
+     input_token_len = inputs["input_ids"].shape[1]
+     generated_tokens = outputs[:, input_token_len:]
+     note = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0]
+     # Strip the optional <START_NOTES>/<END_NOTES> markers around the note
+     if "<START_NOTES>" in note:
+         note = note.split("<START_NOTES>")[-1].strip()
+     if "<END_NOTES>" in note:
+         note = note.split("<END_NOTES>")[0].strip()
+     return note
+ ```
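The marker handling at the end of `generate_structured_note` can be checked without loading the model. The sketch below pulls the same two `split` steps into a standalone helper; the name `extract_note` and the sample text are illustrative and not part of the README.

```python
def extract_note(
    text: str,
    start: str = "<START_NOTES>",
    end: str = "<END_NOTES>",
) -> str:
    """Return the text between the optional start and end markers."""
    if start in text:
        # Keep whatever follows the last start marker
        text = text.split(start)[-1].strip()
    if end in text:
        # Keep whatever precedes the first end marker
        text = text.split(end)[0].strip()
    return text


# Illustrative model output wrapped in both markers
sample = "<START_NOTES>\n1. Demographics\n- Not applicable\n<END_NOTES>"
print(extract_note(sample))
```

Output without either marker is returned unchanged, so the helper is safe to apply to every generation.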
+
+ ## Performance Benefits
+
+ Compared with the original PyTorch model, the ONNX version is intended to provide:
+ - **Faster inference** through ONNX Runtime graph optimizations
+ - **Reduced memory footprint** through quantization
+ - **Cross-platform compatibility** for deployment wherever ONNX Runtime is supported
+ - **Production-ready** inference capabilities
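The README gives no figures for these benefits, so they are best confirmed on your own hardware. One way to do that is a small timing harness such as the sketch below; `time_calls` is a hypothetical helper, and the lambda is a stand-in to be replaced with a call into each inference path on the same transcript.

```python
import time
from statistics import median
from typing import Callable, List


def time_calls(fn: Callable[[], object], repeats: int = 5, warmup: int = 1) -> float:
    """Return the median wall-clock seconds per call of fn."""
    # Warm-up calls are executed but not timed (caches, lazy initialization)
    for _ in range(warmup):
        fn()
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return median(samples)


# Stand-in workload; swap in the ONNX and Transformers inference calls to compare them
baseline = time_calls(lambda: sum(range(10_000)))
print(f"median latency: {baseline * 1000:.3f} ms")
```

Using the median rather than the mean keeps one slow outlier run from dominating the comparison.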
+
+ ## Requirements
+
+ - `onnxruntime` for ONNX inference
+ - `transformers` for tokenization
+ - `numpy` for array operations