---
datasets:
- starfishdata/endocrinology_structured_notes_1500
language:
- en
metrics:
- bertscore
- rouge
- bleurt
base_model:
- GetSoloTech/Llama3.2-Medical-Notes-1B
tags:
- medical
- summary
- endocrinology
---
# Llama3.2-Medical-Notes-1B-ONNX
This is the ONNX quantized version of the [Llama3.2-Medical-Notes-1B](https://huggingface.co/GetSoloTech/Llama3.2-Medical-Notes-1B) model, optimized for efficient inference and deployment.
## Model Details
- **Base Model:** [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)
- **Fine-tuning Method:** PEFT (Parameter-Efficient Fine-Tuning) using LoRA
- **Training Framework:** Unsloth library for accelerated fine-tuning and merging
- **Quantization:** ONNX format for optimized inference
- **Task:** Text Generation (specifically, generating structured SOAP notes)
## Paper
- [arXiv: 2507.03033](https://arxiv.org/abs/2507.03033)
- [medRxiv: 10.1101/2025.07.01.25330679v1](https://www.medrxiv.org/content/10.1101/2025.07.01.25330679v1)
## Intended Use
**Input:** Free-text medical transcripts (doctor-patient conversations or dictated notes).
**Output:** Structured medical notes with clearly defined sections (Demographics, Presenting Illness, History, etc.).
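For illustration, a transcript like the chest-pain example in the usage code below would be mapped onto the note skeleton defined by the system prompt (excerpt only; not actual model output):
```
1. Demographics
- Name: Not applicable, Age: 45, Sex: Male, DOB: Not applicable
2. Presenting Illness
- Chest pain for the past 2 days.
3. History of Presenting Illness
- ...
...
15. PLAN
- ...
```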
## Usage with ONNX Runtime
```python
import numpy as np
import onnxruntime as ort
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

# Repository IDs; the tokenizer comes from the original (non-ONNX) model
model_name = "GetSoloTech/Llama3.2-Medical-Notes-1B-ONNX"
tokenizer = AutoTokenizer.from_pretrained("GetSoloTech/Llama3.2-Medical-Notes-1B")

# Download the ONNX file from the Hub and initialize an ONNX Runtime session.
# Note: "model.onnx" is the assumed filename; check the repository file list.
onnx_file_path = hf_hub_download(model_name, "model.onnx")
session = ort.InferenceSession(onnx_file_path)
SYSTEM_PROMPT = """Convert the following medical transcript to a structured medical note.
Use these sections in this order:
1. Demographics
- Name, Age, Sex, DOB
2. Presenting Illness
- Bullet point statements of the main problem and duration.
3. History of Presenting Illness
- Chronological narrative: symptom onset, progression, modifiers, associated factors.
4. Past Medical History
- List chronic illnesses and past medical diagnoses mentioned in the transcript. Do not include surgeries.
5. Surgical History
- List prior surgeries with year if known, as mentioned in the transcript.
6. Family History
- Relevant family history mentioned in the transcript.
7. Social History
- Occupation, tobacco/alcohol/drug use, exercise, living situation if mentioned in the transcript.
8. Allergy History
- Drug, food, or environmental allergies and reactions, if mentioned in the transcript.
9. Medication History
- List medications the patient is already taking. Do not include any new or proposed drugs in this section.
10. Dietary History
- If unrelated, write "Not applicable"; otherwise, summarize the diet pattern.
11. Review of Systems
- Head-to-toe, alphabetically ordered bullet points; include both positives and pertinent negatives as mentioned in the transcript.
12. Physical Exam Findings
- Vital Signs (BP, HR, RR, Temp, SpO₂, HT, WT, BMI) if mentioned in the transcript.
- Structured by system: General, HEENT, Cardiovascular, Respiratory, Abdomen, Neurological, Musculoskeletal, Skin, Psychiatric—as mentioned in the transcript.
13. Labs and Imaging
- Summarize labs and imaging results.
14. ASSESSMENT
- Provide a brief summary of the clinical assessment or diagnosis based on the information in the transcript.
15. PLAN
- Outline the proposed management plan, including treatments, medications, follow-up, and patient instructions as discussed.
Please use only the information present in the transcript. If any information is not mentioned or not applicable, state "Not applicable." Format each section clearly with its heading.
"""
def generate_structured_note_onnx(transcript, max_new_tokens=1024):
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"<START_TRANSCRIPT>\n{transcript}\n<END_TRANSCRIPT>\n"},
    ]
    # Apply the chat template and get numpy arrays directly
    input_ids = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="np",
    )
    prompt_len = input_ids.shape[1]
    # Simple greedy decoding loop. This sketch re-runs the full sequence at
    # every step (no KV cache), which is slow but keeps the example minimal.
    # Depending on how the model was exported, the session may also expect
    # "position_ids" or "past_key_values.*" inputs; inspect
    # session.get_inputs() (see below) and adjust the feed dict to match.
    for _ in range(max_new_tokens):
        logits = session.run(
            None,
            {
                "input_ids": input_ids,
                "attention_mask": np.ones_like(input_ids),
            },
        )[0]
        next_token = logits[:, -1, :].argmax(axis=-1, keepdims=True)
        # Stop at the end-of-sequence token (the chat template's end-of-turn
        # token may differ; adjust if generation runs past the note).
        if next_token[0, 0] == tokenizer.eos_token_id:
            break
        input_ids = np.concatenate([input_ids, next_token], axis=-1)
    return tokenizer.decode(input_ids[0, prompt_len:], skip_special_tokens=True)
# Example usage
transcript = "Patient is a 45-year-old male presenting with chest pain for the past 2 days..."
note = generate_structured_note_onnx(transcript)
print("\n--- Generated Response ---")
print(note)
print("---------------------------")
```
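Exported decoder models vary in the inputs they expect (some require `position_ids` or `past_key_values.*` tensors in addition to `input_ids` and `attention_mask`). A quick way to check what this particular export wants is to list the session's input and output signatures:
```python
# Print the input/output signature of the loaded ONNX model so the
# feed dict in generate_structured_note_onnx can be adjusted to match.
for inp in session.get_inputs():
    print("input: ", inp.name, inp.shape, inp.type)
for out in session.get_outputs():
    print("output:", out.name, out.shape, out.type)
```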
## Alternative Usage with Transformers (Original Model)
If you prefer to use the original model instead of the ONNX version:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "GetSoloTech/Llama3.2-Medical-Notes-1B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
def generate_structured_note(transcript):
message = [
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": f"<START_TRANSCRIPT>\n{transcript}\n<END_TRANSCRIPT>\n"},
]
inputs = tokenizer.apply_chat_template(
message,
tokenize=True,
add_generation_prompt=True,
return_tensors="pt",
).to(model.device)
outputs = model.generate(
input_ids=inputs,
max_new_tokens=2048,
temperature=0.2,
top_p=0.85,
min_p=0.1,
top_k=20,
do_sample=True,
eos_token_id=tokenizer.eos_token_id,
use_cache=True,
)
input_token_len = len(inputs[0])
generated_tokens = outputs[:, input_token_len:]
note = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0]
if "<START_NOTES>" in note:
note = note.split("<START_NOTES>")[-1].strip()
if "<END_NOTES>" in note:
note = note.split("<END_NOTES>")[0].strip()
return note
```
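Usage mirrors the ONNX example above (reusing the same `SYSTEM_PROMPT`):
```python
transcript = "Patient is a 45-year-old male presenting with chest pain for the past 2 days..."
note = generate_structured_note(transcript)
print("\n--- Generated Response ---")
print(note)
print("---------------------------")
```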
## Performance Benefits
The ONNX version provides:
- **Faster inference** through optimized runtime
- **Reduced memory footprint** through quantization
- **Cross-platform compatibility** for deployment
- **Production-ready** inference capabilities
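The exact quantization recipe used for this artifact is not documented in this card, but dynamic INT8 weight quantization with ONNX Runtime's standard tooling is a typical way to produce one. A minimal sketch, assuming a float ONNX export named `model.onnx` already exists locally (both file paths here are hypothetical):
```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# Dynamically quantize weights to INT8; activations stay in float and are
# quantized on the fly at inference time, shrinking the file roughly 4x.
quantize_dynamic(
    model_input="model.onnx",        # hypothetical path to the float export
    model_output="model_int8.onnx",  # quantized artifact
    weight_type=QuantType.QInt8,
)
```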
## Requirements
- `onnxruntime` for ONNX inference
- `transformers` for tokenization
- `huggingface_hub` for downloading the ONNX file from the Hub
- `numpy` for array operations