zeeshaan-ai committed · verified
Commit 2ba10a0 · 1 Parent(s): bc64d05

Update README.md

Files changed (1)
  1. README.md +134 -134
README.md CHANGED
@@ -1,135 +1,135 @@
- ---
- datasets:
- - starfishdata/playground_endocronology_notes_1500
- metrics:
- - bertscore
- - bleurt
- - rouge
- library_name: transformers
- base_model:
- - unsloth/Llama-3.2-1B-Instruct
- license: apache-2.0
- language:
- - en
- ---
-
- ## Model Details
- * **Base Model:** [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)
- * **Fine-tuning Method:** PEFT (Parameter-Efficient Fine-Tuning) using LoRA.
- * **Training Framework:** Unsloth library for accelerated fine-tuning and merging.
- * **Task:** Text Generation (specifically, generating structured SOAP notes).
-
- ## Paper
- https://arxiv.org/abs/2507.03033
-
- https://www.medrxiv.org/content/10.1101/2025.07.01.25330679v1
-
- ## Intended Use
- Input: Free-text medical transcripts (doctor-patient conversations or dictated notes).
-
- Output: Structured medical notes with clearly defined sections (Demographics, Presenting Illness, History, etc.).
-
-
- ```python
-
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- model_name = "OnDeviceMedNotes/Medical_Summary_Notes"
- tokenizer = AutoTokenizer.from_pretrained(model_name)
- model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
-
-
- SYSTEM_PROMPT = """Convert the following medical transcript to a structured medical note.
-
- Use these sections in this order:
-
- 1. Demographics
- - Name, Age, Sex, DOB
-
- 2. Presenting Illness
- - Bullet point statements of the main problem and duration.
-
- 3. History of Presenting Illness
- - Chronological narrative: symptom onset, progression, modifiers, associated factors.
-
- 4. Past Medical History
- - List chronic illnesses and past medical diagnoses mentioned in the transcript. Do not include surgeries.
-
- 5. Surgical History
- - List prior surgeries with year if known, as mentioned in the transcript.
-
- 6. Family History
- - Relevant family history mentioned in the transcript.
-
- 7. Social History
- - Occupation, tobacco/alcohol/drug use, exercise, living situation if mentioned in the transcript.
-
- 8. Allergy History
- - Drug, food, or environmental allergies and reactions, if mentioned in the transcript.
-
- 9. Medication History
- - List medications the patient is already taking. Do not include any new or proposed drugs in this section.
-
- 10. Dietary History
- - If unrelated, write “Not applicable”; otherwise, summarize the diet pattern.
-
- 11. Review of Systems
- - Head-to-toe, alphabetically ordered bullet points; include both positives and pertinent negatives as mentioned in the transcript.
-
- 12. Physical Exam Findings
- - Vital Signs (BP, HR, RR, Temp, SpO₂, HT, WT, BMI) if mentioned in the transcript.
- - Structured by system: General, HEENT, Cardiovascular, Respiratory, Abdomen, Neurological, Musculoskeletal, Skin, Psychiatric—as mentioned in the transcript.
-
- 13. Labs and Imaging
- - Summarize labs and imaging results.
-
- 14. ASSESSMENT
- - Provide a brief summary of the clinical assessment or diagnosis based on the information in the transcript.
-
- 15. PLAN
- - Outline the proposed management plan, including treatments, medications, follow-up, and patient instructions as discussed.
-
- Please use only the information present in the transcript. If an information is not mentioned or not applicable, state “Not applicable.” Format each section clearly with its heading.
- """
-
- def generate_structured_note(transcript):
-     message = [
-         {"role": "system", "content": SYSTEM_PROMPT},
-         {"role": "user", "content": f"<START_TRANSCRIPT>\n{transcript}\n<END_TRANSCRIPT>\n"},
-     ]
-
-     inputs = tokenizer.apply_chat_template(
-         message,
-         tokenize=True,
-         add_generation_prompt=True,
-         return_tensors="pt",
-     ).to(model.device)
-
-     outputs = model.generate(
-         input_ids=inputs,
-         max_new_tokens=2048,
-         temperature=0.2,
-         top_p=0.85,
-         min_p=0.1,
-         top_k=20,
-         do_sample=True,
-         eos_token_id=tokenizer.eos_token_id,
-         use_cache=True,
-     )
-
-     input_token_len = len(inputs[0])
-     generated_tokens = outputs[:, input_token_len:]
-     note = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0]
-     if "<START_NOTES>" in note:
-         note = note.split("<START_NOTES>")[-1].strip()
-     if "<END_NOTES>" in note:
-         note = note.split("<END_NOTES>")[0].strip()
-     return note
-
- # Example usage
- transcript = "Patient is a 45-year-old male presenting with..."
- note = generate_structured_note(transcript)
- print("\n--- Generated Response ---")
- print(note)
- print("---------------------------")
  ```
 
+ ---
+ datasets:
+ - starfishdata/playground_endocronology_notes_1500
+ metrics:
+ - bertscore
+ - bleurt
+ - rouge
+ library_name: transformers
+ base_model:
+ - meta-llama/Llama-3.2-1B-Instruct
+ license: apache-2.0
+ language:
+ - en
+ ---
+
+ ## Model Details
+ * **Base Model:** [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)
+ * **Fine-tuning Method:** PEFT (Parameter-Efficient Fine-Tuning) using LoRA.
+ * **Training Framework:** Unsloth library for accelerated fine-tuning and merging.
+ * **Task:** Text Generation (specifically, generating structured SOAP notes).
+
+ ## Paper
+ https://arxiv.org/abs/2507.03033
+
+ https://www.medrxiv.org/content/10.1101/2025.07.01.25330679v1
+
+ ## Intended Use
+ Input: Free-text medical transcripts (doctor-patient conversations or dictated notes).
+
+ Output: Structured medical notes with clearly defined sections (Demographics, Presenting Illness, History, etc.).
+
+
+ ```python
+
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_name = "GetSoloTech/Llama3.2-Medical-Notes-1B"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
+
+
+ SYSTEM_PROMPT = """Convert the following medical transcript to a structured medical note.
+
+ Use these sections in this order:
+
+ 1. Demographics
+ - Name, Age, Sex, DOB
+
+ 2. Presenting Illness
+ - Bullet point statements of the main problem and duration.
+
+ 3. History of Presenting Illness
+ - Chronological narrative: symptom onset, progression, modifiers, associated factors.
+
+ 4. Past Medical History
+ - List chronic illnesses and past medical diagnoses mentioned in the transcript. Do not include surgeries.
+
+ 5. Surgical History
+ - List prior surgeries with year if known, as mentioned in the transcript.
+
+ 6. Family History
+ - Relevant family history mentioned in the transcript.
+
+ 7. Social History
+ - Occupation, tobacco/alcohol/drug use, exercise, living situation if mentioned in the transcript.
+
+ 8. Allergy History
+ - Drug, food, or environmental allergies and reactions, if mentioned in the transcript.
+
+ 9. Medication History
+ - List medications the patient is already taking. Do not include any new or proposed drugs in this section.
+
+ 10. Dietary History
+ - If unrelated, write “Not applicable”; otherwise, summarize the diet pattern.
+
+ 11. Review of Systems
+ - Head-to-toe, alphabetically ordered bullet points; include both positives and pertinent negatives as mentioned in the transcript.
+
+ 12. Physical Exam Findings
+ - Vital Signs (BP, HR, RR, Temp, SpO₂, HT, WT, BMI) if mentioned in the transcript.
+ - Structured by system: General, HEENT, Cardiovascular, Respiratory, Abdomen, Neurological, Musculoskeletal, Skin, Psychiatric—as mentioned in the transcript.
+
+ 13. Labs and Imaging
+ - Summarize labs and imaging results.
+
+ 14. ASSESSMENT
+ - Provide a brief summary of the clinical assessment or diagnosis based on the information in the transcript.
+
+ 15. PLAN
+ - Outline the proposed management plan, including treatments, medications, follow-up, and patient instructions as discussed.
+
+ Please use only the information present in the transcript. If an information is not mentioned or not applicable, state “Not applicable.” Format each section clearly with its heading.
+ """
+
+ def generate_structured_note(transcript):
+     message = [
+         {"role": "system", "content": SYSTEM_PROMPT},
+         {"role": "user", "content": f"<START_TRANSCRIPT>\n{transcript}\n<END_TRANSCRIPT>\n"},
+     ]
+
+     inputs = tokenizer.apply_chat_template(
+         message,
+         tokenize=True,
+         add_generation_prompt=True,
+         return_tensors="pt",
+     ).to(model.device)
+
+     outputs = model.generate(
+         input_ids=inputs,
+         max_new_tokens=2048,
+         temperature=0.2,
+         top_p=0.85,
+         min_p=0.1,
+         top_k=20,
+         do_sample=True,
+         eos_token_id=tokenizer.eos_token_id,
+         use_cache=True,
+     )
+
+     input_token_len = len(inputs[0])
+     generated_tokens = outputs[:, input_token_len:]
+     note = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0]
+     if "<START_NOTES>" in note:
+         note = note.split("<START_NOTES>")[-1].strip()
+     if "<END_NOTES>" in note:
+         note = note.split("<END_NOTES>")[0].strip()
+     return note
+
+ # Example usage
+ transcript = "Patient is a 45-year-old male presenting with..."
+ note = generate_structured_note(transcript)
+ print("\n--- Generated Response ---")
+ print(note)
+ print("---------------------------")
  ```
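
The Model Details section of the card names the recipe (LoRA-based PEFT on Llama-3.2-1B-Instruct, trained and merged with the Unsloth library) but does not include the training setup itself. The snippet below is a minimal sketch of how that kind of setup is typically expressed with Unsloth's standard API; the LoRA rank, alpha, target modules, sequence length, and quantization choice are illustrative assumptions, not values published for this model.

```python
# Illustrative sketch only: the LoRA + Unsloth recipe described in the card,
# with assumed hyperparameters (r, lora_alpha, target_modules, max_seq_length).
from unsloth import FastLanguageModel

# Load the base model through Unsloth's accelerated loader.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Llama-3.2-1B-Instruct",
    max_seq_length=2048,   # assumed context budget for transcript + note
    load_in_4bit=True,     # assumed; common choice for memory-efficient training
)

# Attach LoRA adapters (PEFT) to the usual attention and MLP projections.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                  # assumed LoRA rank
    lora_alpha=16,         # assumed scaling factor
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)

# Supervised fine-tuning would then run over transcript -> note pairs
# (for example with TRL's SFTTrainer), after which the adapters are merged
# back into the base weights for release, per the card's "merging" note.
```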