mistral-7b-instruct-v0.3-mimic4-adapt-multilabel-classify / classification_log_2025-06-16_21-42-41.log

Trained classifier model on MIMIC-IV


	2025-06-16 21:42:41,136 - INFO - ================================================================================ - [multilabel_classify.py:103:log_section]
	2025-06-16 21:42:41,136 - INFO - = 📌 INITIALIZING TRAINING ENVIRONMENT = - [multilabel_classify.py:104:log_section]
	2025-06-16 21:42:41,136 - INFO - ================================================================================ - [multilabel_classify.py:107:log_section]
	2025-06-16 21:42:41,136 - INFO - 🚀 Setting up data paths and environment variables... - [multilabel_classify.py:3940:main]
	2025-06-16 21:42:41,137 - INFO - 📂 Using output directory: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b - [multilabel_classify.py:3946:main]
	2025-06-16 21:42:41,137 - INFO - 🛠️ Command-line Arguments: - [multilabel_classify.py:371:print_args]
	2025-06-16 21:42:41,137 - INFO -
	🔹 output_dir: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b
	🔹 source_url: XURLs.MIMIC4_DEMO
	🔹 data: mimic4_icd10_full
	🔹 logfile: classification_log
	🔹 base_dir: ../tmp/MIMIC4_DEMO
	🔹 hub_model_id: deb101/mistral-7b-instruct-v0.3-mimic4-adapt
	🔹 model_name: mistralai/Mistral-7B-Instruct-v0.3
	🔹 max_length: 512
	🔹 do_fresh_training: True
	🔹 load_from_checkpoint: False
	🔹 task: multilabel-classify
	🔹 num_train_epochs: 5
	🔹 per_device_train_batch_size: 8
	🔹 per_device_eval_batch_size: 8
	🔹 metric_for_best_model: precision_at_15
	🔹 learning_rate: 0.0001
	🔹 final_lr_scheduling: 1e-06
	🔹 warmup_steps: 500
	🔹 logfile_path: ../tmp/logs/classification_log_2025-06-16_21-42-41.log
	🔹 source: /home/ubuntu/.xcube/data/mimic4_demo - [multilabel_classify.py:372:print_args]
	2025-06-16 21:42:41,137 - INFO - ➖➖➖➖➖➖➖➖➖➖➖➖➖➖➖➖➖➖➖➖➖➖➖➖➖➖➖➖➖➖ - [multilabel_classify.py:373:print_args]
	2025-06-16 21:42:41,148 - INFO -
	🚀 Quick Git Info: 📁 xcube | 🌿 plant | 🔍 9a164a6 | 👤 Debjyoti Saha Roy | 🟢 STAGED (1) | 🔬 git show 9a164a6 - [multilabel_classify.py:3952:main]
	2025-06-16 21:42:41,148 - INFO - ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - [multilabel_classify.py:103:log_section]
	2025-06-16 21:42:41,148 - INFO - + ✨ LOADING DATASETS + - [multilabel_classify.py:104:log_section]
	2025-06-16 21:42:41,148 - INFO - ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - [multilabel_classify.py:107:log_section]
	2025-06-16 21:42:41,148 - INFO - 📊 Loading main datasets.... - [multilabel_classify.py:3955:main]
	2025-06-16 21:42:49,755 - INFO - 🔍 Total unique labels in dataset: 7942 - [multilabel_classify.py:3731:sample_df_with_full_label_coverage]
	2025-06-16 21:42:49,768 - INFO - 🧪 Attempt 1: Sampled 122 rows covering 863 labels. - [multilabel_classify.py:3745:sample_df_with_full_label_coverage]
	2025-06-16 21:42:49,777 - INFO - 🧪 Attempt 2: Sampled 122 rows covering 816 labels. - [multilabel_classify.py:3745:sample_df_with_full_label_coverage]
	2025-06-16 21:42:49,786 - INFO - 🧪 Attempt 3: Sampled 122 rows covering 885 labels. - [multilabel_classify.py:3745:sample_df_with_full_label_coverage]
	2025-06-16 21:42:49,795 - INFO - 🧪 Attempt 4: Sampled 122 rows covering 828 labels. - [multilabel_classify.py:3745:sample_df_with_full_label_coverage]
	2025-06-16 21:42:49,804 - INFO - 🧪 Attempt 5: Sampled 122 rows covering 879 labels. - [multilabel_classify.py:3745:sample_df_with_full_label_coverage]
	2025-06-16 21:42:49,813 - INFO - 🧪 Attempt 6: Sampled 122 rows covering 852 labels. - [multilabel_classify.py:3745:sample_df_with_full_label_coverage]
	2025-06-16 21:42:49,821 - INFO - 🧪 Attempt 7: Sampled 122 rows covering 838 labels. - [multilabel_classify.py:3745:sample_df_with_full_label_coverage]
	2025-06-16 21:42:49,831 - INFO - 🧪 Attempt 8: Sampled 122 rows covering 851 labels. - [multilabel_classify.py:3745:sample_df_with_full_label_coverage]
	2025-06-16 21:42:49,839 - INFO - 🧪 Attempt 9: Sampled 122 rows covering 825 labels. - [multilabel_classify.py:3745:sample_df_with_full_label_coverage]
	2025-06-16 21:42:49,848 - INFO - 🧪 Attempt 10: Sampled 122 rows covering 833 labels. - [multilabel_classify.py:3745:sample_df_with_full_label_coverage]
	2025-06-16 21:42:49,852 - INFO - 🛠️ Fixing missing labels: 7109 remaining... - [multilabel_classify.py:3778:sample_df_with_full_label_coverage]
	2025-06-16 21:46:19,271 - INFO - ✅ Added 1648 rows to achieve full label coverage. - [multilabel_classify.py:3810:sample_df_with_full_label_coverage]
	2025-06-16 21:46:19,274 - INFO - 📊 Final total labels: 7942 - [multilabel_classify.py:3813:sample_df_with_full_label_coverage]
	2025-06-16 21:46:19,274 - INFO - ✅ Final row count: 1770 (Valid: 420, Not-valid: 1350) - [multilabel_classify.py:3821:sample_df_with_full_label_coverage]
	2025-06-16 21:46:20,015 - INFO - ******************************************************************************** - [multilabel_classify.py:103:log_section]
	2025-06-16 21:46:20,015 - INFO - * 🌟 STARTING MULTI_LABEL CLASSIFICATION MODEL TRAINING * - [multilabel_classify.py:104:log_section]
	2025-06-16 21:46:20,015 - INFO - ******************************************************************************** - [multilabel_classify.py:107:log_section]
	2025-06-16 21:46:20,015 - INFO - 🔐 Loaded authentication token from environment - [multilabel_classify.py:3982:main]
	2025-06-16 21:46:20,015 - INFO - 🏷️ Hub Model ID for this Classification task: deb101/mistral-7b-instruct-v0.3-mimic4-adapt-multilabel-classify - [multilabel_classify.py:3986:main]
	2025-06-16 21:46:20,016 - INFO - -------------------------------------------------------------------------------- - [multilabel_classify.py:103:log_section]
	2025-06-16 21:46:20,016 - INFO - - 📋 MODEL EXISTENCE CHECK - - [multilabel_classify.py:104:log_section]
	2025-06-16 21:46:20,016 - INFO - -------------------------------------------------------------------------------- - [multilabel_classify.py:107:log_section]
	2025-06-16 21:46:20,016 - INFO - 🔍 Checking model existence locally and on Hugging Face Hub... - [multilabel_classify.py:3846:check_model_existence]
	2025-06-16 21:46:20,016 - INFO - ✅ Model exists locally at: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b - [multilabel_classify.py:3851:check_model_existence]
	2025-06-16 21:46:20,070 - INFO - ✅ Model exists on Hugging Face Hub with ID: deb101/mistral-7b-instruct-v0.3-mimic4-adapt-multilabel-classify - [multilabel_classify.py:3865:check_model_existence]
	2025-06-16 21:46:20,070 - INFO - 📁 Model exists either locally or on Hub - [multilabel_classify.py:3891:check_model_existence]
	2025-06-16 21:46:20,070 - INFO - ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - [multilabel_classify.py:103:log_section]
	2025-06-16 21:46:20,070 - INFO - + ✨ STARTING FRESH TRAINING + - [multilabel_classify.py:104:log_section]
	2025-06-16 21:46:20,070 - INFO - ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - [multilabel_classify.py:107:log_section]
	2025-06-16 21:46:20,070 - INFO - 🔄 Starting fresh training (either forced or model not found)... - [multilabel_classify.py:3999:main]
	2025-06-16 21:46:20,085 - WARNING - Note: Environment variable`HF_TOKEN` is set and is the current active token independently from the token you've just configured. - [_login.py:415:_login]
	2025-06-16 21:46:20,085 - INFO - ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - [multilabel_classify.py:103:log_section]
	2025-06-16 21:46:20,085 - INFO - + ✨ LOADING BASE MODEL + - [multilabel_classify.py:104:log_section]
	2025-06-16 21:46:20,085 - INFO - ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - [multilabel_classify.py:107:log_section]
	2025-06-16 21:46:20,085 - INFO - 📥 Loading pretrained model and tokenizer... - [multilabel_classify.py:4031:main]
	2025-06-16 21:46:20,085 - INFO - 🚀 Starting model and tokenizer loading process... - [multilabel_classify.py:1603:load_base_model_and_tokenizer]
	2025-06-16 21:46:20,086 - INFO - 📊 Quantization config: 4-bit, nf4, double_quant, bfloat16 - [multilabel_classify.py:1612:load_base_model_and_tokenizer]
	2025-06-16 21:46:20,086 - INFO - 🔤 Loading tokenizer for model: deb101/mistral-7b-instruct-v0.3-mimic4-adapt... - [multilabel_classify.py:1616:load_base_model_and_tokenizer]
	2025-06-16 21:46:20,476 - INFO - 🔍 Checking if deb101/mistral-7b-instruct-v0.3-mimic4-adapt is a PEFT model... - [multilabel_classify.py:1627:load_base_model_and_tokenizer]
	2025-06-16 21:46:20,498 - INFO - ✅ Detected PEFT model. Base model: mistralai/Mistral-7B-Instruct-v0.3 - [multilabel_classify.py:1631:load_base_model_and_tokenizer]
	2025-06-16 21:46:20,498 - INFO - 🔍 Loading model configuration for mistralai/Mistral-7B-Instruct-v0.3... - [multilabel_classify.py:1639:load_base_model_and_tokenizer]
	2025-06-16 21:46:20,521 - INFO - Model type: mistral, Architectures: ['MistralForCausalLM'] - [multilabel_classify.py:1654:load_base_model_and_tokenizer]
	2025-06-16 21:46:20,521 - INFO - 🧠 Loading base model: mistralai/Mistral-7B-Instruct-v0.3... - [multilabel_classify.py:1722:load_base_model_and_tokenizer]
	2025-06-16 21:46:21,030 - INFO - We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk). - [modeling.py:991:get_balanced_memory]
	2025-06-16 21:46:26,310 - INFO - 🧩 Loading PEFT adapters for deb101/mistral-7b-instruct-v0.3-mimic4-adapt... - [multilabel_classify.py:1742:load_base_model_and_tokenizer]
	2025-06-16 21:46:26,561 - INFO - 🔧 Before enabling PEFT adapters - [multilabel_classify.py:1744:load_base_model_and_tokenizer]
	2025-06-16 21:46:26,563 - INFO - 📊 trainable params: 0 || all params: 7,254,839,296 || trainable%: 0.0000 - [multilabel_classify.py:162:log_print_output]
	2025-06-16 21:46:26,567 - INFO - Enabled gradients for parameters: ['base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.0.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.0.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.0.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.1.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.1.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.1.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.1.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.2.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.2.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.2.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.2.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.3.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.3.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.3.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.3.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.4.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.4.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.4.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.4.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.5.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.5.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.5.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.5.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.6.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.6.self_attn.q_proj.lora_B.default.weight', 
'base_model.model.model.layers.6.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.6.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.7.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.7.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.7.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.7.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.8.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.8.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.8.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.8.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.9.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.9.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.9.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.9.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.10.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.10.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.10.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.10.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.11.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.11.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.11.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.11.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.12.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.12.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.12.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.12.self_attn.v_proj.lora_B.default.weight', 
'base_model.model.model.layers.13.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.13.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.13.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.13.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.14.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.14.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.14.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.14.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.15.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.15.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.15.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.15.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.16.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.16.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.16.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.16.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.17.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.17.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.17.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.17.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.18.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.18.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.18.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.18.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.19.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.19.self_attn.q_proj.lora_B.default.weight', 
'base_model.model.model.layers.19.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.19.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.20.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.20.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.20.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.20.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.21.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.21.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.21.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.21.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.22.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.22.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.22.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.22.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.23.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.23.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.23.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.23.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.24.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.24.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.24.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.24.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.25.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.25.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.25.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.25.self_attn.v_proj.lora_B.default.weight', 
'base_model.model.model.layers.26.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.26.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.26.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.26.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.27.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.27.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.27.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.27.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.28.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.28.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.28.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.28.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.29.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.29.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.29.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.29.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.30.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.30.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.30.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.30.self_attn.v_proj.lora_B.default.weight', 'base_model.model.model.layers.31.self_attn.q_proj.lora_A.default.weight', 'base_model.model.model.layers.31.self_attn.q_proj.lora_B.default.weight', 'base_model.model.model.layers.31.self_attn.v_proj.lora_A.default.weight', 'base_model.model.model.layers.31.self_attn.v_proj.lora_B.default.weight'] - [multilabel_classify.py:1754:load_base_model_and_tokenizer]
	2025-06-16 21:46:26,567 - INFO - 🔧 After enabling PEFT adapters - [multilabel_classify.py:1755:load_base_model_and_tokenizer]
	2025-06-16 21:46:26,569 - INFO - 📊 trainable params: 6,815,744 || all params: 7,254,839,296 || trainable%: 0.0939 - [multilabel_classify.py:162:log_print_output]
	2025-06-16 21:46:26,570 - INFO - ✅ Model and tokenizer successfully loaded! - [multilabel_classify.py:1793:load_base_model_and_tokenizer]
	2025-06-16 21:46:26,570 - INFO - ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - [multilabel_classify.py:103:log_section]
	2025-06-16 21:46:26,570 - INFO - + ✨ DATA PREPROCESSING + - [multilabel_classify.py:104:log_section]
	2025-06-16 21:46:26,570 - INFO - ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - [multilabel_classify.py:107:log_section]
	2025-06-16 21:46:26,570 - INFO - 🔄 Loading and preprocessing training data... - [multilabel_classify.py:4041:main]
	2025-06-16 21:46:26,747 - INFO - Total number of labels: 7942 - [multilabel_classify.py:1196:preprocess_data]
	2025-06-16 21:46:26,747 - INFO - Rare labels (freq < 50): 7817 - [multilabel_classify.py:1197:preprocess_data]
	2025-06-16 21:46:26,747 - INFO - Not rare labels (freq >= 50): 125 - [multilabel_classify.py:1198:preprocess_data]
	2025-06-16 21:46:26,747 - INFO - Label partitions and classes saved to ../tmp/MIMIC4_DEMO/labels_partition.json - [multilabel_classify.py:1199:preprocess_data]
	2025-06-16 21:47:24,136 - INFO - The size of training set: 8393 - [multilabel_classify.py:1295:preprocess_data]
	2025-06-16 21:47:24,136 - INFO - The size of Evaluation set: 2528 - [multilabel_classify.py:1296:preprocess_data]
	2025-06-16 21:47:24,538 - INFO - Number of unique ICD-10 codes: 7942 - [multilabel_classify.py:4047:main]
	2025-06-16 21:47:24,541 - INFO - ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - [multilabel_classify.py:103:log_section]
	2025-06-16 21:47:24,541 - INFO - + ✨ MODEL INITIALIZATION + - [multilabel_classify.py:104:log_section]
	2025-06-16 21:47:24,541 - INFO - ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - [multilabel_classify.py:107:log_section]
	2025-06-16 21:47:24,541 - INFO - 🧠 Initializing custom L2R model for outputting per-token relevance scores per ICD-10 codes. - [multilabel_classify.py:4050:main]
	2025-06-16 21:47:24,541 - INFO - 🏥📊 Creating MultilabelICDClassifier - Standard multilabel medical classifier! 🔬💫 - [multilabel_classify.py:884:define_model]
	2025-06-16 21:47:24,541 - INFO - Will now start to create Multilabel-Classification Model from the base model - [multilabel_classify.py:567:__init__]
	2025-06-16 21:47:24,545 - INFO - 📊 trainable params: 6,815,744 || all params: 3,765,178,368 || trainable%: 0.1810 - [utils.py:476:compute_trainable_params]
	2025-06-16 21:47:26,261 - INFO - Creating the Multi-Label Classification Model from base model mistralai/Mistral-7B-Instruct-v0.3 completed!!! - [multilabel_classify.py:609:__init__]
	2025-06-16 21:47:26,265 - INFO - 📊 trainable params: 171,532,417 || all params: 3,929,895,041 || trainable%: 4.3648 - [utils.py:476:compute_trainable_params]
	2025-06-16 21:47:26,265 - INFO - ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - [multilabel_classify.py:103:log_section]
	2025-06-16 21:47:26,265 - INFO - + ✨ TRAINING PREPARATION + - [multilabel_classify.py:104:log_section]
	2025-06-16 21:47:26,265 - INFO - ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - [multilabel_classify.py:107:log_section]
	2025-06-16 21:47:26,265 - INFO - ⚙️ Preparing training components and optimizers... - [multilabel_classify.py:4057:main]
	2025-06-16 21:47:26,349 - INFO - 🖥️ Device: NVIDIA GH200 480GB - [multilabel_classify.py:1043:log_training_configuration]
	2025-06-16 21:47:26,349 - INFO - 🔋 CUDA Available: True - [multilabel_classify.py:1046:log_training_configuration]
	2025-06-16 21:47:26,349 - INFO - 💾 CUDA Device Count: 1 - [multilabel_classify.py:1047:log_training_configuration]
	2025-06-16 21:47:26,351 - INFO -
	📋 Training Configuration 📋
	+----------+-----------------------------+------------------------------------------------------------------+
	| 🌟 Emoji | 🏷️ Parameter                | 📊 Value                                                         |
	+----------+-----------------------------+------------------------------------------------------------------+
	| 📁       | Output Directory            | ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b                     |
	| 🔁       | Training Epochs             | 5                                                                |
	| 🏋️       | Train Batch Size            | 8                                                                |
	| 🔍       | Eval Batch Size             | 8                                                                |
	| 📊       | Gradient Accumulation Steps | 4                                                                |
	| 🚀       | Learning Rate               | 0.0001                                                           |
	| 🌅       | Warmup Steps                | 500                                                              |
	| 💾       | Save Strategy               | epoch                                                            |
	| 💾       | Save Total Limit            | 10                                                               |
	| 📊       | Evaluation Strategy         | epoch                                                            |
	| 🎯       | Best Model Metric           | precision_at_15                                                  |
	| 📝       | Logging Strategy            | steps (every 10 steps)                                           |
	| 🌐       | Push to Hub                 | True                                                             |
	| 🌐       | Hub Model ID                | deb101/mistral-7b-instruct-v0.3-mimic4-adapt-multilabel-classify |
	| 🔢       | Steps per Epoch             | 262                                                              |
	| 🔢       | Total Training Steps        | 1310                                                             |
	| 🔢       | Evaluation Steps            | 316                                                              |
	| 📊       | Training Dataset Size       | 8393 samples 🏋️                                                  |
	| 📊       | Evaluation Dataset Size     | 2528 samples 🔍                                                  |
	+----------+-----------------------------+------------------------------------------------------------------+ - [multilabel_classify.py:1035:log_training_args]
	2025-06-16 21:47:26,351 - INFO - ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - [multilabel_classify.py:103:log_section]
	2025-06-16 21:47:26,351 - INFO - + ✨ MODEL TRAINING + - [multilabel_classify.py:104:log_section]
	2025-06-16 21:47:26,351 - INFO - ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - [multilabel_classify.py:107:log_section]
	2025-06-16 21:47:26,352 - INFO - 🏋️ Starting model training process... - [multilabel_classify.py:4079:main]
	2025-06-16 21:47:26,396 - INFO - We are registering the tokenizer deb101/mistral-7b-instruct-v0.3-mimic4-adapt in Custom Trainer - [multilabel_classify.py:2364:__init__]
	2025-06-16 21:47:26,644 - INFO - 🚀 Starting Training... - [multilabel_classify.py:2018:on_train_begin]
	2025-06-16 21:47:51,185 - INFO -
	🚂 Training Metrics (Step 10) 🚂
	+---------------+----------+
	| Metric        | Value    |
	+===============+==========+
	| loss          | -1.3281  |
	+---------------+----------+
	| grad_norm     | 0.009771 |
	+---------------+----------+
	| learning_rate | 2e-06    |
	+---------------+----------+
	| epoch         | 0.038095 |
	+---------------+----------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 21:48:11,800 - INFO -
	🚂 Training Metrics (Step 20) 🚂
	+---------------+----------+
	| Metric        | Value    |
	+===============+==========+
	| loss          | -1.2681  |
	+---------------+----------+
	| grad_norm     | 0.008548 |
	+---------------+----------+
	| learning_rate | 4e-06    |
	+---------------+----------+
	| epoch         | 0.07619  |
	+---------------+----------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 21:48:32,431 - INFO -
	🚂 Training Metrics (Step 30) 🚂
	+---------------+----------+
	| Metric        | Value    |
	+===============+==========+
	| loss          | -1.2844  |
	+---------------+----------+
	| grad_norm     | 0.010616 |
	+---------------+----------+
	| learning_rate | 6e-06    |
	+---------------+----------+
	| epoch         | 0.114286 |
	+---------------+----------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 21:48:53,066 - INFO -
	🚂 Training Metrics (Step 40) 🚂
	+---------------+----------+
	| Metric        | Value    |
	+===============+==========+
	| loss          | -1.33    |
	+---------------+----------+
	| grad_norm     | 0.020943 |
	+---------------+----------+
	| learning_rate | 8e-06    |
	+---------------+----------+
	| epoch         | 0.152381 |
	+---------------+----------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 21:49:13,743 - INFO -
	🚂 Training Metrics (Step 50) 🚂
	+---------------+----------+
	| Metric        | Value    |
	+===============+==========+
	| loss          | -1.3244  |
	+---------------+----------+
	| grad_norm     | 0.069555 |
	+---------------+----------+
	| learning_rate | 1e-05    |
	+---------------+----------+
	| epoch         | 0.190476 |
	+---------------+----------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 21:49:34,457 - INFO -
	🚂 Training Metrics (Step 60) 🚂
	+---------------+----------+
	| Metric        | Value    |
	+===============+==========+
	| loss          | -1.3176  |
	+---------------+----------+
	| grad_norm     | 1.1682   |
	+---------------+----------+
	| learning_rate | 1.2e-05  |
	+---------------+----------+
	| epoch         | 0.228571 |
	+---------------+----------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 21:49:55,223 - INFO -
	🚂 Training Metrics (Step 70) 🚂
	+---------------+----------+
	| Metric        | Value    |
	+===============+==========+
	| loss          | -2.083   |
	+---------------+----------+
	| grad_norm     | 1.83967  |
	+---------------+----------+
	| learning_rate | 1.4e-05  |
	+---------------+----------+
	| epoch         | 0.266667 |
	+---------------+----------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 21:50:15,943 - INFO -
	🚂 Training Metrics (Step 80) 🚂
	+---------------+----------+
	| Metric        | Value    |
	+===============+==========+
	| loss          | -2.3497  |
	+---------------+----------+
	| grad_norm     | 0.998332 |
	+---------------+----------+
	| learning_rate | 1.6e-05  |
	+---------------+----------+
	| epoch         | 0.304762 |
	+---------------+----------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 21:50:36,662 - INFO -
	🚂 Training Metrics (Step 90) 🚂
	+---------------+----------+
	| Metric        | Value    |
	+===============+==========+
	| loss          | -2.4039  |
	+---------------+----------+
	| grad_norm     | 1.1678   |
	+---------------+----------+
	| learning_rate | 1.8e-05  |
	+---------------+----------+
	| epoch         | 0.342857 |
	+---------------+----------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 21:50:57,346 - INFO -
	🚂 Training Metrics (Step 100) 🚂
	+---------------+----------+
	| Metric        | Value    |
	+===============+==========+
	| loss          | -2.5742  |
	+---------------+----------+
	| grad_norm     | 2.53676  |
	+---------------+----------+
	| learning_rate | 2e-05    |
	+---------------+----------+
	| epoch         | 0.380952 |
	+---------------+----------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 21:51:18,014 - INFO -
	🚂 Training Metrics (Step 110) 🚂
	+---------------+----------+
	| Metric        | Value    |
	+===============+==========+
	| loss          | -2.599   |
	+---------------+----------+
	| grad_norm     | 1.1263   |
	+---------------+----------+
	| learning_rate | 2.2e-05  |
	+---------------+----------+
	| epoch         | 0.419048 |
	+---------------+----------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 21:51:38,668 - INFO -
	🚂 Training Metrics (Step 120) 🚂
	+---------------+----------+
	| Metric        | Value    |
	+===============+==========+
	| loss          | -2.7175  |
	+---------------+----------+
	| grad_norm     | 1.1186   |
	+---------------+----------+
	| learning_rate | 2.4e-05  |
	+---------------+----------+
	| epoch         | 0.457143 |
	+---------------+----------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 21:51:59,288 - INFO -
	🚂 Training Metrics (Step 130) 🚂
	+---------------+----------+
	| Metric        | Value    |
	+===============+==========+
	| loss          | -2.6394  |
	+---------------+----------+
	| grad_norm     | 1.62559  |
	+---------------+----------+
	| learning_rate | 2.6e-05  |
	+---------------+----------+
	| epoch         | 0.495238 |
	+---------------+----------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 21:52:19,922 - INFO -
	🚂 Training Metrics (Step 140) 🚂
	+---------------+----------+
	| Metric        | Value    |
	+===============+==========+
	| loss          | -2.7082  |
	+---------------+----------+
	| grad_norm     | 0.867737 |
	+---------------+----------+
	| learning_rate | 2.8e-05  |
	+---------------+----------+
	| epoch         | 0.533333 |
	+---------------+----------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 21:52:40,541 - INFO -
	[36m🚂 Training Metrics (Step 150) 🚂
	+---------------+----------+
	\| Metric \| Value \|
	+===============+==========+
	\| loss \| -2.4458 \|
	+---------------+----------+
	\| grad_norm \| 1.29707 \|
	+---------------+----------+
	\| learning_rate \| 3e-05 \|
	+---------------+----------+
	\| epoch \| 0.571429 \|
	+---------------+----------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 21:53:01,158 - INFO -
	🚂 Training Metrics (Step 160) 🚂
	+---------------+----------+
	\| Metric \| Value \|
	+===============+==========+
	\| loss \| -2.5431 \|
	+---------------+----------+
	\| grad_norm \| 0.911872 \|
	+---------------+----------+
	\| learning_rate \| 3.2e-05 \|
	+---------------+----------+
	\| epoch \| 0.609524 \|
	+---------------+----------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 21:53:21,750 - INFO -
	🚂 Training Metrics (Step 170) 🚂
	+---------------+----------+
	\| Metric \| Value \|
	+===============+==========+
	\| loss \| -2.6463 \|
	+---------------+----------+
	\| grad_norm \| 1.06875 \|
	+---------------+----------+
	\| learning_rate \| 3.4e-05 \|
	+---------------+----------+
	\| epoch \| 0.647619 \|
	+---------------+----------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 21:53:42,366 - INFO -
	🚂 Training Metrics (Step 180) 🚂
	+---------------+----------+
	\| Metric \| Value \|
	+===============+==========+
	\| loss \| -2.5042 \|
	+---------------+----------+
	\| grad_norm \| 1.2099 \|
	+---------------+----------+
	\| learning_rate \| 3.6e-05 \|
	+---------------+----------+
	\| epoch \| 0.685714 \|
	+---------------+----------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 21:54:02,983 - INFO -
	🚂 Training Metrics (Step 190) 🚂
	+---------------+----------+
	\| Metric \| Value \|
	+===============+==========+
	\| loss \| -2.5143 \|
	+---------------+----------+
	\| grad_norm \| 0.903909 \|
	+---------------+----------+
	\| learning_rate \| 3.8e-05 \|
	+---------------+----------+
	\| epoch \| 0.72381 \|
	+---------------+----------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 21:54:23,596 - INFO -
	🚂 Training Metrics (Step 200) 🚂
	+---------------+----------+
	\| Metric \| Value \|
	+===============+==========+
	\| loss \| -2.4899 \|
	+---------------+----------+
	\| grad_norm \| 0.870928 \|
	+---------------+----------+
	\| learning_rate \| 4e-05 \|
	+---------------+----------+
	\| epoch \| 0.761905 \|
	+---------------+----------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 21:54:44,214 - INFO -
	🚂 Training Metrics (Step 210) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -2.7119 \|
	+---------------+---------+
	\| grad_norm \| 1.17446 \|
	+---------------+---------+
	\| learning_rate \| 4.2e-05 \|
	+---------------+---------+
	\| epoch \| 0.8 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 21:55:04,855 - INFO -
	🚂 Training Metrics (Step 220) 🚂
	+---------------+----------+
	\| Metric \| Value \|
	+===============+==========+
	\| loss \| -2.6202 \|
	+---------------+----------+
	\| grad_norm \| 1.01015 \|
	+---------------+----------+
	\| learning_rate \| 4.4e-05 \|
	+---------------+----------+
	\| epoch \| 0.838095 \|
	+---------------+----------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 21:55:25,489 - INFO -
	🚂 Training Metrics (Step 230) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -2.5483 \|
	+---------------+---------+
	\| grad_norm \| 1.28282 \|
	+---------------+---------+
	\| learning_rate \| 4.6e-05 \|
	+---------------+---------+
	\| epoch \| 0.87619 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 21:55:46,146 - INFO -
	🚂 Training Metrics (Step 240) 🚂
	+---------------+----------+
	\| Metric \| Value \|
	+===============+==========+
	\| loss \| -2.5052 \|
	+---------------+----------+
	\| grad_norm \| 1.70203 \|
	+---------------+----------+
	\| learning_rate \| 4.8e-05 \|
	+---------------+----------+
	\| epoch \| 0.914286 \|
	+---------------+----------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 21:56:06,825 - INFO -
	🚂 Training Metrics (Step 250) 🚂
	+---------------+----------+
	\| Metric \| Value \|
	+===============+==========+
	\| loss \| -2.519 \|
	+---------------+----------+
	\| grad_norm \| 5.4329 \|
	+---------------+----------+
	\| learning_rate \| 5e-05 \|
	+---------------+----------+
	\| epoch \| 0.952381 \|
	+---------------+----------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 21:56:27,493 - INFO -
	🚂 Training Metrics (Step 260) 🚂
	+---------------+----------+
	\| Metric \| Value \|
	+===============+==========+
	\| loss \| -2.5733 \|
	+---------------+----------+
	\| grad_norm \| 3.50952 \|
	+---------------+----------+
	\| learning_rate \| 5.2e-05 \|
	+---------------+----------+
	\| epoch \| 0.990476 \|
	+---------------+----------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 21:56:32,140 - INFO - Removing 'token_type_ids' from eval_dataset as they are not needed. - [multilabel_classify.py:2376:evaluate]
	2025-06-16 22:15:34,916 - INFO -
	🔍 Evaluation Metrics 🔍
	+-------------------------------+----------+
	\| Metric \| Value \|
	+===============================+==========+
	\| eval_f1_micro \| 0.008615 \|
	+-------------------------------+----------+
	\| eval_f1_macro \| 0.005978 \|
	+-------------------------------+----------+
	\| eval_precision_at_5 \| 0.203244 \|
	+-------------------------------+----------+
	\| eval_recall_at_5 \| 0.0452 \|
	+-------------------------------+----------+
	\| eval_precision_at_8 \| 0.197538 \|
	+-------------------------------+----------+
	\| eval_recall_at_8 \| 0.069411 \|
	+-------------------------------+----------+
	\| eval_precision_at_15 \| 0.182648 \|
	+-------------------------------+----------+
	\| eval_recall_at_15 \| 0.11848 \|
	+-------------------------------+----------+
	\| eval_rare_f1_micro \| 0.005097 \|
	+-------------------------------+----------+
	\| eval_rare_f1_macro \| 0.003982 \|
	+-------------------------------+----------+
	\| eval_rare_precision \| 0.002557 \|
	+-------------------------------+----------+
	\| eval_rare_recall \| 0.789408 \|
	+-------------------------------+----------+
	\| eval_rare_precision_at_5 \| 0.036946 \|
	+-------------------------------+----------+
	\| eval_rare_recall_at_5 \| 0.011236 \|
	+-------------------------------+----------+
	\| eval_rare_precision_at_8 \| 0.032931 \|
	+-------------------------------+----------+
	\| eval_rare_recall_at_8 \| 0.016217 \|
	+-------------------------------+----------+
	\| eval_rare_precision_at_15 \| 0.029008 \|
	+-------------------------------+----------+
	\| eval_rare_recall_at_15 \| 0.027004 \|
	+-------------------------------+----------+
	\| eval_not_rare_f1_micro \| 0.135446 \|
	+-------------------------------+----------+
	\| eval_not_rare_f1_macro \| 0.130824 \|
	+-------------------------------+----------+
	\| eval_not_rare_precision \| 0.072642 \|
	+-------------------------------+----------+
	\| eval_not_rare_recall \| 1 \|
	+-------------------------------+----------+
	\| eval_not_rare_precision_at_5 \| 0.201187 \|
	+-------------------------------+----------+
	\| eval_not_rare_recall_at_5 \| 0.118668 \|
	+-------------------------------+----------+
	\| eval_not_rare_precision_at_8 \| 0.196252 \|
	+-------------------------------+----------+
	\| eval_not_rare_recall_at_8 \| 0.184178 \|
	+-------------------------------+----------+
	\| eval_not_rare_precision_at_15 \| 0.180195 \|
	+-------------------------------+----------+
	\| eval_not_rare_recall_at_15 \| 0.311517 \|
	+-------------------------------+----------+
	\| eval_loss \| -2.18078 \|
	+-------------------------------+----------+ - [multilabel_classify.py:2231:on_evaluate]
	2025-06-16 22:15:37,017 - INFO - 💾 Model weights saved in safetensors format: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-262 - [multilabel_classify.py:2469:_save]
	2025-06-16 22:15:37,020 - INFO - ⚙️ Config saved in checkpoint: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-262 - [multilabel_classify.py:2474:_save]
	2025-06-16 22:15:37,021 - INFO - 📋 Saved files in ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-262:
	+---------+--------------------+------------+
	\| Index \| Saved File \| Size \|
	+=========+====================+============+
	\| 1 \| training_args.bin \| 0.01 MB \|
	+---------+--------------------+------------+
	\| 2 \| optimizer.pt \| 1308.77 MB \|
	+---------+--------------------+------------+
	\| 3 \| model.safetensors \| 4600.97 MB \|
	+---------+--------------------+------------+
	\| 4 \| scaler.pt \| 0.00 MB \|
	+---------+--------------------+------------+
	\| 5 \| config.json \| 0.00 MB \|
	+---------+--------------------+------------+
	\| 6 \| scheduler.pt \| 0.00 MB \|
	+---------+--------------------+------------+
	\| 7 \| trainer_state.json \| 0.01 MB \|
	+---------+--------------------+------------+
	\| 8 \| rng_state.pth \| 0.01 MB \|
	+---------+--------------------+------------+ - [multilabel_classify.py:2491:_save]
	2025-06-16 22:15:58,054 - INFO -
	🚂 Training Metrics (Step 270) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -2.6944 \|
	+---------------+---------+
	\| grad_norm \| 1.66623 \|
	+---------------+---------+
	\| learning_rate \| 5.4e-05 \|
	+---------------+---------+
	\| epoch \| 1.03048 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:16:18,682 - INFO -
	🚂 Training Metrics (Step 280) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -2.6468 \|
	+---------------+---------+
	\| grad_norm \| 5.73856 \|
	+---------------+---------+
	\| learning_rate \| 5.6e-05 \|
	+---------------+---------+
	\| epoch \| 1.06857 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:16:39,356 - INFO -
	🚂 Training Metrics (Step 290) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -2.7772 \|
	+---------------+---------+
	\| grad_norm \| 1.32201 \|
	+---------------+---------+
	\| learning_rate \| 5.8e-05 \|
	+---------------+---------+
	\| epoch \| 1.10667 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:17:00,040 - INFO -
	🚂 Training Metrics (Step 300) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -2.5104 \|
	+---------------+---------+
	\| grad_norm \| 1.28285 \|
	+---------------+---------+
	\| learning_rate \| 6e-05 \|
	+---------------+---------+
	\| epoch \| 1.14476 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:17:20,738 - INFO -
	🚂 Training Metrics (Step 310) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -2.6475 \|
	+---------------+---------+
	\| grad_norm \| 2.5773 \|
	+---------------+---------+
	\| learning_rate \| 6.2e-05 \|
	+---------------+---------+
	\| epoch \| 1.18286 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:17:41,412 - INFO -
	🚂 Training Metrics (Step 320) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -2.5365 \|
	+---------------+---------+
	\| grad_norm \| 3.06585 \|
	+---------------+---------+
	\| learning_rate \| 6.4e-05 \|
	+---------------+---------+
	\| epoch \| 1.22095 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:18:02,294 - INFO -
	🚂 Training Metrics (Step 330) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -2.6598 \|
	+---------------+---------+
	\| grad_norm \| 2.69829 \|
	+---------------+---------+
	\| learning_rate \| 6.6e-05 \|
	+---------------+---------+
	\| epoch \| 1.25905 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:18:22,986 - INFO -
	🚂 Training Metrics (Step 340) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -2.6969 \|
	+---------------+---------+
	\| grad_norm \| 4.4668 \|
	+---------------+---------+
	\| learning_rate \| 6.8e-05 \|
	+---------------+---------+
	\| epoch \| 1.29714 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:18:43,681 - INFO -
	🚂 Training Metrics (Step 350) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -2.5643 \|
	+---------------+---------+
	\| grad_norm \| 2.15029 \|
	+---------------+---------+
	\| learning_rate \| 7e-05 \|
	+---------------+---------+
	\| epoch \| 1.33524 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:19:04,375 - INFO -
	🚂 Training Metrics (Step 360) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -2.6392 \|
	+---------------+---------+
	\| grad_norm \| 4.94501 \|
	+---------------+---------+
	\| learning_rate \| 7.2e-05 \|
	+---------------+---------+
	\| epoch \| 1.37333 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:19:25,066 - INFO -
	🚂 Training Metrics (Step 370) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -2.6347 \|
	+---------------+---------+
	\| grad_norm \| 1.98727 \|
	+---------------+---------+
	\| learning_rate \| 7.4e-05 \|
	+---------------+---------+
	\| epoch \| 1.41143 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:19:45,772 - INFO -
	🚂 Training Metrics (Step 380) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -2.8543 \|
	+---------------+---------+
	\| grad_norm \| 10.4538 \|
	+---------------+---------+
	\| learning_rate \| 7.6e-05 \|
	+---------------+---------+
	\| epoch \| 1.44952 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:20:06,478 - INFO -
	🚂 Training Metrics (Step 390) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -2.9327 \|
	+---------------+---------+
	\| grad_norm \| 2.76192 \|
	+---------------+---------+
	\| learning_rate \| 7.8e-05 \|
	+---------------+---------+
	\| epoch \| 1.48762 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:20:27,183 - INFO -
	🚂 Training Metrics (Step 400) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -2.8006 \|
	+---------------+---------+
	\| grad_norm \| 4.33592 \|
	+---------------+---------+
	\| learning_rate \| 8e-05 \|
	+---------------+---------+
	\| epoch \| 1.52571 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:20:47,886 - INFO -
	🚂 Training Metrics (Step 410) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -2.7399 \|
	+---------------+---------+
	\| grad_norm \| 5.77753 \|
	+---------------+---------+
	\| learning_rate \| 8.2e-05 \|
	+---------------+---------+
	\| epoch \| 1.56381 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:21:08,595 - INFO -
	🚂 Training Metrics (Step 420) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -2.8422 \|
	+---------------+---------+
	\| grad_norm \| 3.92247 \|
	+---------------+---------+
	\| learning_rate \| 8.4e-05 \|
	+---------------+---------+
	\| epoch \| 1.6019 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:21:29,301 - INFO -
	🚂 Training Metrics (Step 430) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.2485 \|
	+---------------+---------+
	\| grad_norm \| 4.74773 \|
	+---------------+---------+
	\| learning_rate \| 8.5e-05 \|
	+---------------+---------+
	\| epoch \| 1.64 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:21:49,992 - INFO -
	🚂 Training Metrics (Step 440) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -2.9139 \|
	+---------------+---------+
	\| grad_norm \| 3.45202 \|
	+---------------+---------+
	\| learning_rate \| 8.7e-05 \|
	+---------------+---------+
	\| epoch \| 1.67809 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:22:10,679 - INFO -
	🚂 Training Metrics (Step 450) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -2.6702 \|
	+---------------+---------+
	\| grad_norm \| 3.90906 \|
	+---------------+---------+
	\| learning_rate \| 8.9e-05 \|
	+---------------+---------+
	\| epoch \| 1.71619 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:22:31,373 - INFO -
	🚂 Training Metrics (Step 460) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -2.8502 \|
	+---------------+---------+
	\| grad_norm \| 3.40761 \|
	+---------------+---------+
	\| learning_rate \| 9.1e-05 \|
	+---------------+---------+
	\| epoch \| 1.75429 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:22:52,068 - INFO -
	🚂 Training Metrics (Step 470) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -2.6422 \|
	+---------------+---------+
	\| grad_norm \| 4.53801 \|
	+---------------+---------+
	\| learning_rate \| 9.3e-05 \|
	+---------------+---------+
	\| epoch \| 1.79238 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:23:12,780 - INFO -
	🚂 Training Metrics (Step 480) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -2.9417 \|
	+---------------+---------+
	\| grad_norm \| 2.15287 \|
	+---------------+---------+
	\| learning_rate \| 9.5e-05 \|
	+---------------+---------+
	\| epoch \| 1.83048 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:23:33,479 - INFO -
	🚂 Training Metrics (Step 490) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -2.9854 \|
	+---------------+---------+
	\| grad_norm \| 3.23702 \|
	+---------------+---------+
	\| learning_rate \| 9.7e-05 \|
	+---------------+---------+
	\| epoch \| 1.86857 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:23:54,194 - INFO -
	🚂 Training Metrics (Step 500) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -2.825 \|
	+---------------+---------+
	\| grad_norm \| 2.25583 \|
	+---------------+---------+
	\| learning_rate \| 9.9e-05 \|
	+---------------+---------+
	\| epoch \| 1.90667 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:24:14,880 - INFO -
	🚂 Training Metrics (Step 510) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.0095 \|
	+---------------+---------+
	\| grad_norm \| 4.32831 \|
	+---------------+---------+
	\| learning_rate \| 0.0001 \|
	+---------------+---------+
	\| epoch \| 1.94476 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:24:35,542 - INFO -
	🚂 Training Metrics (Step 520) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -2.8745 \|
	+---------------+---------+
	\| grad_norm \| 10.9455 \|
	+---------------+---------+
	\| learning_rate \| 0.0001 \|
	+---------------+---------+
	\| epoch \| 1.98286 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:24:44,336 - INFO - Removing 'token_type_ids' from eval_dataset as they are not needed. - [multilabel_classify.py:2376:evaluate]
	2025-06-16 22:43:33,519 - INFO -
	🔍 Evaluation Metrics 🔍
	+-------------------------------+----------+
	\| Metric \| Value \|
	+===============================+==========+
	\| eval_f1_micro \| 0.007046 \|
	+-------------------------------+----------+
	\| eval_f1_macro \| 0.006161 \|
	+-------------------------------+----------+
	\| eval_precision_at_5 \| 0.115348 \|
	+-------------------------------+----------+
	\| eval_recall_at_5 \| 0.031091 \|
	+-------------------------------+----------+
	\| eval_precision_at_8 \| 0.107892 \|
	+-------------------------------+----------+
	\| eval_recall_at_8 \| 0.045557 \|
	+-------------------------------+----------+
	\| eval_precision_at_15 \| 0.093328 \|
	+-------------------------------+----------+
	\| eval_recall_at_15 \| 0.072304 \|
	+-------------------------------+----------+
	\| eval_rare_f1_micro \| 0.004353 \|
	+-------------------------------+----------+
	\| eval_rare_f1_macro \| 0.004122 \|
	+-------------------------------+----------+
	\| eval_rare_precision \| 0.002182 \|
	+-------------------------------+----------+
	\| eval_rare_recall \| 0.868453 \|
	+-------------------------------+----------+
	\| eval_rare_precision_at_5 \| 0.039082 \|
	+-------------------------------+----------+
	\| eval_rare_recall_at_5 \| 0.015531 \|
	+-------------------------------+----------+
	\| eval_rare_precision_at_8 \| 0.033327 \|
	+-------------------------------+----------+
	\| eval_rare_recall_at_8 \| 0.020969 \|
	+-------------------------------+----------+
	\| eval_rare_precision_at_15 \| 0.028138 \|
	+-------------------------------+----------+
	\| eval_rare_recall_at_15 \| 0.032345 \|
	+-------------------------------+----------+
	\| eval_not_rare_f1_micro \| 0.139875 \|
	+-------------------------------+----------+
	\| eval_not_rare_f1_macro \| 0.133686 \|
	+-------------------------------+----------+
	\| eval_not_rare_precision \| 0.07536 \|
	+-------------------------------+----------+
	\| eval_not_rare_recall \| 0.971989 \|
	+-------------------------------+----------+
	\| eval_not_rare_precision_at_5 \| 0.173497 \|
	+-------------------------------+----------+
	\| eval_not_rare_recall_at_5 \| 0.110951 \|
	+-------------------------------+----------+
	\| eval_not_rare_precision_at_8 \| 0.15442 \|
	+-------------------------------+----------+
	\| eval_not_rare_recall_at_8 \| 0.155294 \|
	+-------------------------------+----------+
	\| eval_not_rare_precision_at_15 \| 0.140005 \|
	+-------------------------------+----------+
	\| eval_not_rare_recall_at_15 \| 0.255021 \|
	+-------------------------------+----------+
	\| eval_loss \| -2.29713 \|
	+-------------------------------+----------+ - [multilabel_classify.py:2231:on_evaluate]
	2025-06-16 22:43:36,565 - INFO - 💾 Model weights saved in safetensors format: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-524 - [multilabel_classify.py:2469:_save]
	2025-06-16 22:43:36,567 - INFO - ⚙️ Config saved in checkpoint: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-524 - [multilabel_classify.py:2474:_save]
	2025-06-16 22:43:36,568 - INFO - 📋 Saved files in ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-524:
	+---------+--------------------+------------+
	\| Index \| Saved File \| Size \|
	+=========+====================+============+
	\| 1 \| training_args.bin \| 0.01 MB \|
	+---------+--------------------+------------+
	\| 2 \| optimizer.pt \| 1308.77 MB \|
	+---------+--------------------+------------+
	\| 3 \| model.safetensors \| 4600.97 MB \|
	+---------+--------------------+------------+
	\| 4 \| scaler.pt \| 0.00 MB \|
	+---------+--------------------+------------+
	\| 5 \| config.json \| 0.00 MB \|
	+---------+--------------------+------------+
	\| 6 \| scheduler.pt \| 0.00 MB \|
	+---------+--------------------+------------+
	\| 7 \| trainer_state.json \| 0.01 MB \|
	+---------+--------------------+------------+
	\| 8 \| rng_state.pth \| 0.01 MB \|
	+---------+--------------------+------------+ - [multilabel_classify.py:2491:_save]
	2025-06-16 22:43:53,283 - INFO -
	🚂 Training Metrics (Step 530) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.1318 \|
	+---------------+---------+
	\| grad_norm \| 6.27872 \|
	+---------------+---------+
	\| learning_rate \| 0.0001 \|
	+---------------+---------+
	\| epoch \| 2.02286 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:44:13,913 - INFO -
	🚂 Training Metrics (Step 540) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -2.9013 \|
	+---------------+---------+
	\| grad_norm \| 5.13338 \|
	+---------------+---------+
	\| learning_rate \| 9.9e-05 \|
	+---------------+---------+
	\| epoch \| 2.06095 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:44:34,550 - INFO -
	🚂 Training Metrics (Step 550) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -2.9378 \|
	+---------------+---------+
	\| grad_norm \| 2.31937 \|
	+---------------+---------+
	\| learning_rate \| 9.9e-05 \|
	+---------------+---------+
	\| epoch \| 2.09905 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:44:55,220 - INFO -
	🚂 Training Metrics (Step 560) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.1557 \|
	+---------------+---------+
	\| grad_norm \| 2.88556 \|
	+---------------+---------+
	\| learning_rate \| 9.9e-05 \|
	+---------------+---------+
	\| epoch \| 2.13714 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:45:15,871 - INFO -
	🚂 Training Metrics (Step 570) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.1589 \|
	+---------------+---------+
	\| grad_norm \| 16.5141 \|
	+---------------+---------+
	\| learning_rate \| 9.8e-05 \|
	+---------------+---------+
	\| epoch \| 2.17524 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:45:36,535 - INFO -
	🚂 Training Metrics (Step 580) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.0709 \|
	+---------------+---------+
	\| grad_norm \| 20.6117 \|
	+---------------+---------+
	\| learning_rate \| 9.8e-05 \|
	+---------------+---------+
	\| epoch \| 2.21333 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:45:57,196 - INFO -
	🚂 Training Metrics (Step 590) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.0031 \|
	+---------------+---------+
	\| grad_norm \| 3.41571 \|
	+---------------+---------+
	\| learning_rate \| 9.7e-05 \|
	+---------------+---------+
	\| epoch \| 2.25143 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:46:17,859 - INFO -
	🚂 Training Metrics (Step 600) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.0642 \|
	+---------------+---------+
	\| grad_norm \| 3.67429 \|
	+---------------+---------+
	\| learning_rate \| 9.7e-05 \|
	+---------------+---------+
	\| epoch \| 2.28952 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:46:38,540 - INFO -
	🚂 Training Metrics (Step 610) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -2.8556 \|
	+---------------+---------+
	\| grad_norm \| 3.29057 \|
	+---------------+---------+
	\| learning_rate \| 9.6e-05 \|
	+---------------+---------+
	\| epoch \| 2.32762 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:46:59,219 - INFO -
	🚂 Training Metrics (Step 620) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.0252 \|
	+---------------+---------+
	\| grad_norm \| 4.15559 \|
	+---------------+---------+
	\| learning_rate \| 9.5e-05 \|
	+---------------+---------+
	\| epoch \| 2.36571 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:47:19,873 - INFO -
	🚂 Training Metrics (Step 630) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.043 \|
	+---------------+---------+
	\| grad_norm \| 7.9306 \|
	+---------------+---------+
	\| learning_rate \| 9.4e-05 \|
	+---------------+---------+
	\| epoch \| 2.40381 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:47:40,534 - INFO -
	🚂 Training Metrics (Step 640) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -2.9925 \|
	+---------------+---------+
	\| grad_norm \| 5.63441 \|
	+---------------+---------+
	\| learning_rate \| 9.3e-05 \|
	+---------------+---------+
	\| epoch \| 2.44191 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:48:01,208 - INFO -
	🚂 Training Metrics (Step 650) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.1192 \|
	+---------------+---------+
	\| grad_norm \| 6.16559 \|
	+---------------+---------+
	\| learning_rate \| 9.2e-05 \|
	+---------------+---------+
	\| epoch \| 2.48 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:48:21,875 - INFO -
	🚂 Training Metrics (Step 660) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -2.7578 \|
	+---------------+---------+
	\| grad_norm \| 7.27245 \|
	+---------------+---------+
	\| learning_rate \| 9.1e-05 \|
	+---------------+---------+
	\| epoch \| 2.5181 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:48:42,549 - INFO -
	🚂 Training Metrics (Step 670) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.0685 \|
	+---------------+---------+
	\| grad_norm \| 4.86883 \|
	+---------------+---------+
	\| learning_rate \| 9e-05 \|
	+---------------+---------+
	\| epoch \| 2.55619 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:49:03,221 - INFO -
	🚂 Training Metrics (Step 680) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.3125 \|
	+---------------+---------+
	\| grad_norm \| 4.60443 \|
	+---------------+---------+
	\| learning_rate \| 8.9e-05 \|
	+---------------+---------+
	\| epoch \| 2.59429 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:49:23,892 - INFO -
	🚂 Training Metrics (Step 690) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -2.9993 \|
	+---------------+---------+
	\| grad_norm \| 5.1602 \|
	+---------------+---------+
	\| learning_rate \| 8.8e-05 \|
	+---------------+---------+
	\| epoch \| 2.63238 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:49:44,576 - INFO -
	🚂 Training Metrics (Step 700) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -2.9074 \|
	+---------------+---------+
	\| grad_norm \| 3.71175 \|
	+---------------+---------+
	\| learning_rate \| 8.6e-05 \|
	+---------------+---------+
	\| epoch \| 2.67048 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:50:05,236 - INFO -
	🚂 Training Metrics (Step 710) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.248 \|
	+---------------+---------+
	\| grad_norm \| 5.70862 \|
	+---------------+---------+
	\| learning_rate \| 8.5e-05 \|
	+---------------+---------+
	\| epoch \| 2.70857 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:50:25,900 - INFO -
	🚂 Training Metrics (Step 720) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.1012 \|
	+---------------+---------+
	\| grad_norm \| 3.30394 \|
	+---------------+---------+
	\| learning_rate \| 8.3e-05 \|
	+---------------+---------+
	\| epoch \| 2.74667 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:50:46,563 - INFO -
	🚂 Training Metrics (Step 730) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.2892 \|
	+---------------+---------+
	\| grad_norm \| 4.57689 \|
	+---------------+---------+
	\| learning_rate \| 8.2e-05 \|
	+---------------+---------+
	\| epoch \| 2.78476 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:51:07,213 - INFO -
	🚂 Training Metrics (Step 740) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.0007 \|
	+---------------+---------+
	\| grad_norm \| 4.63606 \|
	+---------------+---------+
	\| learning_rate \| 8.1e-05 \|
	+---------------+---------+
	\| epoch \| 2.82286 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:51:27,880 - INFO -
	🚂 Training Metrics (Step 750) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.0416 \|
	+---------------+---------+
	\| grad_norm \| 6.01303 \|
	+---------------+---------+
	\| learning_rate \| 7.9e-05 \|
	+---------------+---------+
	\| epoch \| 2.86095 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:51:48,533 - INFO -
	🚂 Training Metrics (Step 760) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.2314 \|
	+---------------+---------+
	\| grad_norm \| 3.14631 \|
	+---------------+---------+
	\| learning_rate \| 7.7e-05 \|
	+---------------+---------+
	\| epoch \| 2.89905 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:52:09,199 - INFO -
	🚂 Training Metrics (Step 770) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.1514 \|
	+---------------+---------+
	\| grad_norm \| 3.72293 \|
	+---------------+---------+
	\| learning_rate \| 7.6e-05 \|
	+---------------+---------+
	\| epoch \| 2.93714 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:52:29,843 - INFO -
	🚂 Training Metrics (Step 780) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.0665 \|
	+---------------+---------+
	\| grad_norm \| 6.07238 \|
	+---------------+---------+
	\| learning_rate \| 7.4e-05 \|
	+---------------+---------+
	\| epoch \| 2.97524 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 22:52:42,757 - INFO - Removing 'token_type_ids' from eval_dataset as they are not needed. - [multilabel_classify.py:2376:evaluate]
	2025-06-16 23:11:26,833 - INFO -
	🔍 Evaluation Metrics 🔍
	+-------------------------------+----------+
	\| Metric \| Value \|
	+===============================+==========+
	\| eval_f1_micro \| 0.006362 \|
	+-------------------------------+----------+
	\| eval_f1_macro \| 0.006038 \|
	+-------------------------------+----------+
	\| eval_precision_at_5 \| 0.052532 \|
	+-------------------------------+----------+
	\| eval_recall_at_5 \| 0.014819 \|
	+-------------------------------+----------+
	\| eval_precision_at_8 \| 0.045045 \|
	+-------------------------------+----------+
	\| eval_recall_at_8 \| 0.020262 \|
	+-------------------------------+----------+
	\| eval_precision_at_15 \| 0.039214 \|
	+-------------------------------+----------+
	\| eval_recall_at_15 \| 0.030945 \|
	+-------------------------------+----------+
	\| eval_rare_f1_micro \| 0.004069 \|
	+-------------------------------+----------+
	\| eval_rare_f1_macro \| 0.004018 \|
	+-------------------------------+----------+
	\| eval_rare_precision \| 0.002039 \|
	+-------------------------------+----------+
	\| eval_rare_recall \| 0.968818 \|
	+-------------------------------+----------+
	\| eval_rare_precision_at_5 \| 0.015032 \|
	+-------------------------------+----------+
	\| eval_rare_recall_at_5 \| 0.006142 \|
	+-------------------------------+----------+
	\| eval_rare_precision_at_8 \| 0.0134 \|
	+-------------------------------+----------+
	\| eval_rare_recall_at_8 \| 0.008599 \|
	+-------------------------------+----------+
	\| eval_rare_precision_at_15 \| 0.010707 \|
	+-------------------------------+----------+
	\| eval_rare_recall_at_15 \| 0.012881 \|
	+-------------------------------+----------+
	\| eval_not_rare_f1_micro \| 0.137554 \|
	+-------------------------------+----------+
	\| eval_not_rare_f1_macro \| 0.132349 \|
	+-------------------------------+----------+
	\| eval_not_rare_precision \| 0.073945 \|
	+-------------------------------+----------+
	\| eval_not_rare_recall \| 0.983969 \|
	+-------------------------------+----------+
	\| eval_not_rare_precision_at_5 \| 0.149842 \|
	+-------------------------------+----------+
	\| eval_not_rare_recall_at_5 \| 0.09499 \|
	+-------------------------------+----------+
	\| eval_not_rare_precision_at_8 \| 0.123616 \|
	+-------------------------------+----------+
	\| eval_not_rare_recall_at_8 \| 0.124506 \|
	+-------------------------------+----------+
	\| eval_not_rare_precision_at_15 \| 0.114662 \|
	+-------------------------------+----------+
	\| eval_not_rare_recall_at_15 \| 0.204124 \|
	+-------------------------------+----------+
	\| eval_loss \| -2.32242 \|
	+-------------------------------+----------+ - [multilabel_classify.py:2231:on_evaluate]
	2025-06-16 23:11:30,098 - INFO - 💾 Model weights saved in safetensors format: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-786 - [multilabel_classify.py:2469:_save]
	2025-06-16 23:11:30,100 - INFO - ⚙️ Config saved in checkpoint: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-786 - [multilabel_classify.py:2474:_save]
	2025-06-16 23:11:30,101 - INFO - 📋 Saved files in ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-786:
	+---------+--------------------+------------+
	\| Index \| Saved File \| Size \|
	+=========+====================+============+
	\| 1 \| training_args.bin \| 0.01 MB \|
	+---------+--------------------+------------+
	\| 2 \| optimizer.pt \| 1308.77 MB \|
	+---------+--------------------+------------+
	\| 3 \| model.safetensors \| 4600.97 MB \|
	+---------+--------------------+------------+
	\| 4 \| scaler.pt \| 0.00 MB \|
	+---------+--------------------+------------+
	\| 5 \| config.json \| 0.00 MB \|
	+---------+--------------------+------------+
	\| 6 \| scheduler.pt \| 0.00 MB \|
	+---------+--------------------+------------+
	\| 7 \| trainer_state.json \| 0.02 MB \|
	+---------+--------------------+------------+
	\| 8 \| rng_state.pth \| 0.01 MB \|
	+---------+--------------------+------------+ - [multilabel_classify.py:2491:_save]
	2025-06-16 23:11:42,691 - INFO -
	🚂 Training Metrics (Step 790) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.2797 \|
	+---------------+---------+
	\| grad_norm \| 3.42903 \|
	+---------------+---------+
	\| learning_rate \| 7.2e-05 \|
	+---------------+---------+
	\| epoch \| 3.01524 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:12:03,473 - INFO -
	🚂 Training Metrics (Step 800) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.1285 \|
	+---------------+---------+
	\| grad_norm \| 3.36507 \|
	+---------------+---------+
	\| learning_rate \| 7.1e-05 \|
	+---------------+---------+
	\| epoch \| 3.05333 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:12:24,097 - INFO -
	🚂 Training Metrics (Step 810) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.3476 \|
	+---------------+---------+
	\| grad_norm \| 12.2284 \|
	+---------------+---------+
	\| learning_rate \| 6.9e-05 \|
	+---------------+---------+
	\| epoch \| 3.09143 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:12:44,741 - INFO -
	🚂 Training Metrics (Step 820) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.2807 \|
	+---------------+---------+
	\| grad_norm \| 4.27001 \|
	+---------------+---------+
	\| learning_rate \| 6.7e-05 \|
	+---------------+---------+
	\| epoch \| 3.12952 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:13:05,412 - INFO -
	🚂 Training Metrics (Step 830) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.2203 \|
	+---------------+---------+
	\| grad_norm \| 7.73442 \|
	+---------------+---------+
	\| learning_rate \| 6.5e-05 \|
	+---------------+---------+
	\| epoch \| 3.16762 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:13:26,068 - INFO -
	🚂 Training Metrics (Step 840) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.4928 \|
	+---------------+---------+
	\| grad_norm \| 5.21309 \|
	+---------------+---------+
	\| learning_rate \| 6.3e-05 \|
	+---------------+---------+
	\| epoch \| 3.20571 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:13:46,711 - INFO -
	🚂 Training Metrics (Step 850) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.3565 \|
	+---------------+---------+
	\| grad_norm \| 9.77577 \|
	+---------------+---------+
	\| learning_rate \| 6.2e-05 \|
	+---------------+---------+
	\| epoch \| 3.24381 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:14:07,376 - INFO -
	🚂 Training Metrics (Step 860) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.3151 \|
	+---------------+---------+
	\| grad_norm \| 5.86498 \|
	+---------------+---------+
	\| learning_rate \| 6e-05 \|
	+---------------+---------+
	\| epoch \| 3.28191 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:14:28,035 - INFO -
	🚂 Training Metrics (Step 870) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.3922 \|
	+---------------+---------+
	\| grad_norm \| 6.25596 \|
	+---------------+---------+
	\| learning_rate \| 5.8e-05 \|
	+---------------+---------+
	\| epoch \| 3.32 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:14:48,708 - INFO -
	🚂 Training Metrics (Step 880) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.4076 \|
	+---------------+---------+
	\| grad_norm \| 6.42964 \|
	+---------------+---------+
	\| learning_rate \| 5.6e-05 \|
	+---------------+---------+
	\| epoch \| 3.3581 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:15:09,378 - INFO -
	🚂 Training Metrics (Step 890) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.6344 \|
	+---------------+---------+
	\| grad_norm \| 6.22911 \|
	+---------------+---------+
	\| learning_rate \| 5.4e-05 \|
	+---------------+---------+
	\| epoch \| 3.39619 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:15:30,073 - INFO -
	🚂 Training Metrics (Step 900) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.4873 \|
	+---------------+---------+
	\| grad_norm \| 3.87399 \|
	+---------------+---------+
	\| learning_rate \| 5.2e-05 \|
	+---------------+---------+
	\| epoch \| 3.43429 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:15:50,754 - INFO -
	🚂 Training Metrics (Step 910) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.309 \|
	+---------------+---------+
	\| grad_norm \| 4.9241 \|
	+---------------+---------+
	\| learning_rate \| 5e-05 \|
	+---------------+---------+
	\| epoch \| 3.47238 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:16:11,415 - INFO -
	🚂 Training Metrics (Step 920) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.3892 \|
	+---------------+---------+
	\| grad_norm \| 6.75714 \|
	+---------------+---------+
	\| learning_rate \| 4.8e-05 \|
	+---------------+---------+
	\| epoch \| 3.51048 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:16:32,086 - INFO -
	🚂 Training Metrics (Step 930) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.6367 \|
	+---------------+---------+
	\| grad_norm \| 6.01969 \|
	+---------------+---------+
	\| learning_rate \| 4.6e-05 \|
	+---------------+---------+
	\| epoch \| 3.54857 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:16:52,762 - INFO -
	🚂 Training Metrics (Step 940) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.6291 \|
	+---------------+---------+
	\| grad_norm \| 10.1146 \|
	+---------------+---------+
	\| learning_rate \| 4.4e-05 \|
	+---------------+---------+
	\| epoch \| 3.58667 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:17:13,420 - INFO -
	🚂 Training Metrics (Step 950) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.5316 \|
	+---------------+---------+
	\| grad_norm \| 7.94565 \|
	+---------------+---------+
	\| learning_rate \| 4.2e-05 \|
	+---------------+---------+
	\| epoch \| 3.62476 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:17:34,093 - INFO -
	🚂 Training Metrics (Step 960) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.3274 \|
	+---------------+---------+
	\| grad_norm \| 4.13957 \|
	+---------------+---------+
	\| learning_rate \| 4.1e-05 \|
	+---------------+---------+
	\| epoch \| 3.66286 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:17:54,762 - INFO -
	🚂 Training Metrics (Step 970) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.3817 \|
	+---------------+---------+
	\| grad_norm \| 7.41069 \|
	+---------------+---------+
	\| learning_rate \| 3.9e-05 \|
	+---------------+---------+
	\| epoch \| 3.70095 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:18:15,439 - INFO -
	🚂 Training Metrics (Step 980) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.7929 \|
	+---------------+---------+
	\| grad_norm \| 6.45495 \|
	+---------------+---------+
	\| learning_rate \| 3.7e-05 \|
	+---------------+---------+
	\| epoch \| 3.73905 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:18:36,120 - INFO -
	🚂 Training Metrics (Step 990) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.6203 \|
	+---------------+---------+
	\| grad_norm \| 10.8201 \|
	+---------------+---------+
	\| learning_rate \| 3.5e-05 \|
	+---------------+---------+
	\| epoch \| 3.77714 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:18:56,774 - INFO -
	🚂 Training Metrics (Step 1000) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.5213 \|
	+---------------+---------+
	\| grad_norm \| 3.94306 \|
	+---------------+---------+
	\| learning_rate \| 3.3e-05 \|
	+---------------+---------+
	\| epoch \| 3.81524 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:19:17,405 - INFO -
	🚂 Training Metrics (Step 1010) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.6218 \|
	+---------------+---------+
	\| grad_norm \| 7.93971 \|
	+---------------+---------+
	\| learning_rate \| 3.2e-05 \|
	+---------------+---------+
	\| epoch \| 3.85333 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:19:38,038 - INFO -
	🚂 Training Metrics (Step 1020) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.5477 \|
	+---------------+---------+
	\| grad_norm \| 13.3498 \|
	+---------------+---------+
	\| learning_rate \| 3e-05 \|
	+---------------+---------+
	\| epoch \| 3.89143 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:19:58,698 - INFO -
	🚂 Training Metrics (Step 1030) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.6429 \|
	+---------------+---------+
	\| grad_norm \| 10.1506 \|
	+---------------+---------+
	\| learning_rate \| 2.8e-05 \|
	+---------------+---------+
	\| epoch \| 3.92952 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:20:19,330 - INFO -
	🚂 Training Metrics (Step 1040) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.5627 \|
	+---------------+---------+
	\| grad_norm \| 4.49416 \|
	+---------------+---------+
	\| learning_rate \| 2.6e-05 \|
	+---------------+---------+
	\| epoch \| 3.96762 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:20:36,375 - INFO - Removing 'token_type_ids' from eval_dataset as they are not needed. - [multilabel_classify.py:2376:evaluate]
	2025-06-16 23:39:17,428 - INFO -
	🔍 Evaluation Metrics 🔍
	+-------------------------------+----------+
	\| Metric \| Value \|
	+===============================+==========+
	\| eval_f1_micro \| 0.006241 \|
	+-------------------------------+----------+
	\| eval_f1_macro \| 0.00597 \|
	+-------------------------------+----------+
	\| eval_precision_at_5 \| 0.018196 \|
	+-------------------------------+----------+
	\| eval_recall_at_5 \| 0.005862 \|
	+-------------------------------+----------+
	\| eval_precision_at_8 \| 0.01518 \|
	+-------------------------------+----------+
	\| eval_recall_at_8 \| 0.007524 \|
	+-------------------------------+----------+
	\| eval_precision_at_15 \| 0.016297 \|
	+-------------------------------+----------+
	\| eval_recall_at_15 \| 0.013505 \|
	+-------------------------------+----------+
	\| eval_rare_f1_micro \| 0.004005 \|
	+-------------------------------+----------+
	\| eval_rare_f1_macro \| 0.003967 \|
	+-------------------------------+----------+
	\| eval_rare_precision \| 0.002007 \|
	+-------------------------------+----------+
	\| eval_rare_recall \| 0.992014 \|
	+-------------------------------+----------+
	\| eval_rare_precision_at_5 \| 0.006883 \|
	+-------------------------------+----------+
	\| eval_rare_recall_at_5 \| 0.00311 \|
	+-------------------------------+----------+
	\| eval_rare_precision_at_8 \| 0.005241 \|
	+-------------------------------+----------+
	\| eval_rare_recall_at_8 \| 0.003914 \|
	+-------------------------------+----------+
	\| eval_rare_precision_at_15 \| 0.004404 \|
	+-------------------------------+----------+
	\| eval_rare_recall_at_15 \| 0.006155 \|
	+-------------------------------+----------+
	\| eval_not_rare_f1_micro \| 0.136059 \|
	+-------------------------------+----------+
	\| eval_not_rare_f1_macro \| 0.131255 \|
	+-------------------------------+----------+
	\| eval_not_rare_precision \| 0.073009 \|
	+-------------------------------+----------+
	\| eval_not_rare_recall \| 0.997299 \|
	+-------------------------------+----------+
	\| eval_not_rare_precision_at_5 \| 0.139399 \|
	+-------------------------------+----------+
	\| eval_not_rare_recall_at_5 \| 0.085527 \|
	+-------------------------------+----------+
	\| eval_not_rare_precision_at_8 \| 0.109276 \|
	+-------------------------------+----------+
	\| eval_not_rare_recall_at_8 \| 0.105525 \|
	+-------------------------------+----------+
	\| eval_not_rare_precision_at_15 \| 0.102189 \|
	+-------------------------------+----------+
	\| eval_not_rare_recall_at_15 \| 0.175595 \|
	+-------------------------------+----------+
	\| eval_loss \| -2.32388 \|
	+-------------------------------+----------+ - [multilabel_classify.py:2231:on_evaluate]
	2025-06-16 23:39:21,151 - INFO - 💾 Model weights saved in safetensors format: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-1048 - [multilabel_classify.py:2469:_save]
	2025-06-16 23:39:21,153 - INFO - ⚙️ Config saved in checkpoint: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-1048 - [multilabel_classify.py:2474:_save]
	2025-06-16 23:39:21,154 - INFO - 📋 Saved files in ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-1048:
	+---------+--------------------+------------+
	\| Index \| Saved File \| Size \|
	+=========+====================+============+
	\| 1 \| training_args.bin \| 0.01 MB \|
	+---------+--------------------+------------+
	\| 2 \| optimizer.pt \| 1308.77 MB \|
	+---------+--------------------+------------+
	\| 3 \| model.safetensors \| 4600.97 MB \|
	+---------+--------------------+------------+
	\| 4 \| scaler.pt \| 0.00 MB \|
	+---------+--------------------+------------+
	\| 5 \| config.json \| 0.00 MB \|
	+---------+--------------------+------------+
	\| 6 \| scheduler.pt \| 0.00 MB \|
	+---------+--------------------+------------+
	\| 7 \| trainer_state.json \| 0.02 MB \|
	+---------+--------------------+------------+
	\| 8 \| rng_state.pth \| 0.01 MB \|
	+---------+--------------------+------------+ - [multilabel_classify.py:2491:_save]
	2025-06-16 23:39:29,580 - INFO -
	🚂 Training Metrics (Step 1050) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.4879 \|
	+---------------+---------+
	\| grad_norm \| 6.39619 \|
	+---------------+---------+
	\| learning_rate \| 2.5e-05 \|
	+---------------+---------+
	\| epoch \| 4.00762 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:39:50,155 - INFO -
	🚂 Training Metrics (Step 1060) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.9362 \|
	+---------------+---------+
	\| grad_norm \| 10.1241 \|
	+---------------+---------+
	\| learning_rate \| 2.3e-05 \|
	+---------------+---------+
	\| epoch \| 4.04571 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:40:10,763 - INFO -
	🚂 Training Metrics (Step 1070) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.6064 \|
	+---------------+---------+
	\| grad_norm \| 9.74406 \|
	+---------------+---------+
	\| learning_rate \| 2.2e-05 \|
	+---------------+---------+
	\| epoch \| 4.08381 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:40:31,375 - INFO -
	🚂 Training Metrics (Step 1080) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.6683 \|
	+---------------+---------+
	\| grad_norm \| 3.95963 \|
	+---------------+---------+
	\| learning_rate \| 2e-05 \|
	+---------------+---------+
	\| epoch \| 4.1219 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:40:52,017 - INFO -
	🚂 Training Metrics (Step 1090) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.8221 \|
	+---------------+---------+
	\| grad_norm \| 7.74502 \|
	+---------------+---------+
	\| learning_rate \| 1.9e-05 \|
	+---------------+---------+
	\| epoch \| 4.16 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:41:12,654 - INFO -
	🚂 Training Metrics (Step 1100) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.6546 \|
	+---------------+---------+
	\| grad_norm \| 4.37811 \|
	+---------------+---------+
	\| learning_rate \| 1.7e-05 \|
	+---------------+---------+
	\| epoch \| 4.1981 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:41:33,298 - INFO -
	🚂 Training Metrics (Step 1110) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.8077 \|
	+---------------+---------+
	\| grad_norm \| 6.02418 \|
	+---------------+---------+
	\| learning_rate \| 1.6e-05 \|
	+---------------+---------+
	\| epoch \| 4.23619 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:41:54,124 - INFO -
	🚂 Training Metrics (Step 1120) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.9832 \|
	+---------------+---------+
	\| grad_norm \| 11.386 \|
	+---------------+---------+
	\| learning_rate \| 1.4e-05 \|
	+---------------+---------+
	\| epoch \| 4.27429 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:42:14,776 - INFO -
	🚂 Training Metrics (Step 1130) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.8361 \|
	+---------------+---------+
	\| grad_norm \| 6.99711 \|
	+---------------+---------+
	\| learning_rate \| 1.3e-05 \|
	+---------------+---------+
	\| epoch \| 4.31238 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:42:35,401 - INFO -
	🚂 Training Metrics (Step 1140) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.9406 \|
	+---------------+---------+
	\| grad_norm \| 11.1791 \|
	+---------------+---------+
	\| learning_rate \| 1.2e-05 \|
	+---------------+---------+
	\| epoch \| 4.35048 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:42:56,059 - INFO -
	🚂 Training Metrics (Step 1150) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.8839 \|
	+---------------+---------+
	\| grad_norm \| 5.22412 \|
	+---------------+---------+
	\| learning_rate \| 1.1e-05 \|
	+---------------+---------+
	\| epoch \| 4.38857 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:43:16,701 - INFO -
	🚂 Training Metrics (Step 1160) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.9782 \|
	+---------------+---------+
	\| grad_norm \| 5.76971 \|
	+---------------+---------+
	\| learning_rate \| 1e-05 \|
	+---------------+---------+
	\| epoch \| 4.42667 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:43:37,361 - INFO -
	🚂 Training Metrics (Step 1170) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.7407 \|
	+---------------+---------+
	\| grad_norm \| 13.5051 \|
	+---------------+---------+
	\| learning_rate \| 9e-06 \|
	+---------------+---------+
	\| epoch \| 4.46476 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:43:58,001 - INFO -
	🚂 Training Metrics (Step 1180) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.6642 \|
	+---------------+---------+
	\| grad_norm \| 5.81691 \|
	+---------------+---------+
	\| learning_rate \| 8e-06 \|
	+---------------+---------+
	\| epoch \| 4.50286 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:44:18,651 - INFO -
	🚂 Training Metrics (Step 1190) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.8646 \|
	+---------------+---------+
	\| grad_norm \| 5.51144 \|
	+---------------+---------+
	\| learning_rate \| 7e-06 \|
	+---------------+---------+
	\| epoch \| 4.54095 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:44:39,299 - INFO -
	🚂 Training Metrics (Step 1200) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -4.1574 \|
	+---------------+---------+
	\| grad_norm \| 7.77753 \|
	+---------------+---------+
	\| learning_rate \| 6e-06 \|
	+---------------+---------+
	\| epoch \| 4.57905 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:44:59,944 - INFO -
	🚂 Training Metrics (Step 1210) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.8684 \|
	+---------------+---------+
	\| grad_norm \| 9.38395 \|
	+---------------+---------+
	\| learning_rate \| 5e-06 \|
	+---------------+---------+
	\| epoch \| 4.61714 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:45:20,611 - INFO -
	🚂 Training Metrics (Step 1220) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.6834 \|
	+---------------+---------+
	\| grad_norm \| 7.10574 \|
	+---------------+---------+
	\| learning_rate \| 4e-06 \|
	+---------------+---------+
	\| epoch \| 4.65524 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:45:41,282 - INFO -
	🚂 Training Metrics (Step 1230) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.9973 \|
	+---------------+---------+
	\| grad_norm \| 3.95851 \|
	+---------------+---------+
	\| learning_rate \| 4e-06 \|
	+---------------+---------+
	\| epoch \| 4.69333 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:46:01,971 - INFO -
	🚂 Training Metrics (Step 1240) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.747 \|
	+---------------+---------+
	\| grad_norm \| 5.98381 \|
	+---------------+---------+
	\| learning_rate \| 3e-06 \|
	+---------------+---------+
	\| epoch \| 4.73143 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:46:22,631 - INFO -
	🚂 Training Metrics (Step 1250) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.8056 \|
	+---------------+---------+
	\| grad_norm \| 6.93498 \|
	+---------------+---------+
	\| learning_rate \| 3e-06 \|
	+---------------+---------+
	\| epoch \| 4.76952 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:46:43,296 - INFO -
	🚂 Training Metrics (Step 1260) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.7874 \|
	+---------------+---------+
	\| grad_norm \| 3.92144 \|
	+---------------+---------+
	\| learning_rate \| 2e-06 \|
	+---------------+---------+
	\| epoch \| 4.80762 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:47:03,970 - INFO -
	🚂 Training Metrics (Step 1270) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.895 \|
	+---------------+---------+
	\| grad_norm \| 7.15651 \|
	+---------------+---------+
	\| learning_rate \| 2e-06 \|
	+---------------+---------+
	\| epoch \| 4.84571 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:47:24,621 - INFO -
	🚂 Training Metrics (Step 1280) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.6054 \|
	+---------------+---------+
	\| grad_norm \| 14.3564 \|
	+---------------+---------+
	\| learning_rate \| 1e-06 \|
	+---------------+---------+
	\| epoch \| 4.88381 \|
	+---------------+---------+[0m - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:47:45,288 - INFO -
	🚂 Training Metrics (Step 1290) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.6814 \|
	+---------------+---------+
	\| grad_norm \| 8.71897 \|
	+---------------+---------+
	\| learning_rate \| 1e-06 \|
	+---------------+---------+
	\| epoch \| 4.9219 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:48:05,952 - INFO -
	🚂 Training Metrics (Step 1300) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -3.7682 \|
	+---------------+---------+
	\| grad_norm \| 8.88537 \|
	+---------------+---------+
	\| learning_rate \| 1e-06 \|
	+---------------+---------+
	\| epoch \| 4.96 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:48:26,621 - INFO -
	🚂 Training Metrics (Step 1310) 🚂
	+---------------+---------+
	\| Metric \| Value \|
	+===============+=========+
	\| loss \| -4.0526 \|
	+---------------+---------+
	\| grad_norm \| 8.41898 \|
	+---------------+---------+
	\| learning_rate \| 1e-06 \|
	+---------------+---------+
	\| epoch \| 4.9981 \|
	+---------------+---------+ - [multilabel_classify.py:2212:on_log]
	2025-06-16 23:48:27,962 - INFO - 💾 Model weights saved in safetensors format: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-1310 - [multilabel_classify.py:2469:_save]
	2025-06-16 23:48:27,964 - INFO - ⚙️ Config saved in checkpoint: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-1310 - [multilabel_classify.py:2474:_save]
	2025-06-16 23:48:27,965 - INFO - 📋 Saved files in ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-1310:
	+---------+-------------------+------------+
	\| Index \| Saved File \| Size \|
	+=========+===================+============+
	\| 1 \| training_args.bin \| 0.01 MB \|
	+---------+-------------------+------------+
	\| 2 \| model.safetensors \| 4600.97 MB \|
	+---------+-------------------+------------+
	\| 3 \| config.json \| 0.00 MB \|
	+---------+-------------------+------------+ - [multilabel_classify.py:2491:_save]
	2025-06-16 23:48:28,605 - INFO - Removing 'token_type_ids' from eval_dataset as they are not needed. - [multilabel_classify.py:2376:evaluate]
	2025-06-17 00:07:03,821 - INFO -
	🔍 Evaluation Metrics 🔍
	+-------------------------------+----------+
	\| Metric \| Value \|
	+===============================+==========+
	\| eval_f1_micro \| 0.006203 \|
	+-------------------------------+----------+
	\| eval_f1_macro \| 0.005948 \|
	+-------------------------------+----------+
	\| eval_precision_at_5 \| 0.013054 \|
	+-------------------------------+----------+
	\| eval_recall_at_5 \| 0.004007 \|
	+-------------------------------+----------+
	\| eval_precision_at_8 \| 0.010779 \|
	+-------------------------------+----------+
	\| eval_recall_at_8 \| 0.005572 \|
	+-------------------------------+----------+
	\| eval_precision_at_15 \| 0.012447 \|
	+-------------------------------+----------+
	\| eval_recall_at_15 \| 0.010075 \|
	+-------------------------------+----------+
	\| eval_rare_f1_micro \| 0.003987 \|
	+-------------------------------+----------+
	\| eval_rare_f1_macro \| 0.003951 \|
	+-------------------------------+----------+
	\| eval_rare_precision \| 0.001998 \|
	+-------------------------------+----------+
	\| eval_rare_recall \| 0.999163 \|
	+-------------------------------+----------+
	\| eval_rare_precision_at_5 \| 0.005538 \|
	+-------------------------------+----------+
	\| eval_rare_recall_at_5 \| 0.002468 \|
	+-------------------------------+----------+
	\| eval_rare_precision_at_8 \| 0.004104 \|
	+-------------------------------+----------+
	\| eval_rare_recall_at_8 \| 0.002894 \|
	+-------------------------------+----------+
	\| eval_rare_precision_at_15 \| 0.003191 \|
	+-------------------------------+----------+
	\| eval_rare_recall_at_15 \| 0.004396 \|
	+-------------------------------+----------+
	\| eval_not_rare_f1_micro \| 0.135441 \|
	+-------------------------------+----------+
	\| eval_not_rare_f1_macro \| 0.130814 \|
	+-------------------------------+----------+
	\| eval_not_rare_precision \| 0.072641 \|
	+-------------------------------+----------+
	\| eval_not_rare_recall \| 0.999782 \|
	+-------------------------------+----------+
	\| eval_not_rare_precision_at_5 \| 0.139082 \|
	+-------------------------------+----------+
	\| eval_not_rare_recall_at_5 \| 0.084165 \|
	+-------------------------------+----------+
	\| eval_not_rare_precision_at_8 \| 0.106557 \|
	+-------------------------------+----------+
	\| eval_not_rare_recall_at_8 \| 0.100509 \|
	+-------------------------------+----------+
	\| eval_not_rare_precision_at_15 \| 0.098866 \|
	+-------------------------------+----------+
	\| eval_not_rare_recall_at_15 \| 0.165002 \|
	+-------------------------------+----------+
	\| eval_loss \| -2.3104 \|
	+-------------------------------+----------+ - [multilabel_classify.py:2231:on_evaluate]
	2025-06-17 00:07:07,535 - INFO - 💾 Model weights saved in safetensors format: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-1310 - [multilabel_classify.py:2469:_save]
	2025-06-17 00:07:07,537 - INFO - ⚙️ Config saved in checkpoint: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-1310 - [multilabel_classify.py:2474:_save]
	2025-06-17 00:07:07,538 - INFO - 📋 Saved files in ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-1310:
	+---------+--------------------+------------+
	\| Index \| Saved File \| Size \|
	+=========+====================+============+
	\| 1 \| training_args.bin \| 0.01 MB \|
	+---------+--------------------+------------+
	\| 2 \| optimizer.pt \| 1308.77 MB \|
	+---------+--------------------+------------+
	\| 3 \| model.safetensors \| 4600.97 MB \|
	+---------+--------------------+------------+
	\| 4 \| scaler.pt \| 0.00 MB \|
	+---------+--------------------+------------+
	\| 5 \| config.json \| 0.00 MB \|
	+---------+--------------------+------------+
	\| 6 \| scheduler.pt \| 0.00 MB \|
	+---------+--------------------+------------+
	\| 7 \| trainer_state.json \| 0.03 MB \|
	+---------+--------------------+------------+
	\| 8 \| rng_state.pth \| 0.01 MB \|
	+---------+--------------------+------------+ - [multilabel_classify.py:2491:_save]
	2025-06-17 00:07:08,790 - INFO - 📂 Loading best model from ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-262 - [multilabel_classify.py:2543:_load_best_model]
	2025-06-17 00:07:08,791 - INFO - 🖥️ Model is on device: cuda:0 - [multilabel_classify.py:2553:_load_best_model]
	2025-06-17 00:07:08,853 - INFO - 🔑 Key order comparison:
	+---------+--------------------------------------------+--------------------------------------------------------------------------------------+
	\| Index \| Saved state_dict Keys \| Model state_dict Keys \|
	+=========+============================================+======================================================================================+
	\| 1 \| attention.in_proj_bias \| boost_mul \|
	+---------+--------------------------------------------+--------------------------------------------------------------------------------------+
	\| 2 \| attention.in_proj_weight \| boost_add \|
	+---------+--------------------------------------------+--------------------------------------------------------------------------------------+
	\| 3 \| attention.out_proj.bias \| base_model.base_model.model.model.embed_tokens.weight \|
	+---------+--------------------------------------------+--------------------------------------------------------------------------------------+
	\| 4 \| attention.out_proj.weight \| base_model.base_model.model.model.layers.0.self_attn.q_proj.base_layer.weight \|
	+---------+--------------------------------------------+--------------------------------------------------------------------------------------+
	\| 5 \| base_model.base_model.model.lm_head.weight \| base_model.base_model.model.model.layers.0.self_attn.q_proj.base_layer.weight.absmax \|
	+---------+--------------------------------------------+--------------------------------------------------------------------------------------+ - [multilabel_classify.py:2577:_load_best_model]
	2025-06-17 00:07:09,846 - INFO - ✅ Loaded best model weights from ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/checkpoint-262/model.safetensors - [multilabel_classify.py:2594:_load_best_model]
	2025-06-17 00:07:09,885 - INFO - ✔️ Weight for boost_mul matches between saved and loaded state_dict - [multilabel_classify.py:2606:_load_best_model]
	2025-06-17 00:07:09,918 - INFO - ✔️ Weight for boost_add matches between saved and loaded state_dict - [multilabel_classify.py:2606:_load_best_model]
	2025-06-17 00:07:09,935 - INFO -
	🚂 Training Metrics (Step 1310) 🚂
	+--------------------------+----------+
	\| Metric \| Value \|
	+==========================+==========+
	\| train_runtime \| 8383.29 \|
	+--------------------------+----------+
	\| train_samples_per_second \| 5.006 \|
	+--------------------------+----------+
	\| train_steps_per_second \| 0.156 \|
	+--------------------------+----------+
	\| total_flos \| 0 \|
	+--------------------------+----------+
	\| train_loss \| -3.07712 \|
	+--------------------------+----------+
	\| epoch \| 4.9981 \|
	+--------------------------+----------+ - [multilabel_classify.py:2212:on_log]
	2025-06-17 00:07:09,935 - INFO - ✨ Training Completed! ✨ - [multilabel_classify.py:2085:on_train_end]
	2025-06-17 00:07:10,008 - INFO - 📊 Training loss plot saved as '../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/train_loss_plot.png' - [multilabel_classify.py:2281:on_train_end]
	2025-06-17 00:07:10,069 - INFO - 📊 Evaluation loss plot saved as '../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/eval_loss_plot.png' - [multilabel_classify.py:2295:on_train_end]
	2025-06-17 00:07:10,128 - INFO - 📊 Evaluation metric plot saved as '../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b/eval_precision_at_15_plot.png' - [multilabel_classify.py:2316:on_train_end]
	2025-06-17 00:07:10,128 - INFO - ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - [multilabel_classify.py:103:log_section]
	2025-06-17 00:07:10,128 - INFO - + ✨ MODEL SAVING + - [multilabel_classify.py:104:log_section]
	2025-06-17 00:07:10,128 - INFO - ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - [multilabel_classify.py:107:log_section]
	2025-06-17 00:07:10,128 - INFO - 💾 Saving trained model and pushing to Hugging Face Hub... - [multilabel_classify.py:4093:main]
	2025-06-17 00:07:10,128 - INFO - 📁 Creating/using output directory: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b - [multilabel_classify.py:3069:save_and_push]
	2025-06-17 00:07:14,321 - INFO - 💾 Model weights saved in safetensors format: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b - [multilabel_classify.py:2469:_save]
	2025-06-17 00:07:14,323 - INFO - ⚙️ Config saved in checkpoint: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b - [multilabel_classify.py:2474:_save]
	2025-06-17 00:07:14,324 - INFO - 📋 Saved files in ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b:
	+---------+--------------------------------------------+------------+
	\| Index \| Saved File \| Size \|
	+=========+============================================+============+
	\| 1 \| eval_loss_plot.png \| 0.03 MB \|
	+---------+--------------------------------------------+------------+
	\| 2 \| training_args.bin \| 0.01 MB \|
	+---------+--------------------------------------------+------------+
	\| 3 \| tokenizer.model \| 0.56 MB \|
	+---------+--------------------------------------------+------------+
	\| 4 \| tokenizer.json \| 3.50 MB \|
	+---------+--------------------------------------------+------------+
	\| 5 \| model.safetensors \| 4600.97 MB \|
	+---------+--------------------------------------------+------------+
	\| 6 \| config.json \| 0.00 MB \|
	+---------+--------------------------------------------+------------+
	\| 7 \| special_tokens_map.json \| 0.00 MB \|
	+---------+--------------------------------------------+------------+
	\| 8 \| tokenizer_config.json \| 0.13 MB \|
	+---------+--------------------------------------------+------------+
	\| 9 \| train_loss_plot.png \| 0.04 MB \|
	+---------+--------------------------------------------+------------+
	\| 10 \| eval_precision_at_15_plot.png \| 0.04 MB \|
	+---------+--------------------------------------------+------------+
	\| 11 \| README.md \| 0.01 MB \|
	+---------+--------------------------------------------+------------+
	\| 12 \| classification_log_2025-06-16_18-00-57.log \| 0.09 MB \|
	+---------+--------------------------------------------+------------+ - [multilabel_classify.py:2491:_save]
	2025-06-17 00:07:18,278 - INFO - 💾 Model weights saved in safetensors format: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b - [multilabel_classify.py:2469:_save]
	2025-06-17 00:07:18,280 - INFO - ⚙️ Config saved in checkpoint: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b - [multilabel_classify.py:2474:_save]
	2025-06-17 00:07:18,281 - INFO - 📋 Saved files in ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b:
	+---------+--------------------------------------------+------------+
	\| Index \| Saved File \| Size \|
	+=========+============================================+============+
	\| 1 \| eval_loss_plot.png \| 0.03 MB \|
	+---------+--------------------------------------------+------------+
	\| 2 \| training_args.bin \| 0.01 MB \|
	+---------+--------------------------------------------+------------+
	\| 3 \| tokenizer.model \| 0.56 MB \|
	+---------+--------------------------------------------+------------+
	\| 4 \| tokenizer.json \| 3.50 MB \|
	+---------+--------------------------------------------+------------+
	\| 5 \| model.safetensors \| 4600.97 MB \|
	+---------+--------------------------------------------+------------+
	\| 6 \| config.json \| 0.00 MB \|
	+---------+--------------------------------------------+------------+
	\| 7 \| special_tokens_map.json \| 0.00 MB \|
	+---------+--------------------------------------------+------------+
	\| 8 \| tokenizer_config.json \| 0.13 MB \|
	+---------+--------------------------------------------+------------+
	\| 9 \| train_loss_plot.png \| 0.04 MB \|
	+---------+--------------------------------------------+------------+
	\| 10 \| eval_precision_at_15_plot.png \| 0.04 MB \|
	+---------+--------------------------------------------+------------+
	\| 11 \| README.md \| 0.01 MB \|
	+---------+--------------------------------------------+------------+
	\| 12 \| classification_log_2025-06-16_18-00-57.log \| 0.09 MB \|
	+---------+--------------------------------------------+------------+ - [multilabel_classify.py:2491:_save]
	2025-06-17 00:08:53,532 - INFO - 💾 Model saved to: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b - [multilabel_classify.py:3073:save_and_push]
	2025-06-17 00:08:53,564 - INFO - 🖌️ Tokenizer saved to: ../tmp/MIMIC4_DEMO/mimic4_classify_mistral7b - [multilabel_classify.py:3077:save_and_push]