danielhanchen committed · Commit 79ebe4e · verified · 1 Parent(s): c63801f

Add files using upload-large-folder tool
README.md CHANGED
@@ -1,59 +1,20 @@
 ---
-base_model: Qwen/Qwen2.5-VL-7B-Instruct
+base_model:
+- Qwen/Qwen2.5-VL-7B-Instruct
+license: apache-2.0
 language:
 - en
-library_name: transformers
 pipeline_tag: image-text-to-text
-license: apache-2.0
 tags:
 - multimodal
-- qwen
-- qwen2
 - unsloth
-- transformers
-- vision
+library_name: transformers
 ---
-<div>
-<p style="margin-bottom: 0;margin-top:0;">
-<em>View all of our uploaded models <a href="https://docs.unsloth.ai/get-started/all-our-models">here</em>
-</p>
-<div style="display: flex; gap: 5px; align-items: center;margin-top:0; ">
-<a href="https://github.com/unslothai/unsloth/">
-<img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="133">
-</a>
-<a href="https://discord.gg/unsloth">
-<img src="https://github.com/unslothai/unsloth/raw/main/images/Discord%20button.png" width="173">
-</a>
-<a href="https://docs.unsloth.ai/">
-<img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png" width="143">
-</a>
-</div>
-<h1 style="margin-top: 0rem;">Finetune LLMs 2-5x faster with 70% less memory via Unsloth</h2>
-</div>
-We have a free Google Colab Tesla T4 notebook for Qwen2-VL (7B) here: https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2_VL_(7B)-Vision.ipynb
-
-## ✨ Finetune for Free
-
-All notebooks are **beginner friendly**! Add your dataset, click "Run All", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face.
-
-| Unsloth supports | Free Notebooks | Performance | Memory use |
-|------------------|----------------|-------------|------------|
-| **Llama-3.2 (3B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(1B_and_3B)-Conversational.ipynb) | 2.4x faster | 58% less |
-| **Llama-3.2 (11B vision)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb) | 2x faster | 60% less |
-| **Qwen2 VL (7B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2_VL_(7B)-Vision.ipynb) | 1.8x faster | 60% less |
-| **Qwen2.5 (7B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2.5_(7B)-Alpaca.ipynb) | 2x faster | 60% less |
-| **Llama-3.1 (8B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-Alpaca.ipynb) | 2.4x faster | 58% less |
-| **Phi-3.5 (mini)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi_3.5_Mini-Conversational.ipynb) | 2x faster | 50% less |
-| **Gemma 2 (9B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma2_(9B)-Alpaca.ipynb) | 2.4x faster | 58% less |
-| **Mistral (7B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Mistral_v0.3_(7B)-Conversational.ipynb) | 2.2x faster | 62% less |
-
-[<img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png" width="200"/>](https://docs.unsloth.ai)
-
-- This [Llama 3.2 conversational notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(1B_and_3B)-Conversational.ipynb) is useful for ShareGPT ChatML / Vicuna templates.
-- This [text completion notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Mistral_(7B)-Text_Completion.ipynb) is for raw text. This [DPO notebook](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing) replicates Zephyr.
-- \* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster.
-
-# Qwen2.5-VL
+
+# Qwen2.5-VL-7B-Instruct
+<a href="https://chat.qwenlm.ai/" target="_blank" style="margin: 2px;">
+<img alt="Chat" src="https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5" style="display: inline-block; vertical-align: middle;"/>
+</a>
 
 ## Introduction
 
@@ -567,4 +528,3 @@ If you find our work helpful, feel free to give us a cite.
   year={2023}
 }
 ```
-
chat_template.jinja ADDED
@@ -0,0 +1,7 @@
+{% set image_count = namespace(value=0) %}{% set video_count = namespace(value=0) %}{% for message in messages %}{% if loop.first and message['role'] != 'system' %}<|im_start|>system
+You are a helpful assistant.<|im_end|>
+{% endif %}<|im_start|>{{ message['role'] }}
+{% if message['content'] is string %}{{ message['content'] }}<|im_end|>
+{% else %}{% for content in message['content'] %}{% if content['type'] == 'image' or 'image' in content or 'image_url' in content %}{% set image_count.value = image_count.value + 1 %}{% if add_vision_id %}Picture {{ image_count.value }}: {% endif %}<|vision_start|><|image_pad|><|vision_end|>{% elif content['type'] == 'video' or 'video' in content %}{% set video_count.value = video_count.value + 1 %}{% if add_vision_id %}Video {{ video_count.value }}: {% endif %}<|vision_start|><|video_pad|><|vision_end|>{% elif 'text' in content %}{{ content['text'] }}{% endif %}{% endfor %}<|im_end|>
+{% endif %}{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant
+{% endif %}
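For reference, this standalone template matches the `chat_template` string now embedded in `tokenizer_config.json` below: it injects a default system turn when none is supplied, optionally numbers pictures and videos via `add_vision_id`, and wraps each visual input in `<|vision_start|>…<|vision_end|>` pad tokens. A minimal sketch of rendering it, assuming a recent `transformers` release and assuming this upload lives at `unsloth/Qwen2.5-VL-7B-Instruct`:

```python
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("unsloth/Qwen2.5-VL-7B-Instruct")  # assumed repo ID

messages = [
    {"role": "user", "content": [
        {"type": "image"},  # rendered as <|vision_start|><|image_pad|><|vision_end|>
        {"type": "text", "text": "Describe this image."},
    ]},
]

# add_generation_prompt=True appends the trailing "<|im_start|>assistant\n";
# since no system turn is supplied, the template inserts the default one.
print(processor.apply_chat_template(messages, add_generation_prompt=True))
```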
config.json CHANGED
@@ -1,5 +1,4 @@
 {
-  "_name_or_path": "Qwen/Qwen2.5-VL-7B-Instruct",
   "architectures": [
     "Qwen2_5_VLForConditionalGeneration"
   ],
@@ -10,7 +9,7 @@
   "image_token_id": 151655,
   "initializer_range": 0.02,
   "intermediate_size": 18944,
-  "max_position_embeddings": 32768,
+  "max_position_embeddings": 128000,
   "max_window_layers": 28,
   "model_type": "qwen2_5_vl",
   "num_attention_heads": 28,
@@ -29,20 +28,76 @@
   },
   "rope_theta": 1000000.0,
   "sliding_window": 32768,
+  "text_config": {
+    "architectures": [
+      "Qwen2_5_VLForConditionalGeneration"
+    ],
+    "attention_dropout": 0.0,
+    "bos_token_id": 151643,
+    "eos_token_id": 151645,
+    "hidden_act": "silu",
+    "hidden_size": 3584,
+    "image_token_id": null,
+    "initializer_range": 0.02,
+    "intermediate_size": 18944,
+    "max_position_embeddings": 128000,
+    "max_window_layers": 28,
+    "model_type": "qwen2_5_vl_text",
+    "num_attention_heads": 28,
+    "num_hidden_layers": 28,
+    "num_key_value_heads": 4,
+    "rms_norm_eps": 1e-06,
+    "rope_scaling": {
+      "mrope_section": [
+        16,
+        24,
+        24
+      ],
+      "rope_type": "default",
+      "type": "default"
+    },
+    "rope_theta": 1000000.0,
+    "sliding_window": 32768,
+    "torch_dtype": "bfloat16",
+    "use_cache": true,
+    "use_sliding_window": false,
+    "video_token_id": null,
+    "vision_end_token_id": 151653,
+    "vision_start_token_id": 151652,
+    "vision_token_id": 151654,
+    "vocab_size": 152064
+  },
   "tie_word_embeddings": false,
   "torch_dtype": "bfloat16",
-  "transformers_version": "4.49.0",
+  "transformers_version": "4.52.0.dev0",
   "unsloth_fixed": true,
   "use_cache": true,
   "use_sliding_window": false,
   "video_token_id": 151656,
   "vision_config": {
+    "depth": 32,
+    "fullatt_block_indexes": [
+      7,
+      15,
+      23,
+      31
+    ],
+    "hidden_act": "silu",
     "hidden_size": 1280,
+    "in_channels": 3,
     "in_chans": 3,
+    "initializer_range": 0.02,
+    "intermediate_size": 3420,
     "model_type": "qwen2_5_vl",
+    "num_heads": 16,
+    "out_hidden_size": 3584,
+    "patch_size": 14,
+    "spatial_merge_size": 2,
     "spatial_patch_size": 14,
+    "temporal_patch_size": 2,
     "tokens_per_second": 2,
-    "torch_dtype": "bfloat16"
+    "torch_dtype": "bfloat16",
+    "window_size": 112
   },
   "vision_end_token_id": 151653,
   "vision_start_token_id": 151652,
generation_config.json CHANGED
@@ -5,11 +5,9 @@
     151645,
     151643
   ],
-  "max_length": 32768,
+  "max_length": 128000,
   "pad_token_id": 151654,
   "repetition_penalty": 1.05,
-  "temperature": 0.1,
-  "top_k": 1,
-  "top_p": 0.001,
-  "transformers_version": "4.49.0"
+  "temperature": 1e-06,
+  "transformers_version": "4.52.0.dev0"
 }
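This update raises `max_length` to match the 128K context and replaces the old near-greedy sampling trio (`temperature=0.1`, `top_k=1`, `top_p=0.001`) with a single `temperature=1e-06`, which is effectively deterministic decoding. A quick sketch to confirm what ships, with the repo ID again assumed:

```python
from transformers import GenerationConfig

gen = GenerationConfig.from_pretrained("unsloth/Qwen2.5-VL-7B-Instruct")  # assumed repo ID

# temperature=1e-06 flattens the sampling distribution to near-greedy,
# standing in for the removed top_k=1 / top_p=0.001 combination.
print(gen.temperature, gen.max_length, gen.repetition_penalty)  # 1e-06 128000 1.05
```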
model-00001-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d725335e4ea2399be706469e4b8807716a8fa64bd03468252e9f7acf2415fee4
+size 4968243304
model-00002-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b1830db6908dcc76df3a71492acbcf2b8cac130114cf1f3c2d9edae8de8c6de3
+size 4991495816
model-00003-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:09c1807c6d00d7cab94f7db39d4c02ebb8537225ccde383861ac48db97945aa6
+size 4932751040
model-00004-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:5dd068336d14d45ffb43cef374d286cc6ba9d8741b028f90a7d040d847961f4a
+size 1691924384
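These three-line files are Git LFS pointers, not the weights themselves; the real shards arrive with `git lfs pull` (or through `huggingface_hub`). A sketch that checks a downloaded shard against the first pointer's recorded digest and size (the local path is an assumption):

```python
import hashlib
from pathlib import Path

# Assumed local path: run after `git lfs pull`, once the pointer has been
# replaced by the real ~5 GB shard in the working tree.
shard = Path("model-00001-of-00004.safetensors")

h = hashlib.sha256()
with shard.open("rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
        h.update(chunk)

# Both expected values come from the pointer file committed above.
assert h.hexdigest() == "d725335e4ea2399be706469e4b8807716a8fa64bd03468252e9f7acf2415fee4"
assert shard.stat().st_size == 4968243304
```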
model.safetensors.index.json CHANGED
The diff for this file is too large to render.
 
tokenizer_config.json CHANGED
@@ -195,16 +195,16 @@
     "<|video_pad|>"
   ],
   "bos_token": null,
-  "chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0]['role'] == 'system' %}\n {{- messages[0]['content'] }}\n {%- else %}\n {{- 'You are a helpful assistant.' }}\n {%- endif %}\n {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0]['role'] == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n {%- else %}\n {{- '<|im_start|>system\\nYou are a helpful assistant.<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {{- '<|im_start|>' + message.role }}\n {%- if message.content %}\n {{- '\\n' + message.content }}\n {%- endif %}\n {%- for tool_call in message.tool_calls %}\n {%- if tool_call.function is defined %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '\\n<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {{- tool_call.arguments | tojson }}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- message.content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
   "clean_up_tokenization_spaces": false,
   "eos_token": "<|im_end|>",
   "errors": "replace",
   "extra_special_tokens": {},
-  "model_max_length": 32768,
+  "model_max_length": 128000,
   "pad_token": "<|vision_pad|>",
   "padding_side": "left",
   "processor_class": "Qwen2_5_VLProcessor",
   "split_special_tokens": false,
   "tokenizer_class": "Qwen2Tokenizer",
-  "unk_token": null
-}
+  "unk_token": null,
+  "chat_template": "{% set image_count = namespace(value=0) %}{% set video_count = namespace(value=0) %}{% for message in messages %}{% if loop.first and message['role'] != 'system' %}<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n{% endif %}<|im_start|>{{ message['role'] }}\n{% if message['content'] is string %}{{ message['content'] }}<|im_end|>\n{% else %}{% for content in message['content'] %}{% if content['type'] == 'image' or 'image' in content or 'image_url' in content %}{% set image_count.value = image_count.value + 1 %}{% if add_vision_id %}Picture {{ image_count.value }}: {% endif %}<|vision_start|><|image_pad|><|vision_end|>{% elif content['type'] == 'video' or 'video' in content %}{% set video_count.value = video_count.value + 1 %}{% if add_vision_id %}Video {{ video_count.value }}: {% endif %}<|vision_start|><|video_pad|><|vision_end|>{% elif 'text' in content %}{{ content['text'] }}{% endif %}{% endfor %}<|im_end|>\n{% endif %}{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
+}
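Putting the pieces together, a minimal end-to-end inference sketch under the same assumptions as above (assumed repo ID, a `transformers` build with Qwen2.5-VL support, and a placeholder local image):

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "unsloth/Qwen2.5-VL-7B-Instruct"  # assumed repo ID for this upload

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("demo.jpg")  # placeholder local image
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "What is in this image?"},
    ]},
]

# The chat template above produces the prompt text; the processor then expands
# <|image_pad|> into the right number of vision tokens for this image.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(out[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0])
```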