change to bf16 and remove trust_remote_code
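The motivation for switching the CUDA int4 path to bfloat16 is FP16's narrow dynamic range (per the overflow note removed below). A minimal pure-Python sketch of that difference, assuming nothing beyond the standard numeric formats — the two constants are the binary16 and bfloat16 largest finite values, and `activation` is a made-up illustrative magnitude:

```python
# FP16 (binary16) has a 5-bit exponent, so its largest finite value is
# 65504; any larger intermediate activation overflows to inf. BF16 keeps
# FP32's 8-bit exponent, so its range extends to ~3.4e38.
FP16_MAX = 65504.0
BF16_MAX = 3.3895313892515355e38

def overflows_fp16(x: float) -> bool:
    """True if |x| is not representable as a finite float16."""
    return abs(x) > FP16_MAX

def overflows_bf16(x: float) -> bool:
    """True if |x| is not representable as a finite bfloat16."""
    return abs(x) > BF16_MAX

activation = 1.2e5  # hypothetical large intermediate activation
print(overflows_fp16(activation))  # True: would become inf in FP16
print(overflows_bf16(activation))  # False: still finite in BF16
```

This is only an illustration of the range gap; the actual kernels clamp or accumulate differently, which is why the old README called its FP16 handling a "workaround" rather than a fix.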
README.md
CHANGED
```diff
@@ -18,35 +18,10 @@ Please follow the license of the original model.
 
 **INT4 Inference on CUDA**(**at least 7*80G**)
 
-On CUDA devices, the computation dtype is typically FP16 for int4, which may lead to overflow for this model.
-While we have added a workaround to address this issue, we cannot guarantee reliable performance for all prompts.
-**For better stability, using the CPU version is recommended. Please refer to the following section for details.**
-
 ~~~python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 import transformers
 
-# https://github.com/huggingface/transformers/pull/35493
-def set_initialized_submodules(model, state_dict_keys):
-    """
-    Sets the `_is_hf_initialized` flag in all submodules of a given model when all its weights are in the loaded state
-    dict.
-    """
-    state_dict_keys = set(state_dict_keys)
-    not_initialized_submodules = {}
-    for module_name, module in model.named_modules():
-        if module_name == "":
-            # When checking if the root module is loaded there's no need to prepend module_name.
-            module_keys = set(module.state_dict())
-        else:
-            module_keys = {f"{module_name}.{k}" for k in module.state_dict()}
-        if module_keys.issubset(state_dict_keys):
-            module._is_hf_initialized = True
-        else:
-            not_initialized_submodules[module_name] = module
-    return not_initialized_submodules
-
-transformers.modeling_utils.set_initialized_submodules = set_initialized_submodules
 
 import torch
 
@@ -73,8 +48,7 @@ for i in range(61):
 
 model = AutoModelForCausalLM.from_pretrained(
     quantized_model_dir,
-    torch_dtype=torch.
-    trust_remote_code=True,
+    torch_dtype=torch.bfloat16,
     device_map=device_map,
 )
 
@@ -121,69 +95,7 @@ for i, prompt in enumerate(prompts):
     print(f"Generated: {decoded_outputs[i]}")
     print("-" * 50)
 
-"""
-Prompt: 9.11和9.8哪个数字大
-Generated: 要比较 **9.11** 和 **9.8** 的大小,可以按照以下步骤进行:
-
-1. **比较整数部分**:
-   - 两个数的整数部分都是 **9**,因此整数部分相同。
-
-2. **比较小数部分**:
-   - **9.11** 的小数部分是 **0.11**
-   - **9.8** 的小数部分是 **0.8**
-
-3. **统一小数位数**:
-   - 将 **0.8** 转换为 **0.80**,以便于比较。
-
-4. **进行大小比较**:
-   - **0.80** > **0.11**
-
-因此,**9.8** 大于 **9.11**。
-
-最终答案:\boxed{9.8}
---------------------------------------------------
-
---------------------------------------------------
-Prompt: strawberry中有几个r?
-Generated: ### 第一步:理解问题
-
-首先,我需要明确问题的含义。问题是:“strawberry中有几个r?”。这里的“strawberry”是一个英文单词,意思是“草莓”。问题问的是这个单词中有多少个字母“r”。
-
-### 第二步:分解单词
-
-为了找出“strawberry”中有多少个“r”,我需要将这个单词分解成单个字母。让我们逐个字母来看:
-
-- s
-# 2023年10月浙江宁波市鄞州区第二医院医共体首南分院编外人员招考聘用笔试历年高频考点(难、易错点荟萃)附带答案详解.docx
-
-## 2023年10月浙江宁波市鄞州区第二医院医共体首南分院编外人员招考聘用笔试历年高频考点(难、易错点荟萃)附带答案详解.docx
-
-- 4、
---------------------------------------------------
-Prompt: How many r in strawberry.
-Generated: The word "strawberry" contains **3 "r"s.
---------------------------------------------------
-Prompt: There is a girl who likes adventure,
-Generated: That's wonderful! A girl who loves adventure is likely curious, brave, and eager to explore new experiences. Here are some ideas to fuel her adventurous spirit:
-
-### Outdoor Adventures:
-1. **Hiking**: Explore local trails, national parks, or even plan a multi-day trek.
-2. **Camping**: Spend a night under the stars, roast marshmallows, and tell stories around a campfire.
-3. **Rock Climbing**: Challenge herself with indoor or outdoor climbing.
-4. **Kayaking or Canoeing**: Paddle through rivers, lakes, or even the ocean.
-5. **Zip-lining**: Soar through the treetops for an adrenaline rush.
-
-### Travel Adventures:
-1. **Road Trips**: Plan a trip to a new city or state, stopping at interesting landmarks along the way.
-2. **Backpacking**: Travel light and explore
---------------------------------------------------
-Prompt: Please give a brief introduction of DeepSeek company.
-Generated: DeepSeek Artificial Intelligence Co., Ltd. (referred to as "DeepSeek" or "深度求索"), founded in 2023, is a Chinese company dedicated to making AGI a reality.
---------------------------------------------------
-Prompt: hello
-Generated: Hello! How can I assist you today? 😊
-
-"""
 ~~~
 
 ### INT4 Inference on CPU with ITREX(Recommended)
```