cicdatopea committed
Commit 9ca044c · verified · Parent: e176b06

change to bf16 and rm trust_remote_code

Files changed (1): README.md (+1, -93)
README.md CHANGED
@@ -22,36 +22,11 @@ Please follow the license of the original model.
 
 Please note that int2 **may be slower** than int4 on CUDA due to a kernel issue.
 
-**To prevent potential overflow and achieve better accuracy, we recommend using the CPU version detailed in the next section.**
-
 ~~~python
 import transformers
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
 
-# https://github.com/huggingface/transformers/pull/35493
-def set_initialized_submodules(model, state_dict_keys):
-    """
-    Sets the `_is_hf_initialized` flag in all submodules of a given model when all its weights are in the loaded state
-    dict.
-    """
-    state_dict_keys = set(state_dict_keys)
-    not_initialized_submodules = {}
-    for module_name, module in model.named_modules():
-        if module_name == "":
-            # When checking if the root module is loaded there's no need to prepend module_name.
-            module_keys = set(module.state_dict())
-        else:
-            module_keys = {f"{module_name}.{k}" for k in module.state_dict()}
-        if module_keys.issubset(state_dict_keys):
-            module._is_hf_initialized = True
-        else:
-            not_initialized_submodules[module_name] = module
-    return not_initialized_submodules
-
-
-transformers.modeling_utils.set_initialized_submodules = set_initialized_submodules
-
 import torch
 
 quantized_model_dir = "OPEA/DeepSeek-R1-int2-gptq-sym-inc"
@@ -71,24 +46,11 @@ for i in range(61):
 
 model = AutoModelForCausalLM.from_pretrained(
     quantized_model_dir,
-    torch_dtype=torch.float16,
-    trust_remote_code=True,
+    torch_dtype=torch.bfloat16,
     device_map=device_map,
 )
 
 
-def forward_hook(module, input, output):
-    return torch.clamp(output, -65504, 65504)
-
-
-def register_fp16_hooks(model):
-    for name, module in model.named_modules():
-        if "QuantLinear" in module.__class__.__name__ or isinstance(module, torch.nn.Linear):
-            module.register_forward_hook(forward_hook)
-
-
-register_fp16_hooks(model) ##better add this hook to avoid overflow
-
 tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, trust_remote_code=True)
 prompts = [
     "9.11和9.8哪个数字大",
@@ -129,60 +91,6 @@ for i, prompt in enumerate(prompts):
     print(f"Generated: {decoded_outputs[i]}")
     print("-" * 50)
 
-"""
-Prompt: 9.11和9.8哪个数字大
-Generated: <think>
-首先,比较9.11和9.8的整数部分,两者都是9,所以需要比较小数部分。9.11的小数部分是0.11,而9.8的小数部分是0.8。由于0.8大于0.11, 因此9.8比9.11大。
-</think>
-
-9.11和9.8中,9.8的整数部分和9.11相同,但9.8的小数部分0.8大于9.11的0.11,因此9.8更大。
---------------------------------------------------
---------------------------------------------------
-Prompt: 如果你是人,你最想做什么“
-Generated: <think>
-嗯,用户问如果我是人,我最想做什么。首先,我需要理解这个问题的背景。用户可能好奇作为一个AI,我是否有愿望或欲望,或者他们想知道我作为AI能提供什么样的帮助。也许他们想知道我的功能或能力,或者他们想了解我的“愿望”是否类似于人类的愿望。
-
-首先,我需要明确,作为AI,我没有意识、情感或欲望。但我可以模拟人类对话,所以我可以提供帮助、回答问题、提供建议等。用户可能想知道我是否能模拟人类的行为或兴趣,比如兴趣爱好或愿望。
-
-接下来,我应该考虑用户的需求。他们可能希望了解AI的能力,或者他们可能好奇AI是否有类似人类的愿望。因此,我需要解释虽然我没有个人愿望,但我可以协助他们完成各种任务,比如提供 information, solving problems, or offering advice.
-
-另外,用户可能 be testing the AI's ability to engage in creative or imaginative thinking. They might want to see how I handle hypothetical or imaginative scenarios. So, I should respond in a way that's both informative and engaging, showing that I can think creatively even if I don't have personal desires.
-
-I should also consider the structure of the answer. Start by acknowledging the question, explain that I don't have personal desires, but can assist with various tasks. Then provide examples of what I can do, like answering questions, offering advice, or helping with creative tasks. Finally, invite the user to ask more questions if they need help.
-
-I need to keep the tone friendly and approachable, making sure the response is clear and helpful. Maybe add some examples of tasks I can help with, like solving math problems, offering advice, or generating creative content.
-
-Also, since the user asked in Chinese, I should respond in Chinese, but since the original question was in Chinese, maybe they want the answer in Chinese. But since the previous conversation was in English, maybe they want the answer in English. I should check the language of the question. The user's question is in Chinese, so I should respond in Chinese, but maybe include some English if needed.
-
-Wait, the user's question is in Chinese: "如果你是人,你最想做什么". So I should respond in Chinese. But the previous conversation was in English. Maybe the user is bilingual. So I need to decide whether to respond in Chinese or English. Since the user's question is in Chinese, I'll respond in Chinese
---------------------------------------------------
-Prompt: How many e in word deepseek
-Generated: <think>
-Okay, so I need to figure out how many times the letter 'e' appears in the word "deepseek". Let me start by looking at the word itself. The word is "deepseek". Let me write it out to visualize it better: D, E, E, P, S, E, E, K. Wait, let me check that again. Hmm, maybe I should count each letter individually.
-
-First, I'll write down the word and count each letter. The word is "deepseek". Let me spell it out: D, E, E, P, S, E, E, K. So, the letters are D, E, E, P, S, E, E, K. Now, I need to count how many times the letter 'e' appears. Let me go through each letter one by one.
-
-Starting from the first letter: D. That's not an 'e'. Then the next one is E. That's an 'e', so that's 1. The next letter is another E, so that's 2. Then comes P, which isn't an E. Then S, not an E. Then another E, making it 3. Then another E, so that's 4. Finally, K. So, in total, there are 4 'e's in the word "deepseek".
-
-Wait, let me double-check. The word is "deepseek". Let me write it again: D, E, E, P, S, E, E, K. So, positions 2, 3, 6, and 7 are E's. That's four E's. So, the answer should be 4. I think that's correct. I don't think I missed any. Let me check again. D, E, E, P, S, E, E, K. So, positions 2, 3, 6, 7. That's four E's. Yeah, that seems right. So, the number of E's in "deepseek" is 4.
-</think>
-
-The word "deepseek" contains **4** instances of the letter 'e'.
---------------------------------------------------
-Prompt: There are ten birds in a tree. A hunter shoots one. How many are left in the tree?
-Generated: <think>
-Okay, so there's this problem here: there are ten birds in a tree, a hunter shoots one. How many are left in the tree? Hmm, at first glance, it seems straightforward, but I need to think carefully. Let me break it down.
-
-First, there are ten birds in the tree. Then a hunter shoots one. The question is, how many are left? Well, if there were ten birds and one is shot, you might think it's a simple subtraction: 10 minus 1 equals 9. But wait, maybe there's a trick here. Sometimes these kinds of questions are designed to test attention to detail or to trick the reader.
-
-So, let's consider the scenario. If the hunter shoots one bird, what happens to the rest? If the hunter shoots one, that bird is either dead or flies away. If the bird is dead, it's no longer in the tree, so that would leave 9. But maybe the other birds get scared and fly away. If all the other birds fly away, then there would be none left. But the question is about how many are left in the tree. So, if the other birds stay, it's 9. If they all leave, it's 0. But the question is about the ones left in the tree, so maybe it's 9. But maybe the answer is different.
-
-Wait, maybe the problem is a play on words. The question is, "how many are left in the tree?" So, if the hunter shoots one, and the rest are still in the tree, then it's 9. But maybe the answer is different. Let me think. If the hunter shoots one, and the other birds get scared and fly away, then there are none left. But the question is about the ones left in the tree. So, if they all fly away, then zero. But maybe the answer is 9 because only one was shot, and the rest remain. But maybe the answer is 0 because all the birds flew away. Hmm.
-
-Wait, maybe the answer is 9 because only one was shot, so 10 minus 1 is 9. But maybe the answer is different because when you shoot a gun, the sound might scare the other birds, so they all
-
-
-"""
 ~~~
 
 
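Why bf16 removes the need for the clamp hook deleted above: float16's largest finite value is 65504, which is exactly the bound `forward_hook` clamped activations to, while bfloat16 keeps float32's 8-bit exponent and therefore a range of roughly 3.4e38. A quick check in plain PyTorch:

~~~python
import torch

# float16 saturates at 65504, the same bound the removed forward_hook
# clamped activations to; bfloat16 shares float32's exponent range, so
# values that would overflow fp16 stay finite in bf16.
print(torch.finfo(torch.float16).max)   # 65504.0
print(torch.finfo(torch.bfloat16).max)  # ~3.39e38
~~~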
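The loading path after this commit, condensed into a minimal sketch. `device_map="auto"` is an assumption for brevity; the actual README builds an explicit per-layer device_map in the `for i in range(61):` loop that this diff elides:

~~~python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

quantized_model_dir = "OPEA/DeepSeek-R1-int2-gptq-sym-inc"

model = AutoModelForCausalLM.from_pretrained(
    quantized_model_dir,
    torch_dtype=torch.bfloat16,  # changed from float16; no clamp hook needed
    device_map="auto",           # assumption: the README builds an explicit map
)
tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, trust_remote_code=True)
~~~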
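Generation then follows the README's loop, most of which the diff elides; a minimal sketch continuing from the load above, with `max_new_tokens` as an illustrative assumption:

~~~python
# One prompt taken from the README's list; the remaining prompts are
# elided by this diff.
prompts = [
    "9.11和9.8哪个数字大",
]

inputs = tokenizer(prompts, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)  # max_new_tokens is an assumption
decoded_outputs = tokenizer.batch_decode(outputs, skip_special_tokens=True)
for i, prompt in enumerate(prompts):
    print(f"Prompt: {prompt}")
    print(f"Generated: {decoded_outputs[i]}")
    print("-" * 50)
~~~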