npc0
/

llama3.1-41B-raw

+---
+base_model: meta-llama/Meta-Llama-3.1-70B-Instruct
+language:
+  - en
+  - de
+  - fr
+  - it
+  - pt
+  - hi
+  - es
+  - th
+library_name: transformers
+license: llama3.1
+pipeline_tag: text-generation
+tags:
+  - facebook
+  - meta
+  - pytorch
+  - llama
+  - llama-3
+---
+## Model Information
+The Llama 3.1 text-only 41B model is pruned from the instruction-finetuned, text-only Llama 3.1 70B model
+using the [FLAP method](https://arxiv.org/abs/2312.11983).
+Hyperparameters used for pruning:
+```
+metrics: WIFV
+structure: AL-AM
+pruning_ratio: 0.5
+```
+## Limitations
+This `llama3.1-41B-raw` model produces unstable output.
+Finetuning on an instruction dataset is recommended.
+No library currently supports the model,
+because pruning leaves its layers with inconsistent shapes.
+## Usage
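+Until a library supports layers of differing shapes, the following workaround
+subclasses `LlamaForCausalLM` and resizes each pruned linear layer to the shape stored in the checkpoint.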
+```python
+import json
+import os
+from functools import reduce
+import torch
+from safetensors import safe_open
+from transformers import AutoTokenizer, LlamaForCausalLM
+def get_module_by_name(module, access_string):
+    """Resolve a dotted path such as 'model.layers.0.mlp' to a submodule."""
+    names = access_string.split(sep='.')
+    return reduce(getattr, names, module)
+class MyLlamaForCausalLM(LlamaForCausalLM):
+    def __init__(self, config):
+        super().__init__(config)
+        # Reads the shard index from disk, so config._name_or_path must
+        # point to a local copy of the checkpoint.
+        with open(os.path.join(
+                config._name_or_path,
+                "model.safetensors.index.json")) as f:
+            weight_map = json.load(f)
+            weight_map = weight_map["weight_map"]
+        for name, path in weight_map.items():
+            module_name = name.replace('.weight', '')
+            if '.bias' in module_name:
+                continue
+            layer = get_module_by_name(self, module_name)
+            with safe_open(
+                os.path.join(
+                    config._name_or_path,
+                    path), framework="pt") as f:
+                tensor = f.get_tensor(name)
+            if 'mlp.' in name or 'attn.' in name:
+                if tensor.shape != (layer.out_features, layer.in_features):
+                    # Re-initialize the Linear in place with the pruned shape;
+                    # weights are stored as (out_features, in_features).
+                    layer.__init__(
+                        tensor.shape[1],
+                        tensor.shape[0],
+                        bias=layer.bias is not None,
+                        dtype=layer.weight.dtype,
+                        device=layer.weight.device)
+        # Recompute the head counts of each attention block from its pruned q_proj.
+        for name, path in weight_map.items():
+            if 'attn.' in name:
+                module = get_module_by_name(
+                    self,
+                    '.'.join(name.split('.')[:-2]))
+                module.num_heads = module.q_proj.out_features // module.head_dim
+                # Key/value heads are set equal to query heads, so the group size is 1.
+                module.num_key_value_heads = module.num_heads
+                module.num_key_value_groups = module.num_heads // module.num_key_value_heads
+model = MyLlamaForCausalLM.from_pretrained(
+    "npc0/llama3.1-41B-raw",
+    torch_dtype=torch.float16,
+    device_map="auto"
+)
+# Path as used in the original run; replace it with the location of your tokenizer files.
+tokenizer = AutoTokenizer.from_pretrained(
+    "FLAP/llm_weights/flap_p0.5_WIFV_ALAM_llama_70b")
+model = model.eval()
+messages = [
+    {"role": "system", "content": "You are a helpful AI assistant."},
+    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
+    {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
+    {"role": "user", "content": "What about solving the equation 2x + 3 = 7?"},
+]
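+# add_generation_prompt=True appends the assistant header so the model answers the last user turn.
+model_inputs = tokenizer.apply_chat_template(messages,
+                                             add_generation_prompt=True,
+                                             return_tensors="pt").to(model.device)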
+generated_ids = model.generate(model_inputs, max_new_tokens=128)
+decoded = tokenizer.batch_decode(generated_ids)
+print(decoded[0])
+```
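
The workaround above makes two decisions per layer: whether a linear layer's checkpoint shape differs from the shape implied by the config, and how many attention heads remain once `q_proj` has been narrowed. A minimal sketch of that bookkeeping on plain tuples, without loading any model, might look like the following. The function names and the pruned shapes are hypothetical and chosen only for illustration; the default dimensions (hidden size 8192, MLP width 28672, head dimension 128) are those of the Llama 3.1 70B base config.

```python
from typing import Dict, Tuple

# A Linear weight is stored as (out_features, in_features).
Shape = Tuple[int, int]


def plan_layer_overrides(
    checkpoint_shapes: Dict[str, Shape],
    default_shapes: Dict[str, Shape],
) -> Dict[str, Shape]:
    """Return the weights whose checkpoint shape differs from the config default."""
    return {
        name: shape
        for name, shape in checkpoint_shapes.items()
        if default_shapes.get(name) != shape
    }


def heads_after_pruning(q_proj_out_features: int, head_dim: int) -> int:
    """Count the attention heads left in a narrowed q_proj."""
    return q_proj_out_features // head_dim


# Defaults follow the Llama 3.1 70B config; the pruned shapes are made up.
default_shapes = {
    "model.layers.0.self_attn.q_proj.weight": (8192, 8192),
    "model.layers.0.mlp.up_proj.weight": (28672, 8192),
}
checkpoint_shapes = {
    "model.layers.0.self_attn.q_proj.weight": (5120, 8192),  # hypothetical
    "model.layers.0.mlp.up_proj.weight": (28672, 8192),      # left unpruned here
}

overrides = plan_layer_overrides(checkpoint_shapes, default_shapes)
remaining_heads = heads_after_pruning(
    overrides["model.layers.0.self_attn.q_proj.weight"][0], head_dim=128)
print(overrides)
print(remaining_heads)
```

Working on shapes alone keeps the check cheap, since no weights need to be materialized to see which layers would be resized.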