Upload 11 files
Add model files of MiniCPM4-0.5B-QAT-Int4-unquantized
- README.md +152 -3
- added_tokens.json +10 -0
- config.json +37 -0
- configuration_minicpm.py +207 -0
- generation_config.json +12 -0
- model.safetensors +3 -0
- modeling_minicpm.py +0 -0
- special_tokens_map.json +33 -0
- tokenizer.json +0 -0
- tokenizer.model +3 -0
- tokenizer_config.json +117 -0
    	
        README.md
    CHANGED
    
    | @@ -1,3 +1,152 @@ | |
| 1 | 
            -
            ---
         | 
| 2 | 
            -
            license: apache-2.0
         | 
| 3 | 
            -
             | 
| 1 | 
            +
            ---
         | 
| 2 | 
            +
            license: apache-2.0
         | 
| 3 | 
            +
            language:
         | 
| 4 | 
            +
            - zh
         | 
| 5 | 
            +
            - en
         | 
| 6 | 
            +
            pipeline_tag: text-generation
         | 
| 7 | 
            +
            library_name: transformers
         | 
| 8 | 
            +
            ---
         | 
| 9 | 
            +
            <div align="center">
         | 
| 10 | 
            +
            <img src="https://github.com/OpenBMB/MiniCPM/blob/main/assets/minicpm_logo.png?raw=true" width="500em" ></img> 
         | 
| 11 | 
            +
            </div>
         | 
| 12 | 
            +
             | 
| 13 | 
            +
            <p align="center">
         | 
| 14 | 
            +
            <a href="https://github.com/OpenBMB/MiniCPM/" target="_blank">GitHub Repo</a> |
         | 
| 15 | 
            +
            <a href="https://github.com/OpenBMB/MiniCPM/tree/main/report/MiniCPM_4_Technical_Report.pdf" target="_blank">Technical Report</a> 
         | 
| 16 | 
            +
            </p>
         | 
| 17 | 
            +
            <p align="center">
         | 
| 18 | 
            +
            👋 Join us on <a href="https://discord.gg/3cGQn9b3YM" target="_blank">Discord</a> and <a href="https://github.com/OpenBMB/MiniCPM/blob/main/assets/wechat.jpg" target="_blank">WeChat</a>
         | 
| 19 | 
            +
            </p>
         | 
| 20 | 
            +
             | 
| 21 | 
            +
            ## What's New
         | 
| 22 | 
            +
            - [2025.06.06] The **MiniCPM4** series is released! These models deliver large efficiency improvements while maintaining strong performance at the same scale, with over 5x generation acceleration on typical end-side chips. You can find the technical report [here](https://github.com/OpenBMB/MiniCPM/tree/main/report/MiniCPM_4_Technical_Report.pdf).🔥🔥🔥
         | 
| 23 | 
            +
             | 
| 24 | 
            +
            ## MiniCPM4 Series
         | 
| 25 | 
            +
            The MiniCPM4 series consists of highly efficient large language models (LLMs) designed explicitly for end-side devices. It achieves this efficiency through systematic innovation in four key dimensions: model architecture, training data, training algorithms, and inference systems.
         | 
| 26 | 
            +
            - [MiniCPM4-8B](https://huggingface.co/openbmb/MiniCPM4-8B): The flagship of MiniCPM4, with 8B parameters, trained on 8T tokens.
         | 
| 27 | 
            +
            - [MiniCPM4-0.5B](https://huggingface.co/openbmb/MiniCPM4-0.5B): The small version of MiniCPM4, with 0.5B parameters, trained on 1T tokens.
         | 
| 28 | 
            +
            - [MiniCPM4-8B-Eagle-FRSpec](https://huggingface.co/openbmb/MiniCPM4-8B-Eagle-FRSpec): Eagle head for FRSpec, accelerating speculative inference for MiniCPM4-8B.
         | 
| 29 | 
            +
            - [MiniCPM4-8B-Eagle-FRSpec-QAT-cpmcu](https://huggingface.co/openbmb/MiniCPM4-8B-Eagle-FRSpec-QAT-cpmcu): Eagle head trained with QAT for FRSpec, efficiently integrating speculation and quantization to achieve ultra acceleration for MiniCPM4-8B.
         | 
| 30 | 
            +
            - [MiniCPM4-8B-Eagle-vLLM](https://huggingface.co/openbmb/MiniCPM4-8B-Eagle-vLLM): Eagle head in vLLM format, accelerating speculative inference for MiniCPM4-8B.
         | 
| 31 | 
            +
            - [MiniCPM4-8B-marlin-Eagle-vLLM](https://huggingface.co/openbmb/MiniCPM4-8B-marlin-Eagle-vLLM): Quantized Eagle head for vLLM format, accelerating speculative inference for MiniCPM4-8B.
         | 
| 32 | 
            +
            - [BitCPM4-0.5B](https://huggingface.co/openbmb/BitCPM4-0.5B): Extreme ternary quantization applied to MiniCPM4-0.5B compresses model parameters into ternary values, achieving a 90% reduction in bit width.
         | 
| 33 | 
            +
            - [BitCPM4-1B](https://huggingface.co/openbmb/BitCPM4-1B): Extreme ternary quantization applied to MiniCPM3-1B compresses model parameters into ternary values, achieving a 90% reduction in bit width.
         | 
| 34 | 
            +
            - [MiniCPM4-Survey](https://huggingface.co/openbmb/MiniCPM4-Survey): Based on MiniCPM4-8B, accepts users' queries as input and autonomously generates trustworthy, long-form survey papers.
         | 
| 35 | 
            +
            - [MiniCPM4-MCP](https://huggingface.co/openbmb/MiniCPM4-MCP): Based on MiniCPM4-8B, accepts users' queries and available MCP tools as input and autonomously calls relevant MCP tools to satisfy users' requirements.
         | 
| 36 | 
            +
            - [MiniCPM4-0.5B-QAT-Int4-unquantized](https://huggingface.co/openbmb/MiniCPM4-0.5B-QAT-Int4-unquantized): Int4 version of MiniCPM4-0.5B, trained by QAT and stored in fake quantization style. (**<-- you are here**)
         | 
| 37 | 
            +
            - [MiniCPM4-0.5B-QAT-Int4-GPTQ-format](https://huggingface.co/openbmb/MiniCPM4-0.5B-QAT-Int4-GPTQ-format): Int4 version of MiniCPM4-0.5B, trained by QAT and stored in GPTQ format.
         | 
| 38 | 
            +
            - [MiniCPM4-0.5B-QAT-Int4-GGUF](https://huggingface.co/openbmb/MiniCPM4-0.5B-QAT-Int4-GGUF): Int4 version of MiniCPM4-0.5B in GGUF.
         | 
| 39 | 
            +
             | 
| 40 | 
            +
            ## Introduction
         | 
| 41 | 
            +
            MiniCPM4 is a highly efficient edge-side large language model, optimized across four dimensions: model architecture, learning algorithms, training data, and inference systems.
         | 
| 42 | 
            +
             | 
| 43 | 
            +
            - 🏗️ **Efficient Model Architecture:**
         | 
| 44 | 
            +
              - InfLLM v2 -- Trainable Sparse Attention Mechanism: Adopts a trainable sparse attention architecture in which each token computes relevance with fewer than 5% of the tokens when processing 128K-token texts, significantly reducing computational overhead for long texts
         | 
| 45 | 
            +
             | 
| 46 | 
            +
            - 🧠 **Efficient Learning Algorithms:**
         | 
| 47 | 
            +
              - Model Wind Tunnel 2.0 -- Efficient Predictable Scaling: Introduces scaling prediction methods for performance of downstream tasks, enabling more precise model training configuration search
         | 
| 48 | 
            +
              - BitCPM -- Ultimate Ternary Quantization: Compresses model parameters to ternary values (3 possible values per weight), achieving a 90% reduction in model bit-width
         | 
| 49 | 
            +
              - Efficient Training Engineering Optimization: Adopts FP8 low-precision computing technology combined with Multi-token Prediction training strategy
         | 
| 50 | 
            +
             | 
| 51 | 
            +
            - 📚 **High-Quality Training Data:**
         | 
| 52 | 
            +
              - UltraClean -- High-quality Pre-training Data Filtering and Generation: Builds iterative data cleaning strategies based on efficient data verification, and open-sources the high-quality Chinese and English pre-training dataset [Ultra-FineWeb](https://huggingface.co/datasets/openbmb/Ultra-FineWeb)
         | 
| 53 | 
            +
              - UltraChat v2 -- High-quality Supervised Fine-tuning Data Generation: Constructs large-scale high-quality supervised fine-tuning datasets covering multiple dimensions including knowledge-intensive data, reasoning-intensive data, instruction-following data, long text understanding data, and tool calling data
         | 
| 54 | 
            +
             | 
| 55 | 
            +
            - ⚡ **Efficient Inference System:**
         | 
| 56 | 
            +
              - CPM.cu -- Lightweight and Efficient CUDA Inference Framework: Integrates sparse attention, model quantization, and speculative sampling to achieve efficient prefilling and decoding
         | 
| 57 | 
            +
              - ArkInfer -- Cross-platform Deployment System: Supports efficient deployment across multiple backend environments, providing flexible cross-platform adaptation capabilities
         | 
| 58 | 
            +
             | 
| 59 | 
            +
            ## Usage
         | 
| 60 | 
            +
            ### Inference with Transformers
         | 
| 61 | 
            +
            ```python
         | 
| 62 | 
            +
            from transformers import AutoModelForCausalLM, AutoTokenizer
         | 
| 63 | 
            +
            import torch
         | 
| 64 | 
            +
             | 
| 65 | 
            +
            path = "openbmb/MiniCPM4-0.5B-QAT-Int4-unquantized"
         | 
| 66 | 
            +
            device = "cuda"
         | 
| 67 | 
            +
             | 
| 68 | 
            +
            tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
         | 
| 69 | 
            +
            model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16, device_map=device, trust_remote_code=True)
         | 
| 70 | 
            +
             | 
| 71 | 
            +
            messages = [
         | 
| 72 | 
            +
                {"role": "user", "content": "推荐5个北京的景点。"},
         | 
| 73 | 
            +
            ]
         | 
| 74 | 
            +
            model_inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(device)
         | 
| 75 | 
            +
             | 
| 76 | 
            +
            model_outputs = model.generate(
         | 
| 77 | 
            +
                model_inputs,
         | 
| 78 | 
            +
                max_new_tokens=1024,
         | 
| 79 | 
            +
                top_p=0.7,
         | 
| 80 | 
            +
                temperature=0.7
         | 
| 81 | 
            +
            )
         | 
| 82 | 
            +
             | 
| 83 | 
            +
            output_token_ids = [
         | 
| 84 | 
            +
                model_outputs[i][len(model_inputs[i]):] for i in range(len(model_inputs))
         | 
| 85 | 
            +
            ]
         | 
| 86 | 
            +
             | 
| 87 | 
            +
            responses = tokenizer.batch_decode(output_token_ids, skip_special_tokens=True)[0]
         | 
| 88 | 
            +
            print(responses)
         | 
| 89 | 
            +
             | 
| 90 | 
            +
            ```
         | 
| 91 | 
            +
             | 
| 92 | 
            +
            ### Inference with [vLLM](https://github.com/vllm-project/vllm)
         | 
| 93 | 
            +
             | 
| 94 | 
            +
            You can run inference on MiniCPM4-0.5B-QAT-Int4-unquantized with vLLM:
         | 
| 95 | 
            +
            ```python
         | 
| 96 | 
            +
            from transformers import AutoTokenizer
         | 
| 97 | 
            +
            from vllm import LLM, SamplingParams
         | 
| 98 | 
            +
             | 
| 99 | 
            +
            model_name = "openbmb/MiniCPM4-0.5B-QAT-Int4-unquantized"
         | 
| 100 | 
            +
            prompt = [{"role": "user", "content": "推荐5个北京的景点。"}]
         | 
| 101 | 
            +
             | 
| 102 | 
            +
            tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
         | 
| 103 | 
            +
            input_text = tokenizer.apply_chat_template(prompt, tokenize=False, add_generation_prompt=True)
         | 
| 104 | 
            +
             | 
| 105 | 
            +
            llm = LLM(
         | 
| 106 | 
            +
                model=model_name,
         | 
| 107 | 
            +
                trust_remote_code=True,
         | 
| 108 | 
            +
                max_num_batched_tokens=32768,
         | 
| 109 | 
            +
                dtype="bfloat16", 
         | 
| 110 | 
            +
                gpu_memory_utilization=0.8, 
         | 
| 111 | 
            +
            )
         | 
| 112 | 
            +
            sampling_params = SamplingParams(top_p=0.7, temperature=0.7, max_tokens=1024, repetition_penalty=1.02)
         | 
| 113 | 
            +
             | 
| 114 | 
            +
            outputs = llm.generate(prompts=input_text, sampling_params=sampling_params)
         | 
| 115 | 
            +
             | 
| 116 | 
            +
            print(outputs[0].outputs[0].text)
         | 
| 117 | 
            +
            ```
         | 
| 118 | 
            +
             | 
| 119 | 
            +
            ## Evaluation Results
         | 
| 120 | 
            +
            | Model          | Qwen3 | Llama3.2 | Gemma3 | MiniCPM4 | MiniCPM4 | MiniCPM4 |
         | 
| 121 | 
            +
            |----------------|-------|----------|--------|----------|----------|----------|
         | 
| 122 | 
            +
            | #Parameters    | 0.6B  | 1B       | 1B     | 0.5B     | 0.5B     | 0.5B     |
         | 
| 123 | 
            +
            | #Precision     | BF16  | BF16     | BF16   | BF16     |Int4(Fake)|Int4(GPTQ)|
         | 
| 124 | 
            +
            | MMLU           | 42.95 | 46.89    | 41.64  | 55.55    | 55.46    | 53.93    |
         | 
| 125 | 
            +
            | CMMLU          | 42.05 | 23.73    | 25.09  | 65.22    | 63.91    | 63.73    |
         | 
| 126 | 
            +
            | Ceval          | 45.53 | 36.74    | 31.83  | 66.11    | 64.85    | 65.22    |
         | 
| 127 | 
            +
            | BBH            | 28.32 | 25.42    | 33.21  | 49.87    | 48.81    | 49.09    |
         | 
| 128 | 
            +
            | GSM8K          | 61.71 | 39.76    | 61.26  | 52.08    | 45.41    | 45.49    |
         | 
| 129 | 
            +
            | MBPP           | 47.86 | 47.47    | 59.92  | 59.14    | 55.64    | 55.25    |
         | 
| 130 | 
            +
            | AVERAGE        | 44.73 | 36.66    | 42.15  | 58.00    | 55.68    | 55.45    |
         | 
| 131 | 
            +
             | 
| 132 | 
            +
             | 
| 133 | 
            +
             | 
| 134 | 
            +
            ## Statement
         | 
| 135 | 
            +
            - As a language model, MiniCPM generates content by learning from a vast amount of text. 
         | 
| 136 | 
            +
            - However, it does not possess the ability to comprehend or express personal opinions or value judgments. 
         | 
| 137 | 
            +
            - Any content generated by MiniCPM does not represent the viewpoints or positions of the model developers. 
         | 
| 138 | 
            +
            - Therefore, when using content generated by MiniCPM, users should take full responsibility for evaluating and verifying it on their own.
         | 
| 139 | 
            +
             | 
| 140 | 
            +
            ## LICENSE
         | 
| 141 | 
            +
            - This repository and MiniCPM models are released under the [Apache-2.0](https://github.com/OpenBMB/MiniCPM/blob/main/LICENSE) License. 
         | 
| 142 | 
            +
             | 
| 143 | 
            +
            ## Citation
         | 
| 144 | 
            +
            - Please cite our [paper](https://github.com/OpenBMB/MiniCPM/tree/main/report/MiniCPM_4_Technical_Report.pdf) if you find our work valuable.
         | 
| 145 | 
            +
             | 
| 146 | 
            +
            ```bibtex
         | 
| 147 | 
            +
            @article{minicpm4,
         | 
| 148 | 
            +
              title={{MiniCPM4}: Ultra-Efficient LLMs on End Devices},
         | 
| 149 | 
            +
              author={MiniCPM Team},
         | 
| 150 | 
            +
              year={2025}
         | 
| 151 | 
            +
            }
         | 
| 152 | 
            +
            ```
         | 
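The README above describes this checkpoint as "trained by QAT and stored in fake quantization style". As a rough illustration of what fake quantization usually means, the sketch below rounds each weight to a 4-bit integer code and immediately converts it back to a float, so the stored tensor keeps a floating-point dtype but holds only int4-representable values. The symmetric scaling, the per-group layout, the `group_size` value, and the function name `fake_quantize_int4` are all assumptions made for this example; the commit does not state the actual QAT scheme.

```python
def fake_quantize_int4(weights, group_size=4):
    """Quantize-dequantize a flat list of floats, one group at a time.

    Assumption: symmetric per-group scaling; the real scheme used for this
    checkpoint is not documented in the commit.
    """
    qmin, qmax = -8, 7  # range of a signed 4-bit integer
    out = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # One scale per group; fall back to 1.0 for an all-zero group.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        for w in group:
            q = max(qmin, min(qmax, round(w / scale)))  # integer code
            out.append(q * scale)  # stored as a float again ("fake" quantization)
    return out


print(fake_quantize_int4([7.0, -3.0, 1.2, 0.0]))  # [7.0, -3.0, 1.0, 0.0]
```

Each group ends up with at most 16 distinct values, which is why such a checkpoint can still be loaded as ordinary `bfloat16` weights, as the Transformers and vLLM examples in the README do.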
    	
        added_tokens.json
    ADDED
    
    | @@ -0,0 +1,10 @@ | |
| 1 | 
            +
            {
         | 
| 2 | 
            +
              "<|execute_end|>": 73444,
         | 
| 3 | 
            +
              "<|execute_start|>": 73443,
         | 
| 4 | 
            +
              "<|fim_middle|>": 73446,
         | 
| 5 | 
            +
              "<|fim_prefix|>": 73445,
         | 
| 6 | 
            +
              "<|fim_suffix|>": 73447,
         | 
| 7 | 
            +
              "<|im_end|>": 73440,
         | 
| 8 | 
            +
              "<|im_start|>": 73441,
         | 
| 9 | 
            +
              "<|tool_call|>": 73442
         | 
| 10 | 
            +
            }
         | 
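A quick consistency check on the token IDs in `added_tokens.json` can be sketched as follows. It uses only the values visible in this commit: the eight added tokens, plus `vocab_size` and `eos_token_id` from `config.json`. The helper name `summarize_added_tokens` is hypothetical and not part of the repository.

```python
import json

# Contents of added_tokens.json as shown in this commit.
ADDED_TOKENS_JSON = """
{
  "<|execute_end|>": 73444,
  "<|execute_start|>": 73443,
  "<|fim_middle|>": 73446,
  "<|fim_prefix|>": 73445,
  "<|fim_suffix|>": 73447,
  "<|im_end|>": 73440,
  "<|im_start|>": 73441,
  "<|tool_call|>": 73442
}
"""
VOCAB_SIZE = 73448          # "vocab_size" in config.json
EOS_TOKEN_IDS = [2, 73440]  # "eos_token_id" in config.json


def summarize_added_tokens(raw_json, vocab_size):
    """Invert the token -> id map and check where the ids sit in the vocabulary."""
    added = json.loads(raw_json)
    id_to_token = {token_id: token for token, token_id in added.items()}
    ids = sorted(id_to_token)
    return {
        "id_to_token": id_to_token,
        "contiguous": ids == list(range(ids[0], ids[-1] + 1)),
        "fills_vocab_tail": ids[-1] + 1 == vocab_size,
    }


summary = summarize_added_tokens(ADDED_TOKENS_JSON, VOCAB_SIZE)
print(summary["contiguous"], summary["fills_vocab_tail"])  # True True
print(summary["id_to_token"][EOS_TOKEN_IDS[1]])            # <|im_end|>
```

The added tokens occupy the last eight slots of the vocabulary, and the second end-of-sequence ID in `config.json` is the chat terminator `<|im_end|>`.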
    	
        config.json
    ADDED
    
    | @@ -0,0 +1,37 @@ | |
| 1 | 
            +
            {
         | 
| 2 | 
            +
                "_name_or_path": "openbmb/MiniCPM4-0.5B-QAT-Int4-unquantized",
         | 
| 3 | 
            +
                "architectures": [
         | 
| 4 | 
            +
                    "MiniCPMForCausalLM"
         | 
| 5 | 
            +
                ],
         | 
| 6 | 
            +
                "auto_map": {
         | 
| 7 | 
            +
                    "AutoConfig": "configuration_minicpm.MiniCPMConfig",
         | 
| 8 | 
            +
                    "AutoModel": "modeling_minicpm.MiniCPMModel",
         | 
| 9 | 
            +
                    "AutoModelForCausalLM": "modeling_minicpm.MiniCPMForCausalLM",
         | 
| 10 | 
            +
                    "AutoModelForSeq2SeqLM": "modeling_minicpm.MiniCPMForCausalLM",
         | 
| 11 | 
            +
                    "AutoModelForSequenceClassification": "modeling_minicpm.MiniCPMForSequenceClassification"
         | 
| 12 | 
            +
                },
         | 
| 13 | 
            +
                "bos_token_id": 1,
         | 
| 14 | 
            +
                "eos_token_id": [2, 73440],
         | 
| 15 | 
            +
                "hidden_act": "silu",
         | 
| 16 | 
            +
                "hidden_size": 1024,
         | 
| 17 | 
            +
                "initializer_range": 0.1,
         | 
| 18 | 
            +
                "intermediate_size": 4096,
         | 
| 19 | 
            +
                "max_position_embeddings": 32768,
         | 
| 20 | 
            +
                "num_attention_heads": 16,
         | 
| 21 | 
            +
                "num_hidden_layers": 24,
         | 
| 22 | 
            +
                "num_key_value_heads": 2,
         | 
| 23 | 
            +
                "rms_norm_eps": 1e-05,
         | 
| 24 | 
            +
                "rope_scaling": {
         | 
| 25 | 
            +
                    "rope_type": "longrope", 
         | 
| 26 | 
            +
                    "long_factor": [1.0004360675811768, 1.0668443441390991, 1.1631425619125366, 1.3025742769241333, 1.5040205717086792, 1.7941505908966064, 2.2101221084594727, 2.802666664123535, 3.6389970779418945, 4.804192543029785, 6.39855432510376, 8.527148246765137, 11.277542114257812, 14.684998512268066, 18.69317054748535, 23.13019371032715, 27.72362518310547, 32.1606559753418, 36.168827056884766, 39.57627868652344, 42.32667541503906, 44.45526885986328, 46.04962921142578, 47.21482849121094, 48.05115509033203, 48.64370346069336, 49.05967712402344, 49.34980392456055, 49.551246643066406, 49.69068145751953, 49.78697967529297, 49.85338592529297],
         | 
| 27 | 
            +
                    "short_factor": [1.0004360675811768, 1.0668443441390991, 1.1631425619125366, 1.3025742769241333, 1.5040205717086792, 1.7941505908966064, 2.2101221084594727, 2.802666664123535, 3.6389970779418945, 4.804192543029785, 6.39855432510376, 8.527148246765137, 11.277542114257812, 14.684998512268066, 18.69317054748535, 23.13019371032715, 27.72362518310547, 32.1606559753418, 36.168827056884766, 39.57627868652344, 42.32667541503906, 44.45526885986328, 46.04962921142578, 47.21482849121094, 48.05115509033203, 48.64370346069336, 49.05967712402344, 49.34980392456055, 49.551246643066406, 49.69068145751953, 49.78697967529297, 49.85338592529297],
         | 
| 28 | 
            +
                    "original_max_position_embeddings": 32768
         | 
| 29 | 
            +
                },    
         | 
| 30 | 
            +
                "torch_dtype": "bfloat16",
         | 
| 31 | 
            +
                "transformers_version": "4.46.3",
         | 
| 32 | 
            +
                "use_cache": true,
         | 
| 33 | 
            +
                "vocab_size": 73448,
         | 
| 34 | 
            +
                "scale_emb": 12,
         | 
| 35 | 
            +
                "dim_model_base": 256,
         | 
| 36 | 
            +
                "scale_depth": 1.4
         | 
| 37 | 
            +
            }
         | 
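The attention shapes implied by this `config.json` can be worked out with a few lines of arithmetic. The sketch below assumes the usual convention that the per-head dimension is `hidden_size // num_attention_heads`; `modeling_minicpm.py` is not shown in this diff, so that convention is an assumption here, and `derive_attention_shapes` is a hypothetical helper name. The KV-cache figure counts keys and values for every layer at 2 bytes per `bfloat16` value.

```python
# Values copied from config.json in this commit.
CONFIG = {
    "hidden_size": 1024,
    "num_attention_heads": 16,
    "num_key_value_heads": 2,
    "num_hidden_layers": 24,
}
ROPE_FACTOR_COUNT = 32  # number of entries in "long_factor" and "short_factor"
BYTES_PER_BF16 = 2


def derive_attention_shapes(config, rope_factor_count):
    """Derive head size, GQA grouping, and per-token KV-cache size."""
    # Assumption: head_dim follows the common hidden_size / num_heads convention.
    head_dim = config["hidden_size"] // config["num_attention_heads"]
    queries_per_kv = config["num_attention_heads"] // config["num_key_value_heads"]

    def kv_bytes(num_kv_heads):
        # 2 tensors (K and V) per layer, one head_dim vector per KV head.
        return 2 * config["num_hidden_layers"] * num_kv_heads * head_dim * BYTES_PER_BF16

    return {
        "head_dim": head_dim,
        "queries_per_kv_head": queries_per_kv,
        "kv_bytes_per_token": kv_bytes(config["num_key_value_heads"]),
        "kv_bytes_per_token_if_mha": kv_bytes(config["num_attention_heads"]),
        "rope_factors_cover_half_head": rope_factor_count == head_dim // 2,
    }


for name, value in derive_attention_shapes(CONFIG, ROPE_FACTOR_COUNT).items():
    print(f"{name}: {value}")
```

Under these assumptions the head dimension is 64, eight query heads share each key/value head, and the KV cache takes 12288 bytes per token instead of the 98304 bytes that full multi-head attention would need. The 32 LongRoPE factors match one factor per rotary frequency pair (`head_dim // 2`).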
    	
        configuration_minicpm.py
    ADDED
    
    | @@ -0,0 +1,207 @@ | |
| 1 | 
            +
            # Copyright 2022 EleutherAI and the HuggingFace Inc. team. All rights reserved.
         | 
| 2 | 
            +
            #
         | 
| 3 | 
            +
            # This code is based on EleutherAI's GPT-NeoX library and the GPT-NeoX
         | 
| 4 | 
            +
            # and OPT implementations in this library. It has been modified from its
         | 
| 5 | 
            +
            # original forms to accommodate minor architectural differences compared
         | 
| 6 | 
            +
            # to GPT-NeoX and OPT used by the Meta AI team that trained the model.
         | 
| 7 | 
            +
            #
         | 
| 8 | 
            +
            # Licensed under the Apache License, Version 2.0 (the "License");
         | 
| 9 | 
            +
            # you may not use this file except in compliance with the License.
         | 
| 10 | 
            +
            # You may obtain a copy of the License at
         | 
| 11 | 
            +
            #
         | 
| 12 | 
            +
            #     http://www.apache.org/licenses/LICENSE-2.0
         | 
| 13 | 
            +
            #
         | 
| 14 | 
            +
            # Unless required by applicable law or agreed to in writing, software
         | 
| 15 | 
            +
            # distributed under the License is distributed on an "AS IS" BASIS,
         | 
| 16 | 
            +
            # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
         | 
| 17 | 
            +
            # See the License for the specific language governing permissions and
         | 
| 18 | 
            +
            # limitations under the License.
         | 
| 19 | 
            +
            """ MiniCPM model configuration"""
         | 
| 20 | 
            +
             | 
| 21 | 
            +
            from transformers.configuration_utils import PretrainedConfig
         | 
| 22 | 
            +
            from transformers.utils import logging
         | 
| 23 | 
            +
             | 
| 24 | 
            +
            logger = logging.get_logger(__name__)
         | 
| 25 | 
            +
             | 
| 26 | 
            +
            MINICPM_PRETRAINED_CONFIG_ARCHIVE_MAP = {}
         | 
| 27 | 
            +
             | 
| 28 | 
            +
             | 
| 29 | 
            +
            class MiniCPMConfig(PretrainedConfig):
         | 
| 30 | 
            +
                r"""
         | 
| 31 | 
            +
                This is the configuration class to store the configuration of a [`MiniCPMModel`]. It is used to instantiate a MiniCPM
         | 
| 32 | 
            +
                model according to the specified arguments, defining the model architecture. Instantiating a configuration with the
         | 
| 33 | 
            +
                defaults will yield a similar configuration to that of the MiniCPM-7B.
         | 
| 34 | 
            +
             | 
| 35 | 
            +
                Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
         | 
| 36 | 
            +
                documentation from [`PretrainedConfig`] for more information.
         | 
| 37 | 
            +
             | 
| 38 | 
            +
             | 
| 39 | 
            +
                Args:
         | 
| 40 | 
            +
                    vocab_size (`int`, *optional*, defaults to 32000):
         | 
| 41 | 
            +
                        Vocabulary size of the MiniCPM model. Defines the number of different tokens that can be represented by the
         | 
| 42 | 
            +
                        `input_ids` passed when calling [`MiniCPMModel`]
         | 
| 43 | 
            +
                    hidden_size (`int`, *optional*, defaults to 4096):
         | 
| 44 | 
            +
                        Dimension of the hidden representations.
         | 
| 45 | 
            +
                    intermediate_size (`int`, *optional*, defaults to 11008):
         | 
| 46 | 
            +
                        Dimension of the MLP representations.
         | 
| 47 | 
            +
                    num_hidden_layers (`int`, *optional*, defaults to 32):
         | 
| 48 | 
            +
                        Number of hidden layers in the Transformer decoder.
         | 
| 49 | 
            +
                    num_attention_heads (`int`, *optional*, defaults to 32):
         | 
| 50 | 
            +
                        Number of attention heads for each attention layer in the Transformer decoder.
         | 
| 51 | 
            +
                    num_key_value_heads (`int`, *optional*):
         | 
| 52 | 
            +
                        This is the number of key_value heads that should be used to implement Grouped Query Attention. If
         | 
| 53 | 
            +
                        `num_key_value_heads=num_attention_heads`, the model will use Multi Head Attention (MHA), if
         | 
| 54 | 
            +
                        `num_key_value_heads=1`, the model will use Multi Query Attention (MQA); otherwise GQA is used. When
         | 
| 55 | 
            +
                        converting a multi-head checkpoint to a GQA checkpoint, each group key and value head should be constructed
         | 
| 56 | 
            +
                        by mean-pooling all the original heads within that group. For more details, check out [this
         | 
| 57 | 
            +
                        paper](https://arxiv.org/pdf/2305.13245.pdf). If it is not specified, will default to
         | 
| 58 | 
            +
                        `num_attention_heads`.
         | 
| 59 | 
            +
                    hidden_act (`str` or `function`, *optional*, defaults to `"silu"`):
         | 
| 60 | 
            +
                        The non-linear activation function (function or string) in the decoder.
         | 
| 61 | 
            +
                    max_position_embeddings (`int`, *optional*, defaults to 2048):
         | 
| 62 | 
            +
                        The maximum sequence length that this model might ever be used with. MiniCPM 1 supports up to 2048 tokens,
         | 
| 63 | 
            +
                        MiniCPM 2 up to 4096, CodeMiniCPM up to 16384.
         | 
| 64 | 
            +
                    initializer_range (`float`, *optional*, defaults to 0.02):
         | 
| 65 | 
            +
                        The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
         | 
| 66 | 
            +
                    rms_norm_eps (`float`, *optional*, defaults to 1e-06):
         | 
| 67 | 
            +
                        The epsilon used by the rms normalization layers.
         | 
| 68 | 
            +
                    use_cache (`bool`, *optional*, defaults to `True`):
         | 
| 69 | 
            +
                        Whether or not the model should return the last key/values attentions (not used by all models). Only
         | 
| 70 | 
            +
                        relevant if `config.is_decoder=True`.
         | 
| 71 | 
            +
                    pad_token_id (`int`, *optional*):
         | 
| 72 | 
            +
                        Padding token id.
         | 
| 73 | 
            +
                    bos_token_id (`int`, *optional*, defaults to 1):
         | 
| 74 | 
            +
                        Beginning of stream token id.
         | 
| 75 | 
            +
                    eos_token_id (`int`, *optional*, defaults to 2):
         | 
| 76 | 
            +
                        End of stream token id.
         | 
| 77 | 
            +
                    pretraining_tp (`int`, *optional*, defaults to 1):
         | 
| 78 | 
            +
                        Experimental feature. Tensor parallelism rank used during pretraining. Please refer to [this
         | 
| 79 | 
            +
                        document](https://huggingface.co/docs/transformers/parallelism) to understand more about it. This value is
         | 
| 80 | 
            +
                        necessary to ensure exact reproducibility of the pretraining results. Please refer to [this
         | 
| 81 | 
            +
                        issue](https://github.com/pytorch/pytorch/issues/76232).
         | 
| 82 | 
            +
                    tie_word_embeddings (`bool`, *optional*, defaults to `False`):
         | 
| 83 | 
            +
                        Whether to tie weight embeddings
         | 
| 84 | 
            +
                    rope_theta (`float`, *optional*, defaults to 10000.0):
         | 
| 85 | 
            +
                        The base period of the RoPE embeddings.
         | 
| 86 | 
            +
                    rope_scaling (`Dict`, *optional*):
         | 
| 87 | 
            +
                        Dictionary containing the scaling configuration for the RoPE embeddings. Currently supports two scaling
         | 
| 88 | 
            +
                        strategies: linear and dynamic. Their scaling factor must be a float greater than 1. The expected format is
         | 
| 89 | 
            +
                        `{"type": strategy name, "factor": scaling factor}`. When using this flag, don't update
         | 
| 90 | 
            +
                        `max_position_embeddings` to the expected new maximum. See the following thread for more information on how
         | 
| 91 | 
            +
                        these scaling strategies behave:
         | 
| 92 | 
            +
                        https://www.reddit.com/r/LocalMiniCPM/comments/14mrgpr/dynamically_scaled_rope_further_increases/. This is an
         | 
| 93 | 
            +
                        experimental feature, subject to breaking API changes in future versions.
         | 
| 94 | 
            +
                    attention_bias (`bool`, defaults to `False`, *optional*, defaults to `False`):
         | 
| 95 | 
            +
                        Whether to use a bias in the query, key, value and output projection layers during self-attention.
         | 
| 96 | 
            +
                    attention_dropout (`float`, *optional*, defaults to 0.0):
         | 
| 97 | 
            +
                        The dropout ratio for the attention probabilities.
         | 
| 98 | 
            +
             | 
| 99 | 
            +
                ```python
         | 
| 100 | 
            +
                >>> from transformers import MiniCPMModel, MiniCPMConfig
         | 
| 101 | 
            +
             | 
| 102 | 
            +
                >>> # Initializing a MiniCPM minicpm-7b style configuration
         | 
| 103 | 
            +
                >>> configuration = MiniCPMConfig()
         | 
| 104 | 
            +
             | 
| 105 | 
            +
                >>> # Initializing a model from the minicpm-7b style configuration
         | 
| 106 | 
            +
                >>> model = MiniCPMModel(configuration)
         | 
| 107 | 
            +
             | 
| 108 | 
            +
                >>> # Accessing the model configuration
         | 
| 109 | 
            +
                >>> configuration = model.config
         | 
| 110 | 
            +
                ```"""
         | 
+
+    model_type = 'minicpm'
+    keys_to_ignore_at_inference = ['past_key_values']
+
+    def __init__(
+        self,
+        vocab_size=32000,
+        hidden_size=4096,
+        intermediate_size=11008,
+        num_hidden_layers=32,
+        num_attention_heads=32,
+        num_key_value_heads=None,
+        hidden_act='silu',
+        max_position_embeddings=2048,
+        initializer_range=0.02,
+        rms_norm_eps=1e-6,
+        use_cache=True,
+        pad_token_id=None,
+        bos_token_id=1,
+        eos_token_id=2,
+        pretraining_tp=1,
+        tie_word_embeddings=True,
+        rope_theta=10000.0,
+        rope_scaling=None,
+        attention_bias=False,
+        attention_dropout=0.0,
+        scale_emb=1,
+        dim_model_base=1,
+        scale_depth=1,
+        mup_denominator=None,
+        sparse_config=None,
+        **kwargs):
+
+        self.vocab_size = vocab_size
+        self.max_position_embeddings = max_position_embeddings
+        self.hidden_size = hidden_size
+        self.intermediate_size = intermediate_size
+        self.num_hidden_layers = num_hidden_layers
+        self.num_attention_heads = num_attention_heads
+
+        # for backward compatibility
+        if num_key_value_heads is None:
+            num_key_value_heads = num_attention_heads
+
+        self.num_key_value_heads = num_key_value_heads
+        self.hidden_act = hidden_act
+        self.initializer_range = initializer_range
+        self.rms_norm_eps = rms_norm_eps
+        self.pretraining_tp = pretraining_tp
+        self.use_cache = use_cache
+        self.rope_theta = rope_theta
+        self.rope_scaling = rope_scaling
+        # self._rope_scaling_validation()
+        self.attention_bias = attention_bias
+        self.attention_dropout = attention_dropout
+        self.scale_emb = scale_emb
+        self.dim_model_base = dim_model_base
+        self.scale_depth = scale_depth
+        # only used for Eagle Head
+        self.mup_denominator = mup_denominator
+
+        # sparse config
+        self.sparse_config = sparse_config
+
+        super().__init__(
+            pad_token_id=pad_token_id,
+            bos_token_id=bos_token_id,
+            eos_token_id=eos_token_id,
+            tie_word_embeddings=tie_word_embeddings,
+            **kwargs,
+        )
+        try:
+            import flash_attn
+            self._attn_implementation = 'flash_attention_2'
+        except Exception:
+            pass
+
+    def _rope_scaling_validation(self):
+        """
+        Validate the `rope_scaling` configuration.
+        """
+        if self.rope_scaling is None:
+            return
+
+        if not isinstance(self.rope_scaling, dict) or len(self.rope_scaling) != 2:
+            raise ValueError(
+                '`rope_scaling` must be a dictionary with two fields, `type` and `factor`, '
+                f'got {self.rope_scaling}'
+            )
+        rope_scaling_type = self.rope_scaling.get('type', None)
+        rope_scaling_factor = self.rope_scaling.get('factor', None)
+        if rope_scaling_type is None or rope_scaling_type not in ['linear', 'dynamic']:
+            raise ValueError(
+                f"`rope_scaling`'s type field must be one of ['linear', 'dynamic'], got {rope_scaling_type}"
+            )
+        if rope_scaling_factor is None or not isinstance(rope_scaling_factor, float) or rope_scaling_factor <= 1.0:
+            raise ValueError(f"`rope_scaling`'s factor field must be a float > 1, got {rope_scaling_factor}")
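The `rope_scaling` rules in `_rope_scaling_validation` can be exercised outside the config class. The sketch below is a standalone mirror of that method; the function name `validate_rope_scaling` is hypothetical and not part of this commit. Note that in the diff above the call `self._rope_scaling_validation()` is commented out in `__init__`, so as committed these checks are not run when the config is constructed.

```python
def validate_rope_scaling(rope_scaling):
    """Standalone mirror of `_rope_scaling_validation` (hypothetical helper)."""
    if rope_scaling is None:
        return
    # Exactly two fields are expected: `type` and `factor`.
    if not isinstance(rope_scaling, dict) or len(rope_scaling) != 2:
        raise ValueError(
            "`rope_scaling` must be a dictionary with two fields, "
            f"`type` and `factor`, got {rope_scaling}"
        )
    scaling_type = rope_scaling.get("type")
    factor = rope_scaling.get("factor")
    if scaling_type not in ("linear", "dynamic"):
        raise ValueError(f"type must be 'linear' or 'dynamic', got {scaling_type}")
    # As in the original check, an int such as 2 is rejected: only floats > 1 pass.
    if not isinstance(factor, float) or factor <= 1.0:
        raise ValueError(f"factor must be a float > 1, got {factor}")


# Accepted: a float factor greater than 1 with a known strategy.
validate_rope_scaling({"type": "linear", "factor": 2.0})

# Rejected: the factor is an int, not a float.
try:
    validate_rope_scaling({"type": "dynamic", "factor": 2})
except ValueError as err:
    print("rejected:", err)
```

One consequence of the `isinstance(..., float)` test is that a JSON value written as `2` rather than `2.0` would fail validation if the check were enabled.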
    	
        generation_config.json
    ADDED
    
@@ -0,0 +1,12 @@
+{
+  "bos_token_id": 1,
+  "do_sample": true,
+  "eos_token_id": [
+    2,
+    73440
+  ],
+  "pad_token_id": 2,
+  "temperature": 0.8,
+  "top_p": 0.8,
+  "transformers_version": "4.46.1"
+}
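To make the sampling fields above concrete, here is a minimal pure-Python sketch of temperature scaling followed by top-p (nucleus) filtering, plus a stop check against the two `eos_token_id` values. It is an illustration under simplifying assumptions, not the `transformers` implementation; the helper names (`softmax_with_temperature`, `top_p_filter`, `sample_next_token`, `is_finished`) are hypothetical.

```python
import math
import random

# Values copied from generation_config.json above.
GENERATION_CONFIG = {
    "do_sample": True,
    "eos_token_id": [2, 73440],
    "temperature": 0.8,
    "top_p": 0.8,
}


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p."""
    ranked = sorted(enumerate(probs), key=lambda item: item[1], reverse=True)
    kept, mass = [], 0.0
    for token_id, prob in ranked:
        kept.append((token_id, prob))
        mass += prob
        if mass >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    return {token_id: prob / mass for token_id, prob in kept}


def sample_next_token(logits, config, rng):
    """Draw one token id from the temperature-scaled, top-p-filtered distribution."""
    probs = softmax_with_temperature(logits, config["temperature"])
    candidates = top_p_filter(probs, config["top_p"])
    ids = list(candidates)
    return rng.choices(ids, weights=[candidates[i] for i in ids], k=1)[0]


def is_finished(token_id, config):
    """Generation stops when either end-of-sequence id is produced."""
    return token_id in set(config["eos_token_id"])


rng = random.Random(0)
token = sample_next_token([2.0, 1.0, 0.5, -1.0], GENERATION_CONFIG, rng)
print(token, is_finished(token, GENERATION_CONFIG))
```

Listing both `2` (`</s>`) and `73440` (`<|im_end|>`) as end-of-sequence ids means a chat turn ends on either token.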
    	
        model.safetensors
    ADDED
    
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c0fa5ac145cb76786bc274f65325695eabdd3ef6e718bac1080db1093f6af44b
+size 1735520504
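The three lines added for `model.safetensors` are a Git LFS pointer, not the weights themselves. As a rough sketch, the pointer can be parsed and used to check a downloaded file; the helper names `parse_lfs_pointer` and `sha256_of_file` are hypothetical, and the local path in the usage note is an assumption.

```python
import hashlib

POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:c0fa5ac145cb76786bc274f65325695eabdd3ef6e718bac1080db1093f6af44b
size 1735520504
"""


def parse_lfs_pointer(text):
    """Split each `key value` line of a Git LFS pointer into a small dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so a large checkpoint is never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


info = parse_lfs_pointer(POINTER)
print(info["hash_algorithm"], info["size_bytes"], round(info["size_bytes"] / 2**30, 2), "GiB")
```

After downloading, `sha256_of_file("model.safetensors") == info["digest"]` would confirm the file matches the pointer, assuming the file sits at that local path. The recorded size is roughly 1.6 GiB.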
    	
        modeling_minicpm.py
    ADDED
    
The diff for this file is too large to render.
    	
        special_tokens_map.json
    ADDED
    
@@ -0,0 +1,33 @@
+{
+  "additional_special_tokens": [
+    "<|im_end|>",
+    "<|im_start|>",
+    "<|tool_call|>",
+    "<|execute_start|>",
+    "<|execute_end|>",
+    "<|fim_prefix|>",
+    "<|fim_middle|>",
+    "<|fim_suffix|>"
+  ],
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
    	
        tokenizer.json
    ADDED
    
The diff for this file is too large to render.
    	
        tokenizer.model
    ADDED
    
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:bb74d51116831c3bf65db812c553f94ab0c88dcf97a5bbb37e3504f6d359c530
+size 1181204
    	
        tokenizer_config.json
    ADDED
    
@@ -0,0 +1,117 @@
+{
+  "add_bos_token": true,
+  "add_eos_token": false,
+  "add_prefix_space": null,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "73440": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "73441": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "73442": {
+      "content": "<|tool_call|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "73443": {
+      "content": "<|execute_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "73444": {
+      "content": "<|execute_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "73445": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "73446": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "73447": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_end|>",
+    "<|im_start|>",
+    "<|tool_call|>",
+    "<|execute_start|>",
+    "<|execute_end|>",
+    "<|fim_prefix|>",
+    "<|fim_middle|>",
+    "<|fim_suffix|>"
+  ],
+  "bos_token": "<s>",
+  "chat_template": "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "legacy": true,
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": null,
+  "sp_model_kwargs": {},
+  "spaces_between_special_tokens": false,
+  "tokenizer_class": "LlamaTokenizer",
+  "unk_token": "<unk>",
+  "use_default_system_prompt": false
+}
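The `chat_template` entry above is a Jinja template. To show what string it produces, the following sketch re-implements the same concatenation in plain Python; `render_chat` is a hypothetical helper written for illustration, not the tokenizer's own code path.

```python
def render_chat(messages, add_generation_prompt=False):
    """Mirror the chat_template: wrap each turn in <|im_start|> ... <|im_end|>."""
    text = ""
    for message in messages:
        text += "<|im_start|>" + message["role"] + "\n" + message["content"] + "<|im_end|>" + "\n"
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        text += "<|im_start|>assistant\n"
    return text


conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
]
print(render_chat(conversation, add_generation_prompt=True))
```

The template itself does not emit `<s>`. With `"add_bos_token": true`, the tokenizer is configured to prepend the BOS token when encoding, and the model is expected to close its turn with `<|im_end|>`, which is also the configured `eos_token`.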
