reece124 committed
Commit 09b2bf2 · verified · 1 Parent(s): 48153fe

Upload folder using huggingface_hub

This view is limited to 50 files because it contains too many changes.

Files changed (50)
  1. README.md +411 -0
  2. config.bak.json +69 -0
  3. config.json +51 -0
  4. configuration_opencua.py +37 -0
  5. generation_config.json +4 -0
  6. model-1-of-64.safetensors +3 -0
  7. model-10-of-64.safetensors +3 -0
  8. model-11-of-64.safetensors +3 -0
  9. model-12-of-64.safetensors +3 -0
  10. model-13-of-64.safetensors +3 -0
  11. model-14-of-64.safetensors +3 -0
  12. model-15-of-64.safetensors +3 -0
  13. model-16-of-64.safetensors +3 -0
  14. model-17-of-64.safetensors +3 -0
  15. model-18-of-64.safetensors +3 -0
  16. model-19-of-64.safetensors +3 -0
  17. model-2-of-64.safetensors +3 -0
  18. model-20-of-64.safetensors +3 -0
  19. model-21-of-64.safetensors +3 -0
  20. model-22-of-64.safetensors +3 -0
  21. model-23-of-64.safetensors +3 -0
  22. model-24-of-64.safetensors +3 -0
  23. model-25-of-64.safetensors +3 -0
  24. model-26-of-64.safetensors +3 -0
  25. model-27-of-64.safetensors +3 -0
  26. model-28-of-64.safetensors +3 -0
  27. model-29-of-64.safetensors +3 -0
  28. model-3-of-64.safetensors +3 -0
  29. model-30-of-64.safetensors +3 -0
  30. model-31-of-64.safetensors +3 -0
  31. model-32-of-64.safetensors +3 -0
  32. model-33-of-64.safetensors +3 -0
  33. model-34-of-64.safetensors +3 -0
  34. model-35-of-64.safetensors +3 -0
  35. model-36-of-64.safetensors +3 -0
  36. model-37-of-64.safetensors +3 -0
  37. model-38-of-64.safetensors +3 -0
  38. model-39-of-64.safetensors +3 -0
  39. model-4-of-64.safetensors +3 -0
  40. model-40-of-64.safetensors +3 -0
  41. model-41-of-64.safetensors +3 -0
  42. model-42-of-64.safetensors +3 -0
  43. model-43-of-64.safetensors +3 -0
  44. model-44-of-64.safetensors +3 -0
  45. model-45-of-64.safetensors +3 -0
  46. model-46-of-64.safetensors +3 -0
  47. model-47-of-64.safetensors +3 -0
  48. model-48-of-64.safetensors +3 -0
  49. model-49-of-64.safetensors +3 -0
  50. model-5-of-64.safetensors +3 -0
README.md ADDED
@@ -0,0 +1,411 @@
---
base_model:
- Qwen/Qwen2.5-VL-32B-Instruct
datasets:
- xlangai/AgentNet
- xlangai/aguvis-stage1
- xlangai/aguvis-stage2
- osunlp/UGround-V1-Data
language:
- en
license: mit
metrics:
- code_eval
- accuracy
pipeline_tag: image-text-to-text
tags:
- VLM
- Computer-Use-Agent
- OS-Agent
- GUI
- Grounding
library_name: transformers
---

<h1 style="
  font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Helvetica,Arial,sans-serif;
  font-size:48px;
  font-weight:700;
  line-height:1.25;
  text-align:center;
  margin:0 0 24px;">
  OpenCUA: Open Foundations for Computer-Use Agents
</h1>

<div style="
  display:flex;
  justify-content:center;
  gap:12px;
  flex-wrap:wrap;
  margin-bottom:28px;">

  <a href="https://opencua.xlang.ai/" style="
    display:inline-block;
    padding:8px 24px;
    background:#2b2b2b;
    color:#ffffff;
    border-radius:36px;
    text-decoration:none;
    font-weight:600;
    font-size:16px;">
    🌐 Website
  </a>

  <a href="https://arxiv.org/abs/2508.09123" style="
    display:inline-block;
    padding:8px 24px;
    background:#2b2b2b;
    color:#ffffff;
    border-radius:36px;
    text-decoration:none;
    font-weight:600;
    font-size:16px;">
    📝 Paper
  </a>

  <a href="https://github.com/xlang-ai/OpenCUA" style="
    display:inline-block;
    padding:8px 24px;
    background:#2b2b2b;
    color:#ffffff;
    border-radius:36px;
    text-decoration:none;
    font-weight:600;
    font-size:16px;">
    💻 Code
  </a>
</div>

<div style="max-width:900px;margin:0 auto;">

# Introduction
<div style="
  max-width: 880px;         /* adjust the overall width as needed */
  margin: 0 auto;           /* center the container */
  text-align: justify;      /* key: justify both edges */
  text-justify: inter-word; /* improves justification of English text */
  line-height: 1.6;">

OpenCUA models (OpenCUA-7B and OpenCUA-32B) are end-to-end computer-use foundation models that can produce executable actions in computer environments. They are built on the weights of Qwen2.5-VL-7B-Instruct and Qwen2.5-VL-32B-Instruct.
They demonstrate superior performance across CUA benchmarks. In particular, <b>OpenCUA-32B</b> achieves an average success rate of **34.8%** on [OSWorld-Verified](https://os-world.github.io/),
establishing a new state-of-the-art (SOTA) among open-source models and surpassing OpenAI CUA (GPT-4o). Both models also show strong grounding performance: OpenCUA-32B achieves 59.6% on [OSWorld-G](https://osworld-grounding.github.io/) and 55.3% on [ScreenSpot-Pro](https://arxiv.org/abs/2504.07981).
</div>

### Key Features

- **Superior Computer-Use Capability**: Executes multi-step computer-use actions with effective planning and reasoning
- **Multi-OS Support**: Trained on demonstrations across Ubuntu, Windows, and macOS
- **Visual Grounding**: Strong GUI element recognition and spatial reasoning capabilities
- **Multi-Image Context**: Processes up to 3 screenshots of history for better context understanding
- **Reflective Reasoning**: Enhanced with reflective long Chain-of-Thought that identifies errors and provides corrective reasoning


# Performance

### Online Agent Evaluation
OpenCUA models achieve strong performance on **[OSWorld-Verified](https://os-world.github.io/)**.
OpenCUA-32B achieves the best performance among all open-source models, with an average success rate of 34.8%, outperforming prior baselines by large margins.
It also narrows the gap to the proprietary Claude models.
<div align="center">

| **Model** | **15 Steps** | **50 Steps** | **100 Steps** |
|-------------------------------|:--------:|:--------:|:---------:|
| **Proprietary** | | | |
| OpenAI CUA | 26.0 | 31.3 | 31.4 |
| Seed 1.5-VL | 27.9 | — | 34.1 |
| Claude 3.7 Sonnet | 27.1 | 35.8 | 35.9 |
| Claude 4 Sonnet | 31.2 | 43.9 | 41.5 |
| **Open-Source** | | | |
| Qwen 2.5-VL-32B-Instruct | 3.0 | — | 3.9 |
| Qwen 2.5-VL-72B-Instruct | 4.4 | — | 5.0 |
| Kimi-VL-A3B | 9.7 | — | 10.3 |
| UI-TARS-72B-DPO | 24.0 | 25.8 | 27.1 |
| UI-TARS-1.5-7B | 24.5 | 27.3 | 27.4 |
| OpenCUA-7B *(Ours)* | 24.3 | 27.9 | 26.6 |
| **OpenCUA-32B *(Ours)*** | **29.7** | **34.1** | **34.8** |
</div>

*OpenCUA scores are the mean of 3 independent runs.*

### GUI Grounding Performance
<div align="center">

| **Model** | **OSWorld-G** | **ScreenSpot-V2** | **ScreenSpot-Pro** |
|-------|-----------|---------------|----------------|
| Qwen2.5-VL-7B | 31.4 | 88.8 | 27.6 |
| Qwen2.5-VL-32B | 46.5 | 87.0 | 39.4 |
| UI-TARS-72B | 57.1 | 90.3 | 38.1 |
| **OpenCUA-A3B** | 48.6 | 91.4 | 28.5 |
| **OpenCUA-Qwen2-7B** | 45.7 | 88.5 | 23.7 |
| **OpenCUA-7B** | 55.3 | 92.3 | 50.0 |
| **OpenCUA-32B** | **59.6** | **93.4** | **55.3** |
</div>


### AgentNetBench (Offline Evaluation)
<div align="center">

| **Model** | **Coordinate Actions** | **Content Actions** | **Function Actions** | **Average** |
|-------|-------------------|-----------------|------------------|---------|
| Qwen2.5-VL-7B | 50.7 | 40.8 | 3.1 | 48.0 |
| Qwen2.5-VL-32B | 66.6 | 47.2 | 41.5 | 64.8 |
| Qwen2.5-VL-72B | 67.2 | 52.6 | 50.5 | 67.0 |
| OpenAI CUA | 71.7 | 57.3 | **80.0** | 73.1 |
| **OpenCUA-7B** | 79.0 | 62.0 | 44.3 | 75.2 |
| **OpenCUA-32B** | **81.9** | 66.1 | 55.7 | **79.1** |
</div>

# 🚀 Quick Start
<div style="border-left: 6px solid #f28c28; background: #fff8e6; padding: 12px 16px; margin: 16px 0;">
<strong>⚠️ Important for Qwen-based Models (OpenCUA-7B, OpenCUA-32B):</strong>

To align with our training infrastructure, we have modified the model in two places:
<ul style="margin-top: 8px;">
<li>1. Multimodal Rotary Position Embedding (M-RoPE) has been replaced with 1D RoPE.</li>
<li>2. The model uses the same tokenizer and chat template as Kimi-VL.</li>
<li>Do not load the model with the default transformers or vLLM classes. If you train the model, make sure the tokenizer and chat template are aligned in the same way.</li>
</ul>
</div>

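In practice, this means every component should be loaded from this repository with `trust_remote_code=True` rather than with the stock Qwen2.5-VL classes. A minimal loading sketch (the full pipeline appears in the GUI Grounding example below):

```python
from transformers import AutoTokenizer, AutoModel, AutoImageProcessor

model_path = "xlangai/OpenCUA-32B"

# The custom OpenCUA config/model classes, tokenizer, and chat template all ship
# with this repository, so trust_remote_code=True is required.
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_path, torch_dtype="auto", device_map="auto", trust_remote_code=True
)
image_processor = AutoImageProcessor.from_pretrained(model_path, trust_remote_code=True)
```
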

## Installation & Download

First, install the required dependencies:

```bash
conda create -n opencua python=3.10
conda activate opencua
pip install -r requirement.txt
```

Download the model weights from Hugging Face:
```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="xlangai/OpenCUA-32B",
    local_dir="OpenCUA-32B",
    local_dir_use_symlinks=False
)
```
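
If you prefer the command line, the same snapshot can be fetched with the `huggingface-cli` tool that ships with `huggingface_hub` (equivalent to the Python call above):

```bash
huggingface-cli download xlangai/OpenCUA-32B --local-dir OpenCUA-32B
```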

## 🎯 GUI Grounding

The following code demonstrates how to use OpenCUA models for GUI grounding tasks:

```python
import base64
import torch
from transformers import AutoTokenizer, AutoModel, AutoImageProcessor
from PIL import Image
import json

def encode_image(image_path: str) -> str:
    """Encode image to base64 string for model input."""
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode()

def load_opencua_model(model_path: str):
    """Load OpenCUA model, tokenizer, and image processor."""
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        model_path,
        torch_dtype="auto",
        device_map="auto",
        trust_remote_code=True
    )
    image_processor = AutoImageProcessor.from_pretrained(model_path, trust_remote_code=True)

    return model, tokenizer, image_processor

def create_grounding_messages(image_path: str, instruction: str):
    """Create chat messages for GUI grounding task."""
    system_prompt = (
        "You are a GUI agent. You are given a task and a screenshot of the screen. "
        "You need to perform a series of pyautogui actions to complete the task."
    )

    messages = [
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            "content": [
                {"type": "image", "image": f"data:image/png;base64,{encode_image(image_path)}"},
                {"type": "text", "text": instruction},
            ],
        },
    ]
    return messages

def run_inference(model, tokenizer, image_processor, messages, image_path):
    """Run inference on the model."""
    # Prepare text input
    input_ids = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True
    )
    input_ids = torch.tensor([input_ids]).to(model.device)

    # Prepare image input
    image = Image.open(image_path).convert('RGB')
    image_info = image_processor.preprocess(images=[image])
    pixel_values = torch.tensor(image_info['pixel_values']).to(
        dtype=torch.bfloat16, device=model.device
    )
    grid_thws = torch.tensor(image_info['image_grid_thw'])

    # Generate response
    with torch.no_grad():
        generated_ids = model.generate(
            input_ids,
            pixel_values=pixel_values,
            grid_thws=grid_thws,
            max_new_tokens=512,
            temperature=0
        )

    # Decode output
    prompt_len = input_ids.shape[1]
    generated_ids = generated_ids[:, prompt_len:]
    output_text = tokenizer.batch_decode(
        generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
    )[0]

    return output_text

# Example usage
model_path = "xlangai/OpenCUA-32B"  # or other model variants
image_path = "screenshot.png"
instruction = "Click on the submit button"

# Load model
model, tokenizer, image_processor = load_opencua_model(model_path)

# Create messages and run inference
messages = create_grounding_messages(image_path, instruction)
result = run_inference(model, tokenizer, image_processor, messages, image_path)

print("Model output:", result)
```

<div style="border-left: 6px solid #9ca3af; background: #f5f5f5; padding: 12px 16px; margin: 16px 0;">
<em>Expected result: <code>pyautogui.click(x=1432, y=344)</code></em>
</div>
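
The model returns the action as plain text. As a rough illustration of post-processing (the helper below is hypothetical, not part of the repository), the predicted click coordinates can be pulled out of that string with a small regex:

```python
import re

def parse_click(action_text: str):
    """Extract (x, y) from a string like 'pyautogui.click(x=1432, y=344)'.

    Hypothetical helper for illustration; real outputs may contain several
    actions or extra reasoning text, so adapt the parsing to your needs.
    """
    match = re.search(r"pyautogui\.click\(x=(\d+(?:\.\d+)?),\s*y=(\d+(?:\.\d+)?)\)", action_text)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))

print(parse_click("pyautogui.click(x=1432, y=344)"))  # -> (1432.0, 344.0)
```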

## 🖥️ Computer Use Agent
**[OpenCUAAgent](https://github.com/xlang-ai/OSWorld/blob/main/mm_agents/opencua_agent.py)** is developed in the [OSWorld](https://github.com/xlang-ai/OSWorld) environment on top of the OpenCUA models. It iteratively perceives the environment via screenshots, produces reflective long CoT as an inner monologue, and predicts the next action to be executed (a simplified sketch of this loop is given after the note below). By default, OpenCUAAgent uses 3 screenshots of history and the L2 CoT format.

Command for running OpenCUA-7B and OpenCUA-32B in OSWorld:
```bash
python run_multienv_opencua.py \
    --headless \
    --observation_type screenshot \
    --model OpenCUA-32B \
    --result_dir ./results --test_all_meta_path evaluation_examples/test_all_no_gdrive.json \
    --max_steps 100 \
    --num_envs 30 \
    --coordinate_type qwen25
```
<div style="border-left: 6px solid #9ca3af; background: #f5f5f5; padding: 12px 16px; margin: 16px 0;">
<em>Currently we only support Hugging Face (transformers) inference. We are implementing vLLM support for OpenCUA models. Please stay tuned.</em>
</div>
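
For intuition only, the perceive-reason-act loop described above might look roughly like the following. This is a hand-written sketch that reuses the helper functions from the GUI Grounding example and is **not** the official `opencua_agent.py` implementation; the screenshot capture and action execution via `pyautogui` are illustrative assumptions:

```python
import re
import pyautogui  # assumes the script runs on the machine being controlled

def run_episode(model, tokenizer, image_processor, task_instruction, max_steps=100):
    """Illustrative sketch of an OpenCUA-style agent loop (not the official agent)."""
    history = []  # most recent screenshot paths; OpenCUAAgent keeps up to 3
    for step in range(max_steps):
        # Perceive: capture the current screen
        screenshot_path = f"step_{step}.png"
        pyautogui.screenshot().save(screenshot_path)
        history = (history + [screenshot_path])[-3:]

        # Reason + predict: for simplicity only the latest screenshot is sent here;
        # the real agent also feeds the previous screenshots and responses
        messages = create_grounding_messages(screenshot_path, task_instruction)
        response = run_inference(model, tokenizer, image_processor, messages, screenshot_path)

        # Act: execute the first pyautogui call found in the response
        match = re.search(r"pyautogui\.\w+\([^)]*\)", response)
        if match is None:
            break  # no executable action predicted; stop the episode
        eval(match.group(0))  # e.g. pyautogui.click(x=960, y=324); illustration only
```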

## Important Notes on Coordinate Systems
<div style="border-left: 6px solid #9ca3af; background: #f5f5f5; padding: 12px 16px; margin: 16px 0;">
<ul style="margin: 0;">
<li><strong><code>xlangai/OpenCUA-A3B</code></strong> – Relative coordinates <em>(not supported in this code)</em></li>
<li><strong><code>xlangai/OpenCUA-Qwen2-7B</code></strong> – Relative coordinates</li>
<li><strong><code>xlangai/OpenCUA-7B</code></strong> – Absolute coordinates</li>
<li><strong><code>xlangai/OpenCUA-32B</code></strong> – Absolute coordinates</li>
</ul>
</div>

**OpenCUA models use different coordinate systems depending on the base model:**

- **OpenCUA-Qwen2-7B**: Outputs **relative coordinates** (0.0 to 1.0 range)
```python
# Example output: pyautogui.click(x=0.5, y=0.3)
# x=0.5 means 50% from left edge, y=0.3 means 30% from top edge

# Convert to absolute coordinates:
def qwen2_relative_to_absolute(rel_x, rel_y, original_width, original_height):
    abs_x = int(rel_x * original_width)
    abs_y = int(rel_y * original_height)
    return abs_x, abs_y
```

- **OpenCUA-7B and OpenCUA-32B** (Qwen2.5-based): Output **absolute coordinates** after smart resize
```python
# Example output: pyautogui.click(x=960, y=324)
# These are coordinates on the smart-resized image, not the original image

# Convert to original image coordinates:
# Please refer to the smart_resize function in: https://github.com/huggingface/transformers/blob/67ddc82fbc7e52c6f42a395b4a6d278c55b77a39/src/transformers/models/qwen2_vl/image_processing_qwen2_vl.py#L55
def qwen25_smart_resize_to_absolute(model_x, model_y, original_width, original_height):
    # First, calculate the smart-resized dimensions
    resized_height, resized_width = smart_resize(
        original_height, original_width, factor=28, min_pixels=3136, max_pixels=12845056
    )

    # Convert model output to relative coordinates on the resized image
    rel_x = model_x / resized_width
    rel_y = model_y / resized_height

    # Then convert to absolute coordinates on the original image
    abs_x = int(rel_x * original_width)
    abs_y = int(rel_y * original_height)
    return abs_x, abs_y
```

<div style="border-left: 6px solid #9ca3af; background: #f5f5f5; padding: 12px 16px; margin: 16px 0;">
<strong>Understanding Smart Resize for Qwen2.5-based Models:</strong>
<p style="margin: 8px 0 0;">
The Qwen2.5-VL models use a “smart resize” preprocessing that maintains aspect ratio while fitting within pixel constraints.
For coordinate conversion, you need the smart resize function from the
<a href="https://github.com/QwenLM/Qwen2.5-VL/blob/d2240f11656bfe404b9ba56db4e51cd09f522ff1/qwen-vl-utils/src/qwen_vl_utils/vision_process.py#L60">
official Qwen2.5-VL implementation</a>.
</p>
</div>
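
For convenience, the logic of that function is roughly as follows; this is a paraphrase of the linked implementation, so please verify against the official source before relying on it. It rounds both sides to multiples of `factor` and rescales so the total pixel count stays between `min_pixels` and `max_pixels`:

```python
import math

def smart_resize(height, width, factor=28, min_pixels=3136, max_pixels=12845056):
    """Approximate sketch of Qwen2.5-VL smart resize (see the linked source for the exact version)."""
    if height < factor or width < factor:
        raise ValueError(f"height and width must be at least {factor}")
    # Round both sides to the nearest multiple of `factor`
    h_bar = round(height / factor) * factor
    w_bar = round(width / factor) * factor
    # Rescale if the area falls outside [min_pixels, max_pixels], keeping the aspect ratio
    if h_bar * w_bar > max_pixels:
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = math.floor(height / beta / factor) * factor
        w_bar = math.floor(width / beta / factor) * factor
    elif h_bar * w_bar < min_pixels:
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar

# Example: map a prediction made on a 1920x1080 screenshot back to original pixels
print(qwen25_smart_resize_to_absolute(960, 324, 1920, 1080))
```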


# TODO
## vLLM Support
We are actively working with the vLLM team to add support for OpenCUA models.

**Workaround:** For now, please use the standard transformers library as shown in the examples above. We will update this section once vLLM support becomes available.

## Training Code
OpenCUA models were developed on the training infrastructure of the Kimi team. We are also developing a training pipeline based on open-source infrastructure.

## License

This project is licensed under the MIT License - see the LICENSE file in the root folder for details.

## Research Use and Disclaimer

OpenCUA models are intended for **research and educational purposes only**.

### Prohibited Uses
- The model may **not** be used for any purpose or activity that violates applicable laws or regulations in any jurisdiction
- Use for illegal, unethical, or harmful activities is strictly prohibited

### Disclaimer
- The authors, contributors, and copyright holders are **not responsible** for any illegal, unethical, or harmful use of the Software, nor for any direct or indirect damages resulting from such use
- Use of the "OpenCUA" name, logo, or trademarks does **not** imply any endorsement or affiliation unless separate written permission is obtained
- Users are solely responsible for ensuring their use complies with applicable laws and regulations

## Citation

If you use OpenCUA models in your research, please cite our work:

```bibtex
@misc{wang2025opencuaopenfoundationscomputeruse,
      title={OpenCUA: Open Foundations for Computer-Use Agents},
      author={Xinyuan Wang and Bowen Wang and Dunjie Lu and Junlin Yang and Tianbao Xie and Junli Wang and Jiaqi Deng and Xiaole Guo and Yiheng Xu and Chen Henry Wu and Zhennan Shen and Zhuokai Li and Ryan Li and Xiaochuan Li and Junda Chen and Boyuan Zheng and Peihang Li and Fangyu Lei and Ruisheng Cao and Yeqiao Fu and Dongchan Shin and Martin Shin and Jiarui Hu and Yuyan Wang and Jixuan Chen and Yuxiao Ye and Danyang Zhang and Dikang Du and Hao Hu and Huarong Chen and Zaida Zhou and Haotian Yao and Ziwei Chen and Qizheng Gu and Yipu Wang and Heng Wang and Diyi Yang and Victor Zhong and Flood Sung and Y. Charles and Zhilin Yang and Tao Yu},
      year={2025},
      eprint={2508.09123},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2508.09123},
}
```

</div>
config.bak.json ADDED
@@ -0,0 +1,69 @@
{
  "architectures": [
    "OpenCUAForConditionalGeneration"
  ],
  "auto_map": {
    "AutoConfig": "configuration_opencua.OpenCUAConfig",
    "AutoModel": "modeling_opencua.OpenCUAForConditionalGeneration",
    "AutoModelForCausalLM": "modeling_opencua.OpenCUAForConditionalGeneration"
  },
  "ignore_index": -100,
  "media_placeholder_token_id": 151664,
  "model_type": "opencua",
  "pad_token_id": 0,
  "text_config": {
    "bos_token_id": 151643,
    "eos_token_id": 151644,
    "head_dim": 128,
    "hidden_act": "silu",
    "hidden_size": 5120,
    "initializer_range": 0.02,
    "intermediate_size": 27648,
    "k_proj_bias": true,
    "max_length": 20,
    "min_length": 0,
    "model_type": "qwen2",
    "num_attention_heads": 40,
    "num_beam_groups": 1,
    "num_beams": 1,
    "num_hidden_layers": 64,
    "num_key_value_heads": 8,
    "pad_token_id": 152063,
    "pretraining_sequence_length": 131072,
    "q_proj_bias": true,
    "rms_norm_eps": 1e-05,
    "rope_theta": 1000000.0,
    "tie_word_embeddings": false,
    "torch_dtype": "bfloat16",
    "use_bfloat16": false,
    "use_cache": true,
    "v_proj_bias": true,
    "vocab_size": 152064
  },
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.48.3",
  "vision_config": {
    "depth": 32,
    "fullatt_block_indexes": [
      7,
      15,
      23,
      31
    ],
    "hidden_act": "silu",
    "hidden_size": 1280,
    "num_heads": 16,
    "in_chans": 3,
    "intermediate_size": 3456,
    "patch_size": 14,
    "spatial_merge_size": 2,
    "spatial_patch_size": 14,
    "temporal_patch_size": 2,
    "out_hidden_size": 5120,
    "tokens_per_second": 2,
    "window_size": 112
  },
  "vocab_size": 152064
}
config.json ADDED
@@ -0,0 +1,51 @@
{
  "architectures": [
    "Qwen2_5_VLForConditionalGeneration"
  ],
  "attention_dropout": 0.0,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "image_token_id": 151655,
  "initializer_range": 0.02,
  "intermediate_size": 27648,
  "max_position_embeddings": 128000,
  "max_window_layers": 64,
  "model_type": "qwen2_5_vl",
  "num_attention_heads": 40,
  "num_hidden_layers": 64,
  "num_key_value_heads": 8,
  "pad_token_id": 151643,
  "rms_norm_eps": 1e-06,
  "rope_scaling": {
    "mrope_section": [
      16,
      24,
      24
    ],
    "rope_type": "default",
    "type": "default"
  },
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.49.0",
  "use_cache": true,
  "use_sliding_window": false,
  "video_token_id": 151656,
  "vision_config": {
    "hidden_size": 1280,
    "in_chans": 3,
    "intermediate_size": 3456,
    "model_type": "qwen2_5_vl",
    "out_hidden_size": 5120,
    "spatial_patch_size": 14,
    "tokens_per_second": 2,
    "torch_dtype": "bfloat16"
  },
  "vision_end_token_id": 151653,
  "vision_start_token_id": 151652,
  "vision_token_id": 151654,
  "vocab_size": 152064
}
configuration_opencua.py ADDED
@@ -0,0 +1,37 @@
from transformers.configuration_utils import PretrainedConfig
from transformers.models.qwen2_5_vl.configuration_qwen2_5_vl import Qwen2_5_VLVisionConfig
from transformers.models.qwen2.configuration_qwen2 import Qwen2Config


class OpenCUAConfig(PretrainedConfig):
    """OpenCUA-2.5-32B model configuration.

    Args:
        vision_config: Configuration for the vision model (Qwen2_5_VLVisionConfig).
        text_config: Configuration for the text model (Qwen2Config).
        pad_token_id: The token ID to use for padding.
    """

    model_type = "opencua"

    def __init__(
        self,
        vision_config: dict | Qwen2_5_VLVisionConfig | None = None,
        text_config: dict | Qwen2Config | None = None,
        ignore_index: int = -100,
        media_placeholder_token_id: int = 151664,
        pad_token_id: int = 0,
        **kwargs
    ):
        if isinstance(vision_config, dict):
            vision_config = Qwen2_5_VLVisionConfig(**vision_config)
        self.vision_config = vision_config

        if isinstance(text_config, dict):
            text_config = Qwen2Config(**text_config)
        self.text_config = text_config

        self.ignore_index = ignore_index
        self.media_placeholder_token_id = media_placeholder_token_id

        super().__init__(pad_token_id=pad_token_id, **kwargs)
generation_config.json ADDED
@@ -0,0 +1,4 @@
{
  "max_length": 32768,
  "eos_token_id": 151644
}
model-1-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0f93f8fdb8948cb1533461a48dbcc53ff3f49334fb4a9f39fba89b030b2671f2
3
+ size 3910073936
model-10-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0dea6a376c59a96a85dbf170bee7709de7591fff8fe3dab573a189f003b8efbf
3
+ size 975212080
model-11-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7305f64fbbd5fed1d919db106f720ea7ad4fdc0a3fbcedd53641bb3aa5cc0f31
3
+ size 975212096
model-12-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:fa8097a66699924bbeeaef401c4494bc589b2f03656b631cd294722e1be0e56c
3
+ size 975212096
model-13-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f62d3f60e4e9f5ced3999c7b59791d938f0624c6439d053f1010b3758833d692
3
+ size 975212096
model-14-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:414abeb9ebc8b8d1556cc30654cdd89c29244fa598667d60b4c60acdc05febf1
3
+ size 975212096
model-15-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:07e791bc6e5def800381c0f875e64fe44701ad1d223871929293cb402745abf2
3
+ size 975212096
model-16-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:dcfeec3a6e3d3761344d25bc7e64b470497f39b142abe76a27f0be328fa5a51b
3
+ size 975212096
model-17-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5d862948c63a9f309203fc804a6d57249dbf937cfd23cd470f4787ab1b6f408e
3
+ size 975212096
model-18-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0e593c5f9733724fe9e4e97737477425c7ebc4fe4ffc91848b28d2bb1372e725
3
+ size 975212096
model-19-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2fd8ae7e12435ab7a71a933e7d7b4cb542f1d87d9dbba786ea5995b43164fa7c
3
+ size 975212096
model-2-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ba11af2777844ed3e665df561e3505e8a3ed16a0c0f7c5167d30f9b08d0524ff
3
+ size 975212080
model-20-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:20582ca438c35b1818cd372f6602b03bc98d606efb987022939982f9c21d8950
3
+ size 975212096
model-21-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5a66d3be5340ce35964b0217a11ae461332bd1b61461714d7844a695e294bc79
3
+ size 975212096
model-22-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:05a397d86b3124248deb7da026325e94dfe5f9fc2c1afc393aec4101f404ae27
3
+ size 975212096
model-23-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6a5a54d6f0094dff6c34de92caf1e962e36176585769148026efc51305b0060f
3
+ size 975212096
model-24-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3adb06cbc432f29a0144373f44bada870e4b55a9e06d47593cdd69ad427bd7cc
3
+ size 975212096
model-25-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6a9084a804f924021d66a35dc2c3fc35109d2ad7e99fbfdc0106ebe85602eac7
3
+ size 975212096
model-26-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3563166a42ae3c8332ea79ef95aa17597d82bcb52a54695b3281e7e000b80af6
3
+ size 975212096
model-27-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6b29bfc99c4cad7ca5e9b976734051b5b30181371e1a5e007e9dbde27f19e858
3
+ size 975212096
model-28-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:be52480d723a9b68bf1942cb074a5855456b8059ec5d22f99d12d0effc5b4547
3
+ size 975212096
model-29-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3fb36bcb0a2acf028f3e83149d8954ec5f4b5c011b1ca538689e6dda034c298b
3
+ size 975212096
model-3-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f4a858641c4db4486f6142aaf5f7e6e2dba1fd14a7640c31ce9f86289f6cc9ec
3
+ size 975212080
model-30-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:39cf268f7c72fc00bfa8b2a2ea6ab72243e46c88597e3c4aedb768288b534b05
3
+ size 975212096
model-31-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4f54de1fae911dc809f8ca311ff948facdc4c613fc659468aadb6af99b58b125
3
+ size 975212096
model-32-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:937716b987d5994fb46faecc35622ceb0fa18283aac2cbb506c4b9d3f1e3fcae
3
+ size 975212096
model-33-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:08d4297513cb07588a018c0df47fd2849e30e64ebe61675b576dfef2a0a5c831
3
+ size 975212096
model-34-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6651b5a783a8de35e606fe884b29c5a13ba237123972acbdae99fbe44614af57
3
+ size 975212096
model-35-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:bfeaf19d699c85bff1aa97b7a9106d2cecc4a6ecb0d6958250ee4b389c3b87f1
3
+ size 975212096
model-36-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:439ea1bf04cc07f00d1e0a8ededd645e89e008fc8eec89c8213a82df7bc93442
3
+ size 975212096
model-37-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:893ef546ef828989c6839b6b0ab02e6e6257bb4eef4ee9fd1deaa46202d62aa4
3
+ size 975212096
model-38-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:447ecde3b04b45b54f6e24cfcd4dfa74cf68023c0f5a22eaaa230380d844dfdf
3
+ size 975212096
model-39-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:51186e7ff2c99616c75a85a28fbb37351e26c5dc2949d23d4373a3467910e935
3
+ size 975212096
model-4-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0d13cc0d8d8f37544d3bf0dbf039f0f94ef1114dbc53a2bb6679cf8c180b77bc
3
+ size 975212080
model-40-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9d1cbb025e2acb4527d4cae146bf3fc53200b5f714828ed452a7085cd40e5064
3
+ size 975212096
model-41-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:51d77fa16c2beed95ba7f5ad37a42e1258acda22640631f0ca0978db1b222672
3
+ size 975212096
model-42-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4e4f70cf44323490fc83788b6ebe670e247851d9c17a86dd1845e7377bf9691b
3
+ size 975212096
model-43-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:dcc0ef3ad628eb5c377f26caa4d1340c4b5e941cac9fdfad0e8826b8205a7520
3
+ size 975212096
model-44-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a832b22641ce8b082cbf7f2abb3a6d12cee21211465ec1f6a6945e1b01ca3a08
3
+ size 975212096
model-45-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0f87e97668db5ff3af25b363da785365c5e8c061b0a69a899e148be7951200e3
3
+ size 975212096
model-46-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:942391c6100fbfdad57e548f7d4e9789486a04ee016d6f3edea08d08cb0fa7c2
3
+ size 975212096
model-47-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a04f304e9310861fa3bf4410475f60d8a02fac1ee27bfa4ffd93459f0340860d
3
+ size 975212096
model-48-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:fac24ae7395d74a349439772072bbc0bd900d68a8c68f42ee4fa7e5745731a7c
3
+ size 975212096
model-49-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0b944ee340bf8ad7916ba6428ec6479c9d0bda3060fcb2b6a0fdacb6058c7023
3
+ size 975212096
model-5-of-64.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d380966b60109b38e6a060423f347d9c90c84c1be995c1dfa534f412cdee9a65
3
+ size 975212080