# Wisent-Qwen2.5-Coder-7B-Instruct with CAA Steering

## Model Description

This is an enhanced version of Qwen2.5-Coder-7B-Instruct that integrates **Contrastive Activation Addition (CAA)** steering directly into the model architecture. The steering parameters have been optimized using Optuna to improve code generation quality on the MBPP Plus benchmark.

### Key Features

- 🚀 **Automatic CAA Steering**: No manual hook management required
- 🎯 **Optimized Parameters**: Layer 24, α=0.9
- 🗂️ **Trait-Based Organization**: Steering vectors organized by traits
- 🔧 **Runtime Configurable**: Adjust or disable steering on the fly
- 🤗 **HuggingFace Compatible**: Works with the standard transformers API
## Installation

```bash
pip install transformers torch
```
## Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model - CAA steering is automatically applied!
model = AutoModelForCausalLM.from_pretrained("./huggingface_qwen_generated", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("./huggingface_qwen_generated")

# Generate code (do_sample=True is required for temperature to take effect)
prompt = "Write a Python function to calculate the factorial of a number"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.2)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
## Advanced Usage

### Adjusting Steering Strength

```python
# Increase steering strength for a stronger trait effect
model.set_caa_alpha(1.2)

# Decrease for behavior closer to the unsteered baseline
model.set_caa_alpha(0.5)
```
### Disabling CAA Steering

```python
# Disable CAA to get baseline model behavior
model.set_caa_enabled(False)

# Re-enable CAA
model.set_caa_enabled(True)
```
### Accessing Steering Configuration

```python
print(f"CAA Layer: {model.caa_layer_id}")
print(f"CAA Alpha: {model.caa_alpha}")
print(f"Steering Method: {model.steering_method}")
```
### Trait-Based Vector Organization

The model uses a trait-based organization for steering vectors:

```
vectors/
├── coding/       # Current: Optimized for code generation
├── safety/       # Future: Safety-aligned behavior
├── creativity/   # Future: Enhanced creative outputs
├── helpfulness/  # Future: Improved helpfulness
└── reasoning/    # Future: Enhanced logical reasoning
```

To switch traits, simply update the configuration:

```json
{
  "steering_vector_path": "./vectors/safety/steering_vector.safetensors"
}
```
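A minimal sketch of making this switch programmatically, assuming the `steering_vector_path` key lives in the model directory's `config.json` (listed under File Structure below as holding the CAA params); the change takes effect on the next `from_pretrained` load:

```python
import json
from pathlib import Path

# Assumption: steering_vector_path is stored in the model's config.json
config_path = Path("./huggingface_qwen_generated/config.json")
config = json.loads(config_path.read_text())

# Point the model at a different trait's steering vector
config["steering_vector_path"] = "./vectors/safety/steering_vector.safetensors"
config_path.write_text(json.dumps(config, indent=2))
```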
## Technical Details

### CAA Steering Parameters

- **Steering Method**: Contrastive Activation Addition (CAA)
- **Optimal Layer**: 24 (out of 28 transformer layers)
- **Steering Strength (α)**: 0.9
- **Vector Format**: Safetensors, for efficient loading and HuggingFace compatibility
- **Vector Dimension**: 3584 (pre-normalized during training)
- **Storage Path**: `./vectors/coding/steering_vector.safetensors`
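The shipped vector can be sanity-checked directly with the `safetensors` library. A minimal sketch, assuming the file holds a single tensor (the key name inside the file is not specified here):

```python
from safetensors.torch import load_file

# Load all tensors stored in the steering vector file
tensors = load_file("./vectors/coding/steering_vector.safetensors")

# Assumption: the file contains one tensor; the key name may differ
for name, vec in tensors.items():
    print(name, tuple(vec.shape))         # expected: (3584,)
    print("L2 norm:", vec.norm().item())  # pre-normalized during training
```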
### How It Works

1. **Trait-Based Organization**: Steering vectors are organized by behavioral traits (`vectors/{trait}/`)
2. **Dynamic Loading**: The model loads the specified steering vector from the configured path
3. **Layer Application**: Steering is applied to the hidden states at layer 24 during the forward pass (see the sketch after this list)
4. **Generation Integration**: Steering affects the last token position during generation
5. **Configurable Strength**: The α parameter (default: 0.9) controls steering intensity
6. **Pre-Optimized Vectors**: Steering vectors are pre-normalized and ready for immediate use
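A minimal sketch of the core CAA operation from steps 3-5, written as a standalone PyTorch forward hook rather than the model's actual internal integration (the wiring shown in the comments is illustrative, not the shipped implementation):

```python
import torch

def make_caa_hook(steering_vector: torch.Tensor, alpha: float = 0.9):
    """Forward hook that adds alpha * steering_vector to the last token's hidden state."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        # Apply steering only at the last token position (step 4 above);
        # in-place edit is fine for inference
        hidden[:, -1, :] += alpha * steering_vector.to(hidden.device, hidden.dtype)
        return output
    return hook

# Illustrative wiring on a vanilla Qwen2 model; the packaged model applies
# this internally, so no hooks are needed when using it directly:
#
#   from safetensors.torch import load_file
#   vec = next(iter(load_file("./vectors/coding/steering_vector.safetensors").values()))
#   handle = model.model.layers[24].register_forward_hook(make_caa_hook(vec, alpha=0.9))
```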
### Optimization Process

The CAA parameters were optimized using:

- **Framework**: Optuna with the TPE sampler
- **Search Space**: Layers 15-28, α ∈ [0.1, 5.0]
- **Objective**: Maximize accuracy on the MBPP Plus validation set
- **Best Validation Score**: 64% accuracy
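A sketch of what that search looks like in Optuna; `evaluate_mbpp_plus` is a hypothetical stand-in for the actual validation harness, and the trial count is illustrative:

```python
import optuna

def evaluate_mbpp_plus(layer: int, alpha: float) -> float:
    """Hypothetical helper: apply CAA at (layer, alpha) and return MBPP Plus accuracy."""
    raise NotImplementedError("replace with the actual validation harness")

def objective(trial: optuna.Trial) -> float:
    # Search space from the list above
    layer = trial.suggest_int("layer", 15, 28)
    alpha = trial.suggest_float("alpha", 0.1, 5.0)
    return evaluate_mbpp_plus(layer=layer, alpha=alpha)

study = optuna.create_study(direction="maximize", sampler=optuna.samplers.TPESampler())
study.optimize(objective, n_trials=100)
print(study.best_params)  # the run above converged on {"layer": 24, "alpha": 0.9}
```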
## Model Architecture

```
WisentQwen2ForCausalLM
├── Base: Qwen2.5-Coder-7B-Instruct
├── CAA Integration: Layer 24
├── Steering Vector: ./vectors/coding/steering_vector.safetensors
└── Auto-applied during generation
```
## File Structure

```
huggingface_qwen_generated/
├── config.json               # Model configuration with CAA params
├── modeling_wisent_qwen.py   # Custom model class
├── tokenizer files           # Standard Qwen tokenizer
├── wisent_config.json        # Optimization results
└── vectors/                  # Trait-based steering vectors
    └── coding/
        └── steering_vector.safetensors   # Optimized coding steering vector
```
## Evaluation

### MBPP Plus Benchmark

The model should be evaluated on the complete MBPP Plus dataset (378 problems) to measure improvement over the baseline. Improvements are expected based on the validation results reported above.

### Running Evaluation

```python
# Use with bigcode-evaluation-harness
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "./huggingface_qwen_generated",
    trust_remote_code=True
)

# CAA steering is automatically applied during evaluation!
# No manual hooks or modifications are needed
```
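Because steering can be toggled at runtime (see Advanced Usage), a quick A/B comparison against the unsteered baseline is also straightforward; a minimal sketch with an arbitrary prompt:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("./huggingface_qwen_generated", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("./huggingface_qwen_generated")

prompt = "Write a Python function that checks whether a string is a palindrome"
inputs = tokenizer(prompt, return_tensors="pt")

steered = model.generate(**inputs, max_new_tokens=256)   # CAA on by default

model.set_caa_enabled(False)                             # baseline behavior
baseline = model.generate(**inputs, max_new_tokens=256)
model.set_caa_enabled(True)

for name, out in [("steered", steered), ("baseline", baseline)]:
    print(f"--- {name} ---")
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```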
## Citation

If you use this model, please cite:

```bibtex
@software{wisent_qwen_caa_2025,
  title={Wisent-Qwen2.5-Coder with CAA Steering},
  author={Wisent AI},
  year={2025},
  url={https://github.com/wisent-ai/wisent-guard}
}
```
## License

This model inherits the license from the base Qwen2.5-Coder-7B-Instruct model. Please refer to the original model's license for usage terms.

## Acknowledgments

- Base model: Qwen2.5-Coder-7B-Instruct by Alibaba
- CAA method: Contrastive Activation Addition
- Optimization: Optuna framework
- Implementation: Wisent Guard framework