hmellor (HF Staff) committed · verified
Commit 9ccce62 · 1 Parent(s): 0735a77

Make `config.json` compatible with standard sliding window config


This will add `layer_types` to the loaded config class so that libraries such as vLLM can load hybrid attention models in the standard Hugging Face format.

Since we do not edit `configuration_phi4flash.py`, this change is backwards compatible.

Once this change has been merged along with https://github.com/vllm-project/vllm/pull/21927, we can update `configuration_phi4flash.py` so that the modelling code works in the standard way too.
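
For illustration, here is a minimal Python sketch (not vLLM's actual implementation) of how a downstream consumer could combine the new `layer_types` list with the existing `sliding_window` value to pick an attention pattern per layer. The local `config.json` path is an assumption.

```python
import json

# Minimal sketch, assuming config.json sits in the current directory.
with open("config.json") as f:
    config = json.load(f)

sliding_window = config["sliding_window"]  # 512 for this model
layer_types = config["layer_types"]        # the list added by this commit

# None means full attention for that layer; an integer is the window size.
per_layer_window = [
    sliding_window if layer_type == "sliding_attention" else None
    for layer_type in layer_types
]
print(per_layer_window)
```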

Files changed (1)
  config.json +6 -0
config.json CHANGED
@@ -26,6 +26,12 @@
  "num_key_value_heads": 20,
  "resid_pdrop": 0.0,
  "sliding_window": 512,
+ "layer_types": [
+   "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention",
+   "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention",
+   "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention",
+   "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention", "full_attention"
+ ],
  "torch_dtype": "bfloat16",
  "tie_word_embeddings": true,
  "transformers_version": "4.46.1",