Update modeling_codeshell.py
modeling_codeshell.py  (+1, -15)
@@ -457,15 +457,12 @@ class CodeShellPreTrainedModel(PreTrainedModel):


GPT_BIGCODE_START_DOCSTRING = r"""
-
    This model inherits from [`PreTrainedModel`]. Check the superclass documentation for the generic methods the
    library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads
    etc.)
-
    This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass.
    Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage
    and behavior.
-
    Parameters:
        config ([`CodeShellConfig`]): Model configuration class with all the parameters of the model.
            Initializing with a config file does not load the weights associated with the model, only the
@@ -478,13 +475,10 @@ GPT_BIGCODE_INPUTS_DOCSTRING = r"""
            `input_ids_length` = `sequence_length` if `past_key_values` is `None` else
            `past_key_values[0][0].shape[-2]` (`sequence_length` of input past key value states). Indices of input
            sequence tokens in the vocabulary.
-
            If `past_key_values` is used, only `input_ids` that do not have their past calculated should be passed as
            `input_ids`.
-
            Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
            [`PreTrainedTokenizer.__call__`] for details.
-
            [What are input IDs?](../glossary#input-ids)
        past_key_values (`Tuple[torch.Tensor]` of length `config.n_layers`):
            Contains precomputed hidden-states (key and values in the attention blocks) as computed by the model (see
@@ -492,39 +486,30 @@ GPT_BIGCODE_INPUTS_DOCSTRING = r"""
            their past given to this model should not be passed as `input_ids` as they have already been computed.
        attention_mask (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
-
            - 1 for tokens that are **not masked**,
            - 0 for tokens that are **masked**.
-
            If `past_key_values` is used, `attention_mask` needs to contain the masking strategy that was used for
            `past_key_values`. In other words, the `attention_mask` always has to have the length:
            `len(past_key_values) + len(input_ids)`
-
            [What are attention masks?](../glossary#attention-mask)
        token_type_ids (`torch.Tensor` of shape `(batch_size, input_ids_length)`, *optional*):
            Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0,
            1]`:
-
            - 0 corresponds to a *sentence A* token,
            - 1 corresponds to a *sentence B* token.
-
            [What are token type IDs?](../glossary#token-type-ids)
        position_ids (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Indices of positions of each input sequence tokens in the position embeddings. Selected in the range `[0,
            config.max_position_embeddings - 1]`.
-
            [What are position IDs?](../glossary#position-ids)
        head_mask (`torch.Tensor` of shape `(num_heads,)` or `(num_layers, num_heads)`, *optional*):
            Mask to nullify selected heads of the self-attention modules. Mask values selected in `[0, 1]`:
-
            - 1 indicates the head is **not masked**,
            - 0 indicates the head is **masked**.
-
        inputs_embeds (`torch.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
            Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This
            is useful if you want more control over how to convert `input_ids` indices into associated vectors than the
            model's internal embedding lookup matrix.
-
            If `past_key_values` is used, optionally only the last `inputs_embeds` have to be input (see
            `past_key_values`).
        use_cache (`bool`, *optional*):
@@ -959,6 +944,7 @@ class CodeShellForCausalLM(CodeShellPreTrainedModel):
        prompt += ai_name.rstrip()

        max_new_tokens = max_new_tokens or self.generation_config.max_new_tokens
+       max_new_tokens = max_new_tokens or 128
        max_input_tokens = self.config.n_positions - max_new_tokens

        input_tokens = tokenizer.encode(prompt)
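The only functional change is the extra fallback in `CodeShellForCausalLM`: when neither the `max_new_tokens` argument nor `generation_config.max_new_tokens` supplies a value, the budget now defaults to 128, so `max_input_tokens = self.config.n_positions - max_new_tokens` is never computed with `None`. Below is a minimal sketch of the resulting resolution chain; the standalone function and the example value for `n_positions` are hypothetical, only the two assignment lines mirror the diff.

```python
# Sketch of the fallback chain introduced by this commit; the helper and the
# example numbers are illustrative, not taken from the model code.
def resolve_token_budgets(max_new_tokens, config_max_new_tokens, n_positions):
    # 1) explicit argument wins, 2) else the generation_config value, 3) else 128
    max_new_tokens = max_new_tokens or config_max_new_tokens
    max_new_tokens = max_new_tokens or 128
    # Remaining context left for the prompt tokens.
    max_input_tokens = n_positions - max_new_tokens
    return max_new_tokens, max_input_tokens

# Before this change, a generation_config without max_new_tokens would leave
# max_new_tokens as None and the subtraction above would raise a TypeError.
print(resolve_token_budgets(None, None, 8192))  # (128, 8064)
print(resolve_token_budgets(256, None, 8192))   # (256, 7936)
```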