Update README.md
README.md CHANGED
@@ -96,6 +96,12 @@ The model was pre-trained continuously on a single A10G GPU in an AWS instance f
<br>Thus, this hurts the performance of the Abstractive Summarization task.
<br>This case is not present in the decoder-only model, as the predicted next token is never seen by the model at all.

+Note:
+
+The model could be used as an encoder-only model, but this is not advised:
+<br>there are already BERT-based models with better inference time, since this model's longer sequence length slows inference.
+<br>This model could still be used in cases where a longer sequence length is required.
+

#### Authors:

<a href="https://www.linkedin.com/in/bijaya-bhatta-69536018a/">Vijaya Bhatta</a>
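If encoder-only features are still wanted despite the note above (e.g., for inputs longer than BERT's 512-token window), a minimal sketch with Hugging Face transformers might look like the following. It assumes the checkpoint is an encoder-decoder model published on the Hub; the model id here is a placeholder, not this repo's actual checkpoint.

```python
# Hedged sketch: extracting encoder-only features from a seq2seq checkpoint
# with Hugging Face transformers. "your-username/your-checkpoint" is a
# placeholder model id, not the actual checkpoint from this repo.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "your-username/your-checkpoint"  # placeholder (assumption)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
model.eval()

text = "A long document that would not fit in BERT's 512-token window ..."
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    # Run only the encoder stack; the decoder is skipped entirely.
    encoder_out = model.get_encoder()(**inputs)

# Token-level hidden states: (batch, seq_len, hidden_size)
features = encoder_out.last_hidden_state
print(features.shape)
```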