Update README.md
README.md CHANGED
@@ -3,6 +3,7 @@ language:
 - en
 datasets:
 - English
+- Writer/palmyra-data-index
 tags:
 - text generation
 - pytorch
@@ -12,13 +13,15 @@ tags:
 - NeMo
 pipeline_tag: text-generation
 library_name: transformers
+license: apache-2.0
 ---
 
-license: cc-by-4.0
 
 
 # Palmyra Large 20B
 
+**Palmyra-Large is a 20B-parameter causal decoder-only model built by [Writer](https://www.Writer.com) and trained on 800B+ tokens of [Palmyra-Index-Data](https://huggingface.co/datasets/Writer/palmyra-data-index) enhanced with curated corpora.**
+
 <style>
 img {
 display: inline;
@@ -28,10 +31,37 @@ img {
 |[](#model-architecture)|[](#model-architecture)|[](#datasets)
 
 
-## Model
+## Model Details
 
 Palmyra Large was pre-trained primarily on English text; a trace amount of non-English data from CommonCrawl remains in the training corpus. Like GPT-3, Palmyra Large is a decoder-only model and was therefore pre-trained with a self-supervised causal language modeling (CLM) objective. For evaluation, it follows the prompts and general experimental setup of GPT-3.
 
+### Model Description
+
+- **Developed by:** [Writer](https://www.writer.com);
+- **Model type:** Causal decoder-only;
+- **Language(s) (NLP):** English (with limited capability in German, Spanish, French, and Swedish);
+- **License:** Apache 2.0.
+
+
+## Uses
+
+### Direct Use
+
+Research on large language models; as a foundation for further specialization and finetuning for specific use cases (e.g., summarization, text generation, chatbots).
+
+### Out-of-Scope Use
+
+Production use without adequate assessment of risks and mitigation; any use case that may be considered irresponsible or harmful.
+
+## Bias, Risks, and Limitations
+
+Palmyra-Large-20B is trained mostly on English, with limited capability in German, Spanish, French, and Swedish; it will not generalize appropriately to other languages. Furthermore, because it is trained on large-scale corpora representative of the web, it will carry the stereotypes and biases commonly found online.
+
+### Recommendations
+
+We recommend that users of Palmyra-Large-20B consider finetuning it for their specific tasks of interest, and that guardrails and appropriate precautions be taken for any production use.
+
+
 ### Use case
 Palmyra Large is powerful yet fast, and it excels at nuanced tasks such as sentiment classification and summarization.
 
@@ -88,4 +118,6 @@ To cite this model:
 year = 2023,
 month = March
 }
 ```
+## Contact
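The updated card declares `library_name: transformers` and `pipeline_tag: text-generation`. A minimal sketch of that direct use, assuming the checkpoint is published under the hub id `Writer/palmyra-large` (the repo id is not stated in this diff) and that enough GPU memory is available for a 20B-parameter model:

```python
# Minimal usage sketch for a causal decoder-only checkpoint via transformers.
# "Writer/palmyra-large" is an assumed hub id; substitute the actual repository id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Writer/palmyra-large"  # assumption, not taken from this diff
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit a 20B-parameter model
    device_map="auto",          # requires the accelerate package; spreads layers across GPUs
)

prompt = "Summarize in one sentence: Palmyra Large is a decoder-only language model trained on English web text."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same loading pattern is the usual starting point for the task-specific finetuning mentioned under Direct Use.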