uitnlp/CafeBERT



ThuanPhong committed on Mar 23, 2024

Commit b4b6358 · verified · 1 parent: 7ac99d6

Update README.md

Files changed (1)

README.md +32 -0


@@ -7,3 +7,35 @@ widget:
 - text: "Cà phê được trồng nhiều ở khu vực Tây <mask> của Việt Nam."
   example_title: "Example 2"
 ---

+# <a name="introduction"></a> CafeBERT: A Pre-Trained Language Model for Vietnamese (NAACL-2024 Findings)
+
+The pre-trained CafeBERT model is a state-of-the-art language model for Vietnamese *(Cafe, or coffee, is a popular morning drink in Vietnam.)*
+
+CafeBERT is a large-scale multilingual language model with strong support for Vietnamese. It is based on XLM-RoBERTa, a state-of-the-art multilingual language model, and is further pre-trained on a large Vietnamese corpus spanning many domains, such as Wikipedia and newspapers. CafeBERT achieves outstanding performance on the VLUE benchmark and on other tasks such as machine reading comprehension, text classification, natural language inference, and part-of-speech tagging.
+
+The general architecture and experimental results of CafeBERT can be found in our paper:
+
+Please **CITE** our paper when CafeBERT is used to help produce published results or is incorporated into other software.
+
+**Installation**
+
+Install the `transformers` and `SentencePiece` packages (the example below also imports PyTorch):
+
+    pip install transformers
+    pip install sentencepiece
+    pip install torch
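+
+**Example usage**
+
+```python
+import torch
+from transformers import AutoModel, AutoTokenizer
+
+# Load the pre-trained CafeBERT encoder and its tokenizer from the Hugging Face Hub
+model = AutoModel.from_pretrained('uitnlp/CafeBERT')
+tokenizer = AutoTokenizer.from_pretrained('uitnlp/CafeBERT')
+
+# Tokenize a Vietnamese sentence into PyTorch tensors
+encoding = tokenizer('Cà phê được trồng nhiều ở khu vực Tây Nguyên của Việt Nam.', return_tensors='pt')
+
+# Run inference without tracking gradients
+with torch.no_grad():
+    output = model(**encoding)
+# Contextual token embeddings: (batch_size, sequence_length, hidden_size)
+last_hidden_state = output.last_hidden_state
+```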
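The README example stops at the encoder's token-level output. You may need a single vector per sentence, for example for similarity search. One common option is attention-masked mean pooling over `last_hidden_state`. The sketch below rests on that assumption and is not a method prescribed by the CafeBERT authors. `mean_pool` and `embed_sentences` are hypothetical helper names, and the model id `uitnlp/CafeBERT` is the one used in the README.

```python
from typing import List

import torch


def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token vectors per sequence, ignoring padding positions.

    last_hidden_state: (batch, seq_len, hidden)
    attention_mask:    (batch, seq_len), 1 for real tokens and 0 for padding
    returns:           (batch, hidden)
    """
    # Broadcast the mask over the hidden dimension and match the float dtype.
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    # Sum only the vectors of real (non-padding) tokens.
    summed = (last_hidden_state * mask).sum(dim=1)
    # Count real tokens per sequence; clamp to avoid division by zero.
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return summed / counts


def embed_sentences(sentences: List[str], model_name: str = "uitnlp/CafeBERT") -> torch.Tensor:
    """Hypothetical helper: encode sentences with CafeBERT, then mean-pool them."""
    # Imported here so mean_pool can be used without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    # Pad to the longest sentence so the batch forms a rectangular tensor.
    encoding = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        output = model(**encoding)
    return mean_pool(output.last_hidden_state, encoding["attention_mask"])
```

For instance, `embed_sentences(['Cà phê được trồng nhiều ở khu vực Tây Nguyên của Việt Nam.'])` should return a tensor of shape `(1, hidden_size)`. The first call downloads the checkpoint, so it needs network access. Masking before averaging keeps padding tokens from diluting shorter sentences in a batch.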