Đà mã 2 (Llama2 architecture)

Dama2 is an autoregressive Large Language Model (LLM), based on Llama2's model architecture. Dama2 was trained on part of the Common Crawl dataset in Vietnamese and English.

Details will be available soon.

Contact: leanhcuong@gmail.com (Lê Anh Cường) | hieunguyen1053@outlook.com (Hiếu)

How to use

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("vietgpt/dama-2-7b")
model = AutoModelForCausalLM.from_pretrained("vietgpt/dama-2-7b", low_cpu_mem_usage=True)

# Use the GPU when one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Prompt: "The address of Ton Duc Thang University is at number"
prompt = "Địa chỉ trường Đại học Tôn Đức Thắng nằm ở số"
input_ids = tokenizer(prompt, return_tensors="pt")['input_ids'].to(device)

# Upper bound on total length (prompt + generated tokens); 100 is an example value
max_length = 100
gen_tokens = model.generate(input_ids, max_length=max_length, repetition_penalty=1.1)

print(tokenizer.batch_decode(gen_tokens)[0])

{
  "results": {
    "lambada_vi": {
      "ppl": 17.662483545322115,
      "ppl_stderr": 0.46441057543941494,
      "acc": 0.34159672067148156,
      "acc_stderr": 0.004685401990271572
    }
  },
  "versions": {
    "lambada_vi": null
  },
  "config": {
    "model": "hf-causal",
    "model_args": "pretrained=vietgpt/dama-2-7b",
    "num_fewshot": 0,
    "batch_size": null,
    "batch_sizes": [],
    "device": "cuda:1",
    "no_cache": false,
    "limit": null,
    "bootstrap_iters": 100000,
    "description_dict": {}
  }
}
hf-causal (pretrained=vietgpt/dama-2-7b), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
|   Task   |Version|Metric| Value |   |Stderr|
|----------|-------|------|------:|---|-----:|
|lambada_vi|       |ppl   |17.6625|±  |0.4644|
|          |       |acc   | 0.3416|±  |0.0047|
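
To help read the table above, here is a minimal sketch of how these two metrics are commonly computed for a LAMBADA-style task. It assumes the usual convention: perplexity is the exponential of the negative mean natural-log likelihood of the target word, and accuracy is the fraction of examples where greedy decoding reproduces the target. The function names and the per-example values are hypothetical, and `implied_sample_size` additionally assumes the reported `acc_stderr` is a plain binomial standard error. None of this is taken from the evaluation code that produced the results.

```python
import math


def perplexity(target_logliks):
    """exp(-mean log-likelihood), using natural-log likelihoods of each target word."""
    return math.exp(-sum(target_logliks) / len(target_logliks))


def accuracy(greedy_matches):
    """Fraction of examples where the greedy prediction equals the target word."""
    return sum(greedy_matches) / len(greedy_matches)


def implied_sample_size(acc, acc_stderr):
    """Solve stderr = sqrt(acc * (1 - acc) / n) for n (assumed binomial standard error)."""
    return acc * (1.0 - acc) / acc_stderr ** 2


# Hypothetical per-example values, for illustration only
example_logliks = [-2.0, -3.0, -4.0]
example_matches = [True, False, False]

print(perplexity(example_logliks))  # exp(3.0), since the mean log-likelihood is -3.0
print(accuracy(example_matches))

# Values copied from the results JSON above
print(round(implied_sample_size(0.34159672067148156, 0.004685401990271572)))
```

Under the binomial assumption, the reported accuracy and standard error are consistent with a test set on the order of ten thousand examples.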

Evaluation results (test set, self-reported)

Perplexity on ViLambada: 6.951
SacreBLEU on English to Vietnamese Formal/Informal translation: 26.300