n8cha committed
Commit 25e74a4 · 1 Parent(s): d78d14a

quickstart steps:

Files changed (1): README.md +49 -0
README.md CHANGED
@@ -5,6 +5,55 @@ language: en
  # Fork of sosier's [nanoGPT - Character-Level Shakespeare](https://huggingface.co/sosier/nanoGPT-shakespeare-char-tied-weights)

  **This is a fork of [sosier/nanoGPT-shakespeare-char-tied-weights](https://huggingface.co/sosier/nanoGPT-shakespeare-char-tied-weights) for demonstration purposes.**

+ ## Quickstart
+
+ Load the model:
+ ```python
+ from transformers import AutoModel
+
+ model = AutoModel.from_pretrained("n8cha/nanoGPT-shakespeare-char", trust_remote_code=True)
+ ```
+
+ Set up the tokenizer and a generation helper:
+ ```python
+ import torch
+
+ class CharTokenizer:
+     def __init__(self):
+         self.token_map = {'\n': 0, ' ': 1, '!': 2, '$': 3, '&': 4, "'": 5, ',': 6, '-': 7, '.': 8, '3': 9, ':': 10, ';': 11, '?': 12, 'A': 13, 'B': 14, 'C': 15, 'D': 16, 'E': 17, 'F': 18, 'G': 19, 'H': 20, 'I': 21, 'J': 22, 'K': 23, 'L': 24, 'M': 25, 'N': 26, 'O': 27, 'P': 28, 'Q': 29, 'R': 30, 'S': 31, 'T': 32, 'U': 33, 'V': 34, 'W': 35, 'X': 36, 'Y': 37, 'Z': 38, 'a': 39, 'b': 40, 'c': 41, 'd': 42, 'e': 43, 'f': 44, 'g': 45, 'h': 46, 'i': 47, 'j': 48, 'k': 49, 'l': 50, 'm': 51, 'n': 52, 'o': 53, 'p': 54, 'q': 55, 'r': 56, 's': 57, 't': 58, 'u': 59, 'v': 60, 'w': 61, 'x': 62, 'y': 63, 'z': 64}
+         self.rev_map = {v: k for k, v in self.token_map.items()}
+
+     def encode(self, text):
+         try:
+             return [self.token_map[c] for c in text]
+         except KeyError as e:
+             raise ValueError(f"Character not in vocabulary: {e.args[0]}")
+
+     def decode(self, tokens):
+         try:
+             return ''.join(self.rev_map[t] for t in tokens)
+         except KeyError as e:
+             raise ValueError(f"Token not in vocabulary: {e.args[0]}")
+
+ tokenizer = CharTokenizer()
+
+ def generate(prompt):
+     prompt_encoded = tokenizer.encode(prompt)
+     x = torch.tensor(prompt_encoded, dtype=torch.long, device="cpu")[None, ...]
+     with torch.no_grad():
+         y = model.generate(
+             x,
+             max_new_tokens=1000,
+             temperature=0.8,
+             top_k=200,
+         )
+     return tokenizer.decode(y[0].tolist())
+ ```
+
+ Run inference:
+ ```python
+ response = generate("O Romeo, Romeo, ")
+ print(response)
+ ```
+
  Below is the original README.

  ---
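
The `token_map` in the quickstart is the unique characters of the training text in ascending ASCII order, so it can be rebuilt compactly and sanity-checked without downloading any model weights. A minimal sketch (the rebuilt table should match the 65-entry literal above):

```python
import string

# Rebuild the quickstart's token_map: 13 punctuation/whitespace characters,
# then A-Z, then a-z, indexed in order.
chars = "\n !$&',-.3:;?" + string.ascii_uppercase + string.ascii_lowercase
token_map = {ch: i for i, ch in enumerate(chars)}
rev_map = {i: ch for i, ch in enumerate(chars)}

def encode(text):
    return [token_map[c] for c in text]

def decode(tokens):
    return ''.join(rev_map[t] for t in tokens)

# Round trip is lossless for any text made of in-vocabulary characters.
sample = "O Romeo, Romeo, "
assert decode(encode(sample)) == sample
assert len(token_map) == 65  # matches the model's vocabulary size
```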
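
The `temperature=0.8, top_k=200` arguments to `model.generate` control sampling: logits are divided by the temperature (values below 1.0 sharpen the distribution), tokens outside the k most likely are masked out, and the next token is drawn from the resulting softmax. The sketch below is an illustrative stand-in for that step, not the model's actual `generate` code; `sample_next` is a hypothetical helper:

```python
import torch

def sample_next(logits, temperature=0.8, top_k=200):
    """One illustrative sampling step: temperature scaling, then top-k filtering."""
    logits = logits / temperature  # temperature < 1.0 sharpens the distribution
    if top_k is not None:
        k = min(top_k, logits.size(-1))
        v, _ = torch.topk(logits, k)  # values sorted descending; v[-1] is the k-th largest
        logits = logits.masked_fill(logits < v[-1], float('-inf'))  # drop non-top-k tokens
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()

# With top_k=1 only the argmax token survives the mask, so it is always chosen.
logits = torch.tensor([2.0, 1.0, 0.5, -1.0])
assert sample_next(logits, top_k=1) == 0
```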