# Fork of sosier's [nanoGPT - Character-Level Shakespeare](https://huggingface.co/sosier/nanoGPT-shakespeare-char-tied-weights)

**This is a fork of [sosier/nanoGPT-shakespeare-char-tied-weights](https://huggingface.co/sosier/nanoGPT-shakespeare-char-tied-weights) for demonstration purposes.**

## Quickstart
Load the model:
```python
from transformers import AutoModel

model = AutoModel.from_pretrained("n8cha/nanoGPT-shakespeare-char", trust_remote_code=True)
```

Set up inference:
```python
import torch

class CharTokenizer:
    def __init__(self):
        # Character-level vocabulary: the 65 characters of the
        # tiny-shakespeare training corpus, in sorted order.
        self.token_map = {'\n': 0, ' ': 1, '!': 2, '$': 3, '&': 4, "'": 5, ',': 6, '-': 7, '.': 8, '3': 9, ':': 10, ';': 11, '?': 12, 'A': 13, 'B': 14, 'C': 15, 'D': 16, 'E': 17, 'F': 18, 'G': 19, 'H': 20, 'I': 21, 'J': 22, 'K': 23, 'L': 24, 'M': 25, 'N': 26, 'O': 27, 'P': 28, 'Q': 29, 'R': 30, 'S': 31, 'T': 32, 'U': 33, 'V': 34, 'W': 35, 'X': 36, 'Y': 37, 'Z': 38, 'a': 39, 'b': 40, 'c': 41, 'd': 42, 'e': 43, 'f': 44, 'g': 45, 'h': 46, 'i': 47, 'j': 48, 'k': 49, 'l': 50, 'm': 51, 'n': 52, 'o': 53, 'p': 54, 'q': 55, 'r': 56, 's': 57, 't': 58, 'u': 59, 'v': 60, 'w': 61, 'x': 62, 'y': 63, 'z': 64}
        self.rev_map = {v: k for k, v in self.token_map.items()}

    def encode(self, text):
        try:
            return [self.token_map[c] for c in text]
        except KeyError as e:
            raise ValueError(f"Character not in vocabulary: {e.args[0]}")

    def decode(self, tokens):
        try:
            return ''.join(self.rev_map[t] for t in tokens)
        except KeyError as e:
            raise ValueError(f"Token not in vocabulary: {e.args[0]}")

tokenizer = CharTokenizer()

def generate(prompt):
    prompt_encoded = tokenizer.encode(prompt)
    # Shape (1, T): a batch containing the single encoded prompt.
    x = torch.tensor(prompt_encoded, dtype=torch.long, device="cpu")[None, ...]
    with torch.no_grad():
        y = model.generate(
            x,
            max_new_tokens=1000,
            temperature=0.8,
            top_k=200
        )
    return tokenizer.decode(y[0].tolist())
```
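The long `token_map` literal can also be built programmatically: it is just the sorted character vocabulary enumerated in order. A minimal sketch (the `vocab` string below is transcribed from the map above):

```python
# Equivalent construction of token_map: enumerate the sorted 65-character
# vocabulary (transcribed from the literal map in the snippet above).
vocab = "\n !$&',-.3:;?ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
token_map = {ch: i for i, ch in enumerate(vocab)}
rev_map = {i: ch for ch, i in token_map.items()}
```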

Run inference:
```python
response = generate("O Romeo, Romeo, ")
print(response)
```
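The `temperature` and `top_k` arguments control how each new character is drawn. As a self-contained sketch of the standard top-k temperature sampling step (the usual nanoGPT-style procedure, shown here for illustration rather than taken from this repo's code): logits are divided by the temperature, everything outside the `top_k` highest logits is masked out, and the next token is sampled from the resulting softmax.

```python
import torch

def sample_next(logits, temperature=0.8, top_k=200):
    # Lower temperature sharpens the distribution; higher flattens it.
    logits = logits / temperature
    # Keep only the top_k highest logits; mask the rest to -inf.
    k = min(top_k, logits.size(-1))
    v, _ = torch.topk(logits, k)
    logits = logits.masked_fill(logits < v[..., [-1]], float("-inf"))
    # Sample one token index from the renormalized distribution.
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1)
```

With `top_k=1` this reduces to greedy decoding, since only the single largest logit survives the mask.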

Below is the original README.

---