MuzzammilShah committed on
Commit
73facba
·
verified ·
1 Parent(s): 93142d5

Initial commits for files

Files changed (5)
  1. A-main-notebook.ipynb +589 -0
  2. B-main-notebook.ipynb +0 -0
  3. C-main-notebook.ipynb +649 -0
  4. README.md +61 -0
  5. names.txt +0 -0
A-main-notebook.ipynb ADDED
@@ -0,0 +1,589 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "code",
5
+ "execution_count": 1,
6
+ "metadata": {},
7
+ "outputs": [],
8
+ "source": [
9
+ "import torch\n",
10
+ "import torch.nn.functional as F\n",
11
+ "import matplotlib.pyplot as plt # for making figures\n",
12
+ "%matplotlib inline"
13
+ ]
14
+ },
15
+ {
16
+ "cell_type": "code",
17
+ "execution_count": 2,
18
+ "metadata": {},
19
+ "outputs": [
20
+ {
21
+ "data": {
22
+ "text/plain": [
23
+ "['emma', 'olivia', 'ava', 'isabella', 'sophia', 'charlotte', 'mia', 'amelia']"
24
+ ]
25
+ },
26
+ "execution_count": 2,
27
+ "metadata": {},
28
+ "output_type": "execute_result"
29
+ }
30
+ ],
31
+ "source": [
32
+ "# read in all the words\n",
33
+ "words = open('names.txt', 'r').read().splitlines()\n",
34
+ "words[:8]"
35
+ ]
36
+ },
37
+ {
38
+ "cell_type": "code",
39
+ "execution_count": 3,
40
+ "metadata": {},
41
+ "outputs": [
42
+ {
43
+ "data": {
44
+ "text/plain": [
45
+ "32033"
46
+ ]
47
+ },
48
+ "execution_count": 3,
49
+ "metadata": {},
50
+ "output_type": "execute_result"
51
+ }
52
+ ],
53
+ "source": [
54
+ "len(words)"
55
+ ]
56
+ },
57
+ {
58
+ "cell_type": "code",
59
+ "execution_count": 3,
60
+ "metadata": {},
61
+ "outputs": [
62
+ {
63
+ "name": "stdout",
64
+ "output_type": "stream",
65
+ "text": [
66
+ "{1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e', 6: 'f', 7: 'g', 8: 'h', 9: 'i', 10: 'j', 11: 'k', 12: 'l', 13: 'm', 14: 'n', 15: 'o', 16: 'p', 17: 'q', 18: 'r', 19: 's', 20: 't', 21: 'u', 22: 'v', 23: 'w', 24: 'x', 25: 'y', 26: 'z', 0: '.'}\n"
67
+ ]
68
+ }
69
+ ],
70
+ "source": [
71
+ "# build the vocabulary of characters and mappings to/from integers\n",
72
+ "chars = sorted(list(set(''.join(words))))\n",
73
+ "stoi = {s:i+1 for i,s in enumerate(chars)}\n",
74
+ "stoi['.'] = 0\n",
75
+ "itos = {i:s for s,i in stoi.items()}\n",
76
+ "print(itos)"
77
+ ]
78
+ },
79
+ {
80
+ "cell_type": "code",
81
+ "execution_count": 13,
82
+ "metadata": {},
83
+ "outputs": [
84
+ {
85
+ "name": "stdout",
86
+ "output_type": "stream",
87
+ "text": [
88
+ "... ---> e\n",
89
+ "..e ---> m\n",
90
+ ".em ---> m\n",
91
+ "emm ---> a\n",
92
+ "mma ---> .\n",
93
+ "... ---> o\n",
94
+ "..o ---> l\n",
95
+ ".ol ---> i\n",
96
+ "oli ---> v\n",
97
+ "liv ---> i\n",
98
+ "ivi ---> a\n",
99
+ "via ---> .\n",
100
+ "... ---> a\n",
101
+ "..a ---> v\n",
102
+ ".av ---> a\n",
103
+ "ava ---> .\n",
104
+ "... ---> i\n",
105
+ "..i ---> s\n",
106
+ ".is ---> a\n",
107
+ "isa ---> b\n",
108
+ "sab ---> e\n",
109
+ "abe ---> l\n",
110
+ "bel ---> l\n",
111
+ "ell ---> a\n",
112
+ "lla ---> .\n",
113
+ "... ---> s\n",
114
+ "..s ---> o\n",
115
+ ".so ---> p\n",
116
+ "sop ---> h\n",
117
+ "oph ---> i\n",
118
+ "phi ---> a\n",
119
+ "hia ---> .\n"
120
+ ]
121
+ }
122
+ ],
123
+ "source": [
124
+ "# build the dataset\n",
125
+ "\n",
126
+ "block_size = 3 # context length: how many characters do we take to predict the next one?\n",
127
+ "X, Y = [], []\n",
128
+ "for w in words[:5]:\n",
129
+ " \n",
130
+ " #print(w)\n",
131
+ " context = [0] * block_size\n",
132
+ " for ch in w + '.':\n",
133
+ " ix = stoi[ch]\n",
134
+ " X.append(context)\n",
135
+ " Y.append(ix)\n",
136
+ " print(''.join(itos[i] for i in context), '--->', itos[ix])\n",
137
+ " context = context[1:] + [ix] # crop and append\n",
138
+ " \n",
139
+ "X = torch.tensor(X)\n",
140
+ "Y = torch.tensor(Y)"
141
+ ]
142
+ },
143
+ {
144
+ "cell_type": "code",
145
+ "execution_count": 6,
146
+ "metadata": {},
147
+ "outputs": [
148
+ {
149
+ "data": {
150
+ "text/plain": [
151
+ "(torch.Size([32, 3]), torch.int64, torch.Size([32]), torch.int64)"
152
+ ]
153
+ },
154
+ "execution_count": 6,
155
+ "metadata": {},
156
+ "output_type": "execute_result"
157
+ }
158
+ ],
159
+ "source": [
160
+ "X.shape, X.dtype, Y.shape, Y.dtype"
161
+ ]
162
+ },
163
+ {
164
+ "cell_type": "markdown",
165
+ "metadata": {},
166
+ "source": [
167
+ "So our dataset looks like this^ \\\n",
168
+ "\\\n",
169
+ "So, for each of those above 5 words, \\\n",
170
+ "`torch.Size([32, 3])`: we have created a dataset of 32 examples, and each input to the neural net is 3 integers (one context) => X \\\n",
171
+ "`torch.Size([32])`: these are the labels, one integer per example (32 values) => Y"
172
+ ]
173
+ },
174
+ {
175
+ "cell_type": "code",
176
+ "execution_count": 13,
177
+ "metadata": {},
178
+ "outputs": [
179
+ {
180
+ "data": {
181
+ "text/plain": [
182
+ "tensor([[ 0, 0, 0],\n",
183
+ " [ 0, 0, 5],\n",
184
+ " [ 0, 5, 13],\n",
185
+ " [ 5, 13, 13],\n",
186
+ " [13, 13, 1],\n",
187
+ " [ 0, 0, 0],\n",
188
+ " [ 0, 0, 15],\n",
189
+ " [ 0, 15, 12],\n",
190
+ " [15, 12, 9],\n",
191
+ " [12, 9, 22],\n",
192
+ " [ 9, 22, 9],\n",
193
+ " [22, 9, 1],\n",
194
+ " [ 0, 0, 0],\n",
195
+ " [ 0, 0, 1],\n",
196
+ " [ 0, 1, 22],\n",
197
+ " [ 1, 22, 1],\n",
198
+ " [ 0, 0, 0],\n",
199
+ " [ 0, 0, 9],\n",
200
+ " [ 0, 9, 19],\n",
201
+ " [ 9, 19, 1],\n",
202
+ " [19, 1, 2],\n",
203
+ " [ 1, 2, 5],\n",
204
+ " [ 2, 5, 12],\n",
205
+ " [ 5, 12, 12],\n",
206
+ " [12, 12, 1],\n",
207
+ " [ 0, 0, 0],\n",
208
+ " [ 0, 0, 19],\n",
209
+ " [ 0, 19, 15],\n",
210
+ " [19, 15, 16],\n",
211
+ " [15, 16, 8],\n",
212
+ " [16, 8, 9],\n",
213
+ " [ 8, 9, 1]])"
214
+ ]
215
+ },
216
+ "execution_count": 13,
217
+ "metadata": {},
218
+ "output_type": "execute_result"
219
+ }
220
+ ],
221
+ "source": [
222
+ "X"
223
+ ]
224
+ },
225
+ {
226
+ "cell_type": "code",
227
+ "execution_count": 14,
228
+ "metadata": {},
229
+ "outputs": [
230
+ {
231
+ "data": {
232
+ "text/plain": [
233
+ "tensor([ 5, 13, 13, 1, 0, 15, 12, 9, 22, 9, 1, 0, 1, 22, 1, 0, 9, 19,\n",
234
+ " 1, 2, 5, 12, 12, 1, 0, 19, 15, 16, 8, 9, 1, 0])"
235
+ ]
236
+ },
237
+ "execution_count": 14,
238
+ "metadata": {},
239
+ "output_type": "execute_result"
240
+ }
241
+ ],
242
+ "source": [
243
+ "Y"
244
+ ]
245
+ },
246
+ {
247
+ "cell_type": "code",
248
+ "execution_count": 8,
249
+ "metadata": {},
250
+ "outputs": [],
251
+ "source": [
252
+ "C = torch.rand((27, 2))"
253
+ ]
254
+ },
255
+ {
256
+ "cell_type": "code",
257
+ "execution_count": 9,
258
+ "metadata": {},
259
+ "outputs": [
260
+ {
261
+ "data": {
262
+ "text/plain": [
263
+ "torch.Size([32, 3, 2])"
264
+ ]
265
+ },
266
+ "execution_count": 9,
267
+ "metadata": {},
268
+ "output_type": "execute_result"
269
+ }
270
+ ],
271
+ "source": [
272
+ "emb = C[X]\n",
273
+ "\n",
274
+ "emb.shape"
275
+ ]
276
+ },
277
+ {
278
+ "cell_type": "markdown",
279
+ "metadata": {},
280
+ "source": [
281
+ "(PyTorch indexing is awesome) \\\n",
282
+ "\\\n",
283
+ "To look up the embeddings for every element of X at once, we simply index with `C[X]`"
284
+ ]
285
+ },
286
+ {
287
+ "cell_type": "code",
288
+ "execution_count": 10,
289
+ "metadata": {},
290
+ "outputs": [],
291
+ "source": [
292
+ "W1 = torch.randn((6, 100))\n",
293
+ "b1 = torch.rand(100)"
294
+ ]
295
+ },
296
+ {
297
+ "cell_type": "code",
298
+ "execution_count": 11,
299
+ "metadata": {},
300
+ "outputs": [],
301
+ "source": [
302
+ "h = torch.tanh(emb.view(-1, 6) @ W1 + b1)"
303
+ ]
304
+ },
305
+ {
306
+ "cell_type": "code",
307
+ "execution_count": 12,
308
+ "metadata": {},
309
+ "outputs": [
310
+ {
311
+ "data": {
312
+ "text/plain": [
313
+ "tensor([[ 0.9910, 0.8405, 0.4715, ..., 0.9999, 0.8814, 0.9998],\n",
314
+ " [ 0.9763, 0.9163, 0.3350, ..., 0.9991, 0.8249, 0.9992],\n",
315
+ " [ 0.9791, 0.8450, -0.0272, ..., 0.9997, 0.9230, 0.9997],\n",
316
+ " ...,\n",
317
+ " [ 0.8995, 0.6590, 0.4667, ..., 0.9995, -0.4144, 0.9988],\n",
318
+ " [ 0.9777, 0.7397, 0.2623, ..., 0.9999, 0.9593, 0.9999],\n",
319
+ " [ 0.9402, 0.7154, 0.2493, ..., 0.9980, -0.6247, 0.9979]])"
320
+ ]
321
+ },
322
+ "execution_count": 12,
323
+ "metadata": {},
324
+ "output_type": "execute_result"
325
+ }
326
+ ],
327
+ "source": [
328
+ "h"
329
+ ]
330
+ },
331
+ {
332
+ "cell_type": "code",
333
+ "execution_count": 13,
334
+ "metadata": {},
335
+ "outputs": [
336
+ {
337
+ "data": {
338
+ "text/plain": [
339
+ "torch.Size([32, 100])"
340
+ ]
341
+ },
342
+ "execution_count": 13,
343
+ "metadata": {},
344
+ "output_type": "execute_result"
345
+ }
346
+ ],
347
+ "source": [
348
+ "h.shape"
349
+ ]
350
+ },
351
+ {
352
+ "cell_type": "markdown",
353
+ "metadata": {},
354
+ "source": [
355
+ "The hidden layer is now built^"
356
+ ]
357
+ },
358
+ {
359
+ "cell_type": "code",
360
+ "execution_count": 15,
361
+ "metadata": {},
362
+ "outputs": [],
363
+ "source": [
364
+ "W2 = torch.randn((100, 27))\n",
365
+ "b2 = torch.rand(27)"
366
+ ]
367
+ },
368
+ {
369
+ "cell_type": "code",
370
+ "execution_count": 16,
371
+ "metadata": {},
372
+ "outputs": [],
373
+ "source": [
374
+ "logits = h @ W2 + b2"
375
+ ]
376
+ },
377
+ {
378
+ "cell_type": "code",
379
+ "execution_count": 17,
380
+ "metadata": {},
381
+ "outputs": [
382
+ {
383
+ "data": {
384
+ "text/plain": [
385
+ "torch.Size([32, 27])"
386
+ ]
387
+ },
388
+ "execution_count": 17,
389
+ "metadata": {},
390
+ "output_type": "execute_result"
391
+ }
392
+ ],
393
+ "source": [
394
+ "logits.shape"
395
+ ]
396
+ },
397
+ {
398
+ "cell_type": "code",
399
+ "execution_count": 18,
400
+ "metadata": {},
401
+ "outputs": [],
402
+ "source": [
403
+ "counts = logits.exp()"
404
+ ]
405
+ },
406
+ {
407
+ "cell_type": "code",
408
+ "execution_count": 19,
409
+ "metadata": {},
410
+ "outputs": [],
411
+ "source": [
412
+ "prob = counts / counts.sum(1, keepdims=True)"
413
+ ]
414
+ },
415
+ {
416
+ "cell_type": "code",
417
+ "execution_count": 21,
418
+ "metadata": {},
419
+ "outputs": [
420
+ {
421
+ "data": {
422
+ "text/plain": [
423
+ "torch.Size([32, 27])"
424
+ ]
425
+ },
426
+ "execution_count": 21,
427
+ "metadata": {},
428
+ "output_type": "execute_result"
429
+ }
430
+ ],
431
+ "source": [
432
+ "prob.shape"
433
+ ]
434
+ },
435
+ {
436
+ "cell_type": "code",
437
+ "execution_count": 22,
438
+ "metadata": {},
439
+ "outputs": [
440
+ {
441
+ "data": {
442
+ "text/plain": [
443
+ "tensor(13.4043)"
444
+ ]
445
+ },
446
+ "execution_count": 22,
447
+ "metadata": {},
448
+ "output_type": "execute_result"
449
+ }
450
+ ],
451
+ "source": [
452
+ "loss = -prob[torch.arange(32), Y].log().mean()\n",
453
+ "loss"
454
+ ]
455
+ },
456
+ {
457
+ "cell_type": "markdown",
458
+ "metadata": {},
459
+ "source": [
460
+ "We've made the final output layer^ \\\n",
461
+ "and computed the loss, which we now have to reduce"
462
+ ]
463
+ },
464
+ {
465
+ "cell_type": "markdown",
466
+ "metadata": {},
467
+ "source": [
468
+ "---------------------"
469
+ ]
470
+ },
471
+ {
472
+ "cell_type": "markdown",
473
+ "metadata": {},
474
+ "source": [
475
+ "### **Summarising what we've done so far to make this more respectable :)**"
476
+ ]
477
+ },
478
+ {
479
+ "cell_type": "code",
480
+ "execution_count": 14,
481
+ "metadata": {},
482
+ "outputs": [
483
+ {
484
+ "data": {
485
+ "text/plain": [
486
+ "(torch.Size([32, 3]), torch.Size([32]))"
487
+ ]
488
+ },
489
+ "execution_count": 14,
490
+ "metadata": {},
491
+ "output_type": "execute_result"
492
+ }
493
+ ],
494
+ "source": [
495
+ "#Run the first 5 cells and then start from here\n",
496
+ "X.shape, Y.shape #dataset"
497
+ ]
498
+ },
499
+ {
500
+ "cell_type": "code",
501
+ "execution_count": 15,
502
+ "metadata": {},
503
+ "outputs": [],
504
+ "source": [
505
+ "g = torch.Generator().manual_seed(2147483647) # fixed seed for reproducibility (the same seed Andrej uses)\n",
506
+ "C = torch.randn((27,2), generator=g)\n",
507
+ "W1 = torch.rand((6, 100), generator=g)\n",
508
+ "b1 = torch.rand(100, generator=g)\n",
509
+ "W2 = torch.rand((100, 27), generator=g)\n",
510
+ "b2 = torch.rand(27, generator=g)\n",
511
+ "parameters = [C, W1, b1, W2, b2]"
512
+ ]
513
+ },
514
+ {
515
+ "cell_type": "code",
516
+ "execution_count": 16,
517
+ "metadata": {},
518
+ "outputs": [
519
+ {
520
+ "data": {
521
+ "text/plain": [
522
+ "3481"
523
+ ]
524
+ },
525
+ "execution_count": 16,
526
+ "metadata": {},
527
+ "output_type": "execute_result"
528
+ }
529
+ ],
530
+ "source": [
531
+ "sum(p.nelement() for p in parameters) #to check number of parameters in total"
532
+ ]
533
+ },
534
+ {
535
+ "cell_type": "code",
536
+ "execution_count": 17,
537
+ "metadata": {},
538
+ "outputs": [
539
+ {
540
+ "data": {
541
+ "text/plain": [
542
+ "tensor(6.4365)"
543
+ ]
544
+ },
545
+ "execution_count": 17,
546
+ "metadata": {},
547
+ "output_type": "execute_result"
548
+ }
549
+ ],
550
+ "source": [
551
+ "emb = C[X]\n",
552
+ "h = torch.tanh(emb.view(-1,6) @ W1 + b1)\n",
553
+ "logits = h @ W2 + b2\n",
554
+ "counts = logits.exp()\n",
555
+ "prob = counts / counts.sum(1, keepdims=True)\n",
556
+ "loss = - prob[torch.arange(32), Y].log().mean()\n",
557
+ "loss"
558
+ ]
559
+ },
560
+ {
561
+ "cell_type": "markdown",
562
+ "metadata": {},
563
+ "source": [
564
+ "--------------"
565
+ ]
566
+ }
567
+ ],
568
+ "metadata": {
569
+ "kernelspec": {
570
+ "display_name": "venv",
571
+ "language": "python",
572
+ "name": "python3"
573
+ },
574
+ "language_info": {
575
+ "codemirror_mode": {
576
+ "name": "ipython",
577
+ "version": 3
578
+ },
579
+ "file_extension": ".py",
580
+ "mimetype": "text/x-python",
581
+ "name": "python",
582
+ "nbconvert_exporter": "python",
583
+ "pygments_lexer": "ipython3",
584
+ "version": "3.10.0"
585
+ }
586
+ },
587
+ "nbformat": 4,
588
+ "nbformat_minor": 2
589
+ }
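The loss in `A-main-notebook.ipynb` is computed by hand: exponentiate the logits, normalise them into probabilities, then take the negative log of the probability assigned to the correct character. The sketch below reproduces that calculation for a single example in plain Python (no PyTorch, so it runs anywhere), next to a log-sum-exp variant that avoids overflow for large logits. The function names `softmax_nll` and `stable_nll` are illustrative and do not appear in the notebooks.

```python
import math


def softmax_nll(logits, target):
    """Negative log-likelihood of `target` under softmax(logits).

    Mirrors the notebook's steps: counts = exp(logits),
    prob = counts / counts.sum(), loss = -log(prob[target]).
    """
    counts = [math.exp(z) for z in logits]
    prob = counts[target] / sum(counts)
    return -math.log(prob)


def stable_nll(logits, target):
    """Same quantity via log-sum-exp; subtracting the max keeps exp() in range."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


# On ordinary logits the two versions agree.
logits = [2.0, -1.0, 0.5]
naive = softmax_nll(logits, 0)
stable = stable_nll(logits, 0)
print(naive, stable)

# With one very large logit the hand-written version breaks down,
# while the log-sum-exp version still returns a finite loss.
big = [1000.0, 0.0, -5.0]
try:
    softmax_nll(big, 1)
    naive_overflowed = False
except OverflowError:
    naive_overflowed = True
big_loss = stable_nll(big, 1)
print(naive_overflowed, big_loss)
```

This is one reason to prefer `F.cross_entropy`, which `C-main-notebook.ipynb` uses: it computes the same quantity as the hand-written version on ordinary inputs, but behaves better numerically when logits become large.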
B-main-notebook.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
C-main-notebook.ipynb ADDED
@@ -0,0 +1,649 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "code",
5
+ "execution_count": 1,
6
+ "metadata": {},
7
+ "outputs": [],
8
+ "source": [
9
+ "import torch\n",
10
+ "import torch.nn.functional as F\n",
11
+ "import matplotlib.pyplot as plt # for making figures\n",
12
+ "%matplotlib inline"
13
+ ]
14
+ },
15
+ {
16
+ "cell_type": "code",
17
+ "execution_count": 2,
18
+ "metadata": {},
19
+ "outputs": [],
20
+ "source": [
21
+ "# read in all the words\n",
22
+ "words = open('names.txt', 'r').read().splitlines()\n",
23
+ "\n",
24
+ "\n",
25
+ "# build the vocabulary of characters and mappings to/from integers\n",
26
+ "chars = sorted(list(set(''.join(words))))\n",
27
+ "stoi = {s:i+1 for i,s in enumerate(chars)}\n",
28
+ "stoi['.'] = 0\n",
29
+ "itos = {i:s for s,i in stoi.items()}"
30
+ ]
31
+ },
32
+ {
33
+ "cell_type": "code",
34
+ "execution_count": 3,
35
+ "metadata": {},
36
+ "outputs": [
37
+ {
38
+ "name": "stdout",
39
+ "output_type": "stream",
40
+ "text": [
41
+ "torch.Size([182625, 3]) torch.Size([182625])\n",
42
+ "torch.Size([22655, 3]) torch.Size([22655])\n",
43
+ "torch.Size([22866, 3]) torch.Size([22866])\n"
44
+ ]
45
+ }
46
+ ],
47
+ "source": [
48
+ "# build the dataset\n",
49
+ "block_size = 3 # context length: how many characters do we take to predict the next one?\n",
50
+ "\n",
51
+ "def build_dataset(words): \n",
52
+ " X, Y = [], []\n",
53
+ " for w in words:\n",
54
+ "\n",
55
+ " #print(w)\n",
56
+ " context = [0] * block_size\n",
57
+ " for ch in w + '.':\n",
58
+ " ix = stoi[ch]\n",
59
+ " X.append(context)\n",
60
+ " Y.append(ix)\n",
61
+ " #print(''.join(itos[i] for i in context), '--->', itos[ix])\n",
62
+ " context = context[1:] + [ix] # crop and append\n",
63
+ "\n",
64
+ " X = torch.tensor(X)\n",
65
+ " Y = torch.tensor(Y)\n",
66
+ " print(X.shape, Y.shape)\n",
67
+ " return X, Y\n",
68
+ "\n",
69
+ "import random\n",
70
+ "random.seed(42)\n",
71
+ "random.shuffle(words)\n",
72
+ "n1 = int(0.8*len(words))\n",
73
+ "n2 = int(0.9*len(words))\n",
74
+ "\n",
75
+ "Xtr, Ytr = build_dataset(words[:n1])\n",
76
+ "Xdev, Ydev = build_dataset(words[n1:n2])\n",
77
+ "Xte, Yte = build_dataset(words[n2:])"
78
+ ]
79
+ },
80
+ {
81
+ "cell_type": "code",
82
+ "execution_count": 4,
83
+ "metadata": {},
84
+ "outputs": [
85
+ {
86
+ "data": {
87
+ "text/plain": [
88
+ "(torch.Size([182625, 3]), torch.Size([182625]))"
89
+ ]
90
+ },
91
+ "execution_count": 4,
92
+ "metadata": {},
93
+ "output_type": "execute_result"
94
+ }
95
+ ],
96
+ "source": [
97
+ "Xtr.shape, Ytr.shape #dataset"
98
+ ]
99
+ },
100
+ {
101
+ "cell_type": "code",
102
+ "execution_count": 20,
103
+ "metadata": {},
104
+ "outputs": [],
105
+ "source": [
106
+ "g = torch.Generator().manual_seed(2147483647) # fixed seed for reproducibility (the same seed Andrej uses)\n",
107
+ "C = torch.randn((27,10), generator=g)\n",
108
+ "W1 = torch.rand((30, 300), generator=g)\n",
109
+ "b1 = torch.rand(300, generator=g)\n",
110
+ "W2 = torch.rand((300, 27), generator=g)\n",
111
+ "b2 = torch.rand(27, generator=g)\n",
112
+ "parameters = [C, W1, b1, W2, b2]"
113
+ ]
114
+ },
115
+ {
116
+ "cell_type": "code",
117
+ "execution_count": 21,
118
+ "metadata": {},
119
+ "outputs": [
120
+ {
121
+ "data": {
122
+ "text/plain": [
123
+ "17697"
124
+ ]
125
+ },
126
+ "execution_count": 21,
127
+ "metadata": {},
128
+ "output_type": "execute_result"
129
+ }
130
+ ],
131
+ "source": [
132
+ "sum(p.nelement() for p in parameters) # number of parameters in total"
133
+ ]
134
+ },
135
+ {
136
+ "cell_type": "code",
137
+ "execution_count": 22,
138
+ "metadata": {},
139
+ "outputs": [],
140
+ "source": [
141
+ "for p in parameters:\n",
142
+ " p.requires_grad = True"
143
+ ]
144
+ },
145
+ {
146
+ "cell_type": "code",
147
+ "execution_count": 8,
148
+ "metadata": {},
149
+ "outputs": [],
150
+ "source": [
151
+ "\n",
152
+ "lre = torch.linspace(-3, 0, 1000)\n",
153
+ "lrs = 10**lre"
154
+ ]
155
+ },
156
+ {
157
+ "cell_type": "code",
158
+ "execution_count": 30,
159
+ "metadata": {},
160
+ "outputs": [],
161
+ "source": [
162
+ "lri = []\n",
163
+ "lossi = []\n",
164
+ "stepi = []\n",
165
+ "\n",
166
+ "for i in range(40000):\n",
167
+ "\n",
168
+ " #Minibatch\n",
169
+ " xi = torch.randint(0, Xtr.shape[0], (32,))\n",
170
+ "\n",
171
+ " #forward pass\n",
172
+ " emb = C[Xtr[xi]]\n",
173
+ " h = torch.tanh(emb.view(-1,30) @ W1 + b1)\n",
174
+ " logits = h @ W2 + b2\n",
175
+ " loss = F.cross_entropy(logits, Ytr[xi])\n",
176
+ " #print(loss.item())\n",
177
+ "\n",
178
+ " #backward pass\n",
179
+ " for p in parameters:\n",
180
+ " p.grad = None\n",
181
+ " loss.backward()\n",
182
+ "\n",
183
+ " #update\n",
184
+ " #lr = lrs[i]\n",
185
+ " lr = 0.01\n",
186
+ " for p in parameters:\n",
187
+ " p.data += -lr * p.grad\n",
188
+ "\n",
189
+ " #keeping track\n",
190
+ " #lri.append(lr)\n",
191
+ " stepi.append(i)\n",
192
+ " lossi.append(loss.item())\n",
193
+ "\n",
194
+ "#print(loss.item())"
195
+ ]
196
+ },
197
+ {
198
+ "cell_type": "markdown",
199
+ "metadata": {},
200
+ "source": [
201
+ "The above cell will take a couple of seconds to run. Training a neural net can take a while, but luckily this is a very small neural network."
202
+ ]
203
+ },
204
+ {
205
+ "cell_type": "markdown",
206
+ "metadata": {},
207
+ "source": [
208
+ "**Evaluation:**"
209
+ ]
210
+ },
211
+ {
212
+ "cell_type": "code",
213
+ "execution_count": 31,
214
+ "metadata": {},
215
+ "outputs": [
216
+ {
217
+ "data": {
218
+ "text/plain": [
219
+ "tensor(2.1091, grad_fn=<NllLossBackward0>)"
220
+ ]
221
+ },
222
+ "execution_count": 31,
223
+ "metadata": {},
224
+ "output_type": "execute_result"
225
+ }
226
+ ],
227
+ "source": [
228
+ "emb = C[Xdev]\n",
229
+ "h = torch.tanh(emb.view(-1,30) @ W1 + b1)\n",
230
+ "logits = h @ W2 + b2\n",
231
+ "devloss = F.cross_entropy(logits, Ydev)\n",
232
+ "devloss"
233
+ ]
234
+ },
235
+ {
236
+ "cell_type": "code",
237
+ "execution_count": 32,
238
+ "metadata": {},
239
+ "outputs": [
240
+ {
241
+ "data": {
242
+ "text/plain": [
243
+ "tensor(2.0482, grad_fn=<NllLossBackward0>)"
244
+ ]
245
+ },
246
+ "execution_count": 32,
247
+ "metadata": {},
248
+ "output_type": "execute_result"
249
+ }
250
+ ],
251
+ "source": [
252
+ "emb = C[Xtr]\n",
253
+ "h = torch.tanh(emb.view(-1,30) @ W1 + b1)\n",
254
+ "logits = h @ W2 + b2\n",
255
+ "trloss = F.cross_entropy(logits, Ytr)\n",
256
+ "trloss"
257
+ ]
258
+ },
259
+ {
260
+ "cell_type": "markdown",
261
+ "metadata": {},
262
+ "source": [
263
+ "Training and dev losses are almost the same, so we know we are not overfitting. What this typically means is that the neural net is too small to fit the data well, i.e. it is underfitting. \\\n",
264
+ "\\\n",
265
+ "Therefore to improve the performance we'll need to increase the size of the neural net."
266
+ ]
267
+ },
268
+ {
269
+ "cell_type": "code",
270
+ "execution_count": 15,
271
+ "metadata": {},
272
+ "outputs": [
273
+ {
274
+ "data": {
275
+ "image/png": "",
276
+ "text/plain": [
277
+ "<Figure size 800x800 with 1 Axes>"
278
+ ]
279
+ },
280
+ "metadata": {},
281
+ "output_type": "display_data"
282
+ }
283
+ ],
284
+ "source": [
285
+ "plt.figure(figsize=(8,8))\n",
286
+ "plt.scatter(C[:,0].data, C[:,1].data, s=200)\n",
287
+ "for i in range(C.shape[0]):\n",
288
+ " plt.text(C[i,0].item(), C[i, 1].item(), itos[i], ha=\"center\", va=\"center\", color=\"white\")\n",
289
+ "plt.grid('minor')"
290
+ ]
291
+ },
292
+ {
293
+ "cell_type": "markdown",
294
+ "metadata": {},
295
+ "source": [
296
+ "------------"
297
+ ]
298
+ },
299
+ {
300
+ "cell_type": "markdown",
301
+ "metadata": {},
302
+ "source": [
303
+ "-------------"
304
+ ]
305
+ },
306
+ {
307
+ "cell_type": "markdown",
308
+ "metadata": {},
309
+ "source": [
310
+ "Not much changes from what we have done so far; the code is just tidied up so that the lr value changes based on the iteration count."
311
+ ]
312
+ },
313
+ {
314
+ "cell_type": "markdown",
315
+ "metadata": {},
316
+ "source": [
317
+ "Here we are free to experiment with different values, whether the inputs, the size of the layers or the learning rate, to see how far we can decrease the final loss."
318
+ ]
319
+ },
320
+ {
321
+ "cell_type": "code",
322
+ "execution_count": null,
323
+ "metadata": {},
324
+ "outputs": [],
325
+ "source": [
326
+ "# ------------ now made respectable :) ---------------"
327
+ ]
328
+ },
329
+ {
330
+ "cell_type": "code",
331
+ "execution_count": 33,
332
+ "metadata": {},
333
+ "outputs": [],
334
+ "source": [
335
+ "g = torch.Generator().manual_seed(2147483647) # for reproducibility\n",
336
+ "C = torch.randn((27, 10), generator=g)\n",
337
+ "W1 = torch.randn((30, 200), generator=g)\n",
338
+ "b1 = torch.randn(200, generator=g)\n",
339
+ "W2 = torch.randn((200, 27), generator=g)\n",
340
+ "b2 = torch.randn(27, generator=g)\n",
341
+ "parameters = [C, W1, b1, W2, b2]"
342
+ ]
343
+ },
344
+ {
345
+ "cell_type": "code",
346
+ "execution_count": 34,
347
+ "metadata": {},
348
+ "outputs": [
349
+ {
350
+ "data": {
351
+ "text/plain": [
352
+ "11897"
353
+ ]
354
+ },
355
+ "execution_count": 34,
356
+ "metadata": {},
357
+ "output_type": "execute_result"
358
+ }
359
+ ],
360
+ "source": [
361
+ "sum(p.nelement() for p in parameters) # number of parameters in total"
362
+ ]
363
+ },
364
+ {
365
+ "cell_type": "code",
366
+ "execution_count": 35,
367
+ "metadata": {},
368
+ "outputs": [],
369
+ "source": [
370
+ "for p in parameters:\n",
371
+ " p.requires_grad = True"
372
+ ]
373
+ },
374
+ {
375
+ "cell_type": "code",
376
+ "execution_count": null,
377
+ "metadata": {},
378
+ "outputs": [],
379
+ "source": [
380
+ "lre = torch.linspace(-3, 0, 1000)\n",
381
+ "lrs = 10**lre"
382
+ ]
383
+ },
384
+ {
385
+ "cell_type": "code",
386
+ "execution_count": 36,
387
+ "metadata": {},
388
+ "outputs": [],
389
+ "source": [
390
+ "lri = []\n",
391
+ "lossi = []\n",
392
+ "stepi = []"
393
+ ]
394
+ },
395
+ {
396
+ "cell_type": "code",
397
+ "execution_count": 37,
398
+ "metadata": {},
399
+ "outputs": [],
400
+ "source": [
401
+ "for i in range(200000):\n",
402
+ " \n",
403
+ " # minibatch construct\n",
404
+ " ix = torch.randint(0, Xtr.shape[0], (32,))\n",
405
+ " \n",
406
+ " # forward pass\n",
407
+ " emb = C[Xtr[ix]] # (32, 3, 10)\n",
408
+ " h = torch.tanh(emb.view(-1, 30) @ W1 + b1) # (32, 200)\n",
409
+ " logits = h @ W2 + b2 # (32, 27)\n",
410
+ " loss = F.cross_entropy(logits, Ytr[ix])\n",
411
+ " #print(loss.item())\n",
412
+ " \n",
413
+ " # backward pass\n",
414
+ " for p in parameters:\n",
415
+ " p.grad = None\n",
416
+ " loss.backward()\n",
417
+ " \n",
418
+ " # update\n",
419
+ " #lr = lrs[i]\n",
420
+ " lr = 0.1 if i < 100000 else 0.01\n",
421
+ " for p in parameters:\n",
422
+ " p.data += -lr * p.grad\n",
423
+ "\n",
424
+ " # track stats\n",
425
+ " #lri.append(lre[i])\n",
426
+ " stepi.append(i)\n",
427
+ " lossi.append(loss.log10().item())\n",
428
+ "\n",
429
+ "#print(loss.item())"
430
+ ]
431
+ },
432
+ {
433
+ "cell_type": "code",
434
+ "execution_count": 38,
435
+ "metadata": {},
436
+ "outputs": [
437
+ {
438
+ "data": {
439
+ "text/plain": [
440
+ "[<matplotlib.lines.Line2D at 0x17d66872770>]"
441
+ ]
442
+ },
443
+ "execution_count": 38,
444
+ "metadata": {},
445
+ "output_type": "execute_result"
446
+ },
447
+ {
448
+ "data": {
449
+ "image/png": "",
450
+ "text/plain": [
451
+ "<Figure size 640x480 with 1 Axes>"
452
+ ]
453
+ },
454
+ "metadata": {},
455
+ "output_type": "display_data"
456
+ }
457
+ ],
458
+ "source": [
459
+ "plt.plot(stepi, lossi)"
460
+ ]
461
+ },
462
+ {
463
+ "cell_type": "code",
464
+ "execution_count": 39,
465
+ "metadata": {},
466
+ "outputs": [
467
+ {
468
+ "data": {
469
+ "text/plain": [
470
+ "tensor(2.1294, grad_fn=<NllLossBackward0>)"
471
+ ]
472
+ },
473
+ "execution_count": 39,
474
+ "metadata": {},
475
+ "output_type": "execute_result"
476
+ }
477
+ ],
478
+ "source": [
479
+ "emb = C[Xtr] # (182625, 3, 10)\n",
480
+ "h = torch.tanh(emb.view(-1, 30) @ W1 + b1) # (182625, 200)\n",
481
+ "logits = h @ W2 + b2 # (182625, 27)\n",
482
+ "loss = F.cross_entropy(logits, Ytr)\n",
483
+ "loss"
484
+ ]
485
+ },
486
+ {
487
+ "cell_type": "code",
488
+ "execution_count": 40,
489
+ "metadata": {},
490
+ "outputs": [
491
+ {
492
+ "data": {
493
+ "text/plain": [
494
+ "tensor(2.1677, grad_fn=<NllLossBackward0>)"
495
+ ]
496
+ },
497
+ "execution_count": 40,
498
+ "metadata": {},
499
+ "output_type": "execute_result"
500
+ }
501
+ ],
502
+ "source": [
503
+ "emb = C[Xdev] # (22655, 3, 10)\n",
504
+ "h = torch.tanh(emb.view(-1, 30) @ W1 + b1) # (22655, 200)\n",
505
+ "logits = h @ W2 + b2 # (22655, 27)\n",
506
+ "loss = F.cross_entropy(logits, Ydev)\n",
507
+ "loss"
508
+ ]
509
+ },
510
+ {
511
+ "cell_type": "markdown",
512
+ "metadata": {},
513
+ "source": [
514
+ "----"
515
+ ]
516
+ },
517
+ {
518
+ "cell_type": "markdown",
519
+ "metadata": {},
520
+ "source": [
521
+ "### Sampling from the model"
522
+ ]
523
+ },
524
+ {
525
+ "cell_type": "code",
526
+ "execution_count": 41,
527
+ "metadata": {},
528
+ "outputs": [
529
+ {
530
+ "data": {
531
+ "text/plain": [
532
+ "torch.Size([1, 3, 10])"
533
+ ]
534
+ },
535
+ "execution_count": 41,
536
+ "metadata": {},
537
+ "output_type": "execute_result"
538
+ }
539
+ ],
540
+ "source": [
541
+ "context = [0] * block_size\n",
542
+ "C[torch.tensor([context])].shape"
543
+ ]
544
+ },
545
+ {
546
+ "cell_type": "markdown",
547
+ "metadata": {},
548
+ "source": [
549
+ "Here we embed a single context (a batch of one example) for simplicity, rather than the entire training set^"
550
+ ]
551
+ },
552
+ {
553
+ "cell_type": "code",
554
+ "execution_count": 42,
555
+ "metadata": {},
556
+ "outputs": [
557
+ {
558
+ "name": "stdout",
559
+ "output_type": "stream",
560
+ "text": [
561
+ "mora.\n",
562
+ "kayah.\n",
563
+ "seel.\n",
564
+ "ndheyah.\n",
565
+ "reimanield.\n",
566
+ "leg.\n",
567
+ "adeerdoeliah.\n",
568
+ "milopaleigh.\n",
569
+ "eson.\n",
570
+ "arleitzion.\n",
571
+ "kalin.\n",
572
+ "shuhporxhimiel.\n",
573
+ "kin.\n",
574
+ "reelle.\n",
575
+ "joberlyn.\n",
576
+ "bren.\n",
577
+ "der.\n",
578
+ "yarue.\n",
579
+ "els.\n",
580
+ "kaysh.\n"
581
+ ]
582
+ }
583
+ ],
584
+ "source": [
585
+ "# sample from the model\n",
586
+ "g = torch.Generator().manual_seed(2147483647 + 10)\n",
587
+ "\n",
588
+ "for _ in range(20):\n",
589
+ " \n",
590
+ " out = []\n",
591
+ " context = [0] * block_size # initialize with all ...\n",
592
+ " while True:\n",
593
+ " emb = C[torch.tensor([context])] # (1,block_size,d)\n",
594
+ " h = torch.tanh(emb.view(1, -1) @ W1 + b1)\n",
595
+ " logits = h @ W2 + b2\n",
596
+ " probs = F.softmax(logits, dim=1)\n",
597
+ " ix = torch.multinomial(probs, num_samples=1, generator=g).item()\n",
598
+ " context = context[1:] + [ix]\n",
599
+ " out.append(ix)\n",
600
+ " if ix == 0:\n",
601
+ " break\n",
602
+ " \n",
603
+ " print(''.join(itos[i] for i in out))"
604
+ ]
605
+ },
606
+ {
607
+ "cell_type": "markdown",
608
+ "metadata": {},
609
+ "source": [
610
+ "To be fair, most of them could make sense lol. But at least this time they definitely sound more name-like, so we are defo making progress. So lessgoo xD"
611
+ ]
612
+ },
613
+ {
614
+ "cell_type": "markdown",
615
+ "metadata": {},
616
+ "source": [
617
+ "-----------"
618
+ ]
619
+ },
620
+ {
621
+ "cell_type": "markdown",
622
+ "metadata": {},
623
+ "source": [
624
+ "-------------"
625
+ ]
626
+ }
627
+ ],
628
+ "metadata": {
629
+ "kernelspec": {
630
+ "display_name": "venv",
631
+ "language": "python",
632
+ "name": "python3"
633
+ },
634
+ "language_info": {
635
+ "codemirror_mode": {
636
+ "name": "ipython",
637
+ "version": 3
638
+ },
639
+ "file_extension": ".py",
640
+ "mimetype": "text/x-python",
641
+ "name": "python",
642
+ "nbconvert_exporter": "python",
643
+ "pygments_lexer": "ipython3",
644
+ "version": "3.10.0"
645
+ }
646
+ },
647
+ "nbformat": 4,
648
+ "nbformat_minor": 2
649
+ }
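`C-main-notebook.ipynb` keeps two ways of choosing the learning rate: a sweep over log-spaced candidates (`lrs = 10**lre`, left commented out in the training loops) and the step decay `lr = 0.1 if i < 100000 else 0.01`. The sketch below restates both in plain Python so the numbers can be inspected without running any training. The helper names `log_spaced_rates` and `step_decay` are hypothetical, introduced here only for illustration.

```python
def log_spaced_rates(lo_exp=-3.0, hi_exp=0.0, steps=1000):
    """Candidate learning rates 10**e, with e evenly spaced in [lo_exp, hi_exp].

    Plain-Python counterpart of: lre = torch.linspace(-3, 0, 1000); lrs = 10**lre
    """
    span = hi_exp - lo_exp
    return [10 ** (lo_exp + span * k / (steps - 1)) for k in range(steps)]


def step_decay(i, switch_at=100_000, before=0.1, after=0.01):
    """Learning rate for step i: `before` until `switch_at`, then `after`."""
    return before if i < switch_at else after


lrs = log_spaced_rates()
print(len(lrs), lrs[0], lrs[-1])  # number of candidates, smallest, largest

# The decayed schedule used in the 200000-step run.
for step in (0, 99_999, 100_000, 199_999):
    print(step, step_decay(step))
```

Because the sweep has only 1000 entries, indexing it with the step counter only makes sense in a 1000-step loop, which is presumably why `lr = lrs[i]` is commented out in the longer runs.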
README.md ADDED
@@ -0,0 +1,61 @@
1
+ ## SET 1 - MAKEMORE (PART 2) 🔗
2
+
3
+ [![Documentation](https://img.shields.io/badge/Documentation-Available-blue)](https://muzzammilshah.github.io/Road-to-GPT/Makemore-part2/)
4
+ ![Number of Commits](https://img.shields.io/github/commit-activity/m/MuzzammilShah/NeuralNetworks-LanguageModels-2?label=Commits)
5
+ [![Last Commit](https://img.shields.io/github/last-commit/MuzzammilShah/NeuralNetworks-LanguageModels-2.svg?style=flat)](https://github.com/MuzzammilShah/NeuralNetworks-LanguageModels-2/commits/main)
6
+ ![Project Status](https://img.shields.io/badge/Status-Done-success)
7
+
8
+ &nbsp;
9
+
10
+ ### **Overview**
11
+ In this repository, a **Multi-Layer Perceptron (MLP)** language model inspired by the *Bengio et al. (2003)* research paper has been implemented for **character-level predictions**, following Andrej Karpathy's approach in the **Makemore - Part 2** video.
12
+
13
+ The implementation demonstrates building and training the MLP model for sequence prediction while further enhancing the understanding of neural network architectures for language modeling.
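As a rough illustration of what "character-level prediction" means here, the sketch below builds the (context, next character) pairs that the MLP is trained on, using a context length of 3 as in the notebooks. It works on strings rather than integer tensors to keep it dependency-free; `build_examples` is a hypothetical helper, not a function from the repository.

```python
def build_examples(words, block_size=3):
    """Return (context, target) pairs; '.' pads the start and marks the end."""
    examples = []
    for w in words:
        context = "." * block_size  # start every word from an all-padding context
        for ch in w + ".":
            examples.append((context, ch))
            context = context[1:] + ch  # slide the window by one character
    return examples


examples = build_examples(["emma"])
for context, target in examples:
    print(context, "--->", target)
```

Each word of length n yields n + 1 examples, because the terminating `.` is also a prediction target.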
14
+
15
+ &nbsp;
16
+
17
+ ### **🗂️Repository Structure**
18
+
19
+ ```plaintext
20
+ ├── .gitignore
21
+ ├── A-Main-Notebook.ipynb
22
+ ├── B-Main-Notebook.ipynb
23
+ ├── C-Main-Notebook.ipynb
24
+ ├── README.md
25
+ ├── notes/
26
+ │ ├── A-main-makemore-part2.md
27
+ │ ├── B-main-makemore-part2.md
28
+ │ ├── C-main-makemore-part2.md
29
+ │ └── README.md
30
+ └── names.txt
31
+ ```
32
+
33
+ - **Notes Directory**: Contains detailed notes corresponding to each notebook section.
34
+ - **Jupyter Notebooks**: Step-by-step implementation and exploration of the MLP model.
35
+ - **README.md**: Overview and guide for this repository.
36
+ - **names.txt**: Dataset of names (one per line) used to train the model.
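To give a feel for how `names.txt` feeds the model, here is a minimal sketch of the character vocabulary the notebooks build from it: letters are numbered from 1 in sorted order and `.` is reserved as index 0. The three sample names stand in for the real file, and `build_vocab` is an illustrative wrapper around the notebooks' `stoi`/`itos` code.

```python
def build_vocab(words):
    """Map each character to an integer (and back), reserving 0 for '.'."""
    chars = sorted(set("".join(words)))
    stoi = {ch: i + 1 for i, ch in enumerate(chars)}
    stoi["."] = 0
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


sample_words = ["emma", "olivia", "ava"]  # stand-in for the lines of names.txt
stoi, itos = build_vocab(sample_words)

encoded = [stoi[ch] for ch in "ava."]
decoded = "".join(itos[i] for i in encoded)
print(stoi)
print(encoded, decoded)
```

With the full dataset the same code produces the 27-entry mapping printed in the first notebook (26 letters plus `.`).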
37
+
38
+ &nbsp;
39
+
40
+ ### **📄Instructions**
41
+
42
+ To get the best understanding:
43
+
44
+ 1. Start by reading the notes in the `notes/` directory. Each section corresponds to a notebook for step-by-step explanations.
45
+ 2. Open the corresponding Jupyter Notebook (e.g., `A-Main-Notebook.ipynb` for `A-main-makemore-part2.md`).
46
+ 3. Follow the code and comments for a deeper dive into the implementation details.
47
+
48
+ &nbsp;
49
+
50
+ ### **⭐Documentation**
51
+
52
+ For a better reading experience and detailed notes, visit my **[Road to GPT Documentation Site](https://muzzammilshah.github.io/Road-to-GPT/)**.
53
+
54
+ > **💡Pro Tip**: This site provides an interactive and visually rich explanation of the notes and code. It is highly recommended you view this project from there.
55
+
56
+ &nbsp;
57
+
58
+ ### **✍🏻Acknowledgments**
59
+ Notes and implementations inspired by the **Makemore - Part 2** video by [Andrej Karpathy](https://karpathy.ai/).
60
+
61
+ For more of my projects, visit my [Portfolio Site](https://muhammedshah.com).
names.txt ADDED
The diff for this file is too large to render. See raw diff