Update README.md
README.md
CHANGED
@@ -23,6 +23,20 @@ The model weights included here are PyTorch state dicts converted from the offic
To avoid duplication and ease maintenance, this repository contains only the model weights; the self-contained source code can be found [here](https://github.com/rasbt/LLMs-from-scratch/blob/main/pkg/llms_from_scratch/qwen3.py). Instructions on how to use the code are provided below.

# Qwen3 from-scratch code

The standalone notebooks in this folder contain the from-scratch code in a linear fashion:

1. [standalone-qwen3.ipynb](https://github.com/rasbt/LLMs-from-scratch/blob/main/ch05/11_qwen3/standalone-qwen3.ipynb): The dense Qwen3 model without bells and whistles
2. [standalone-qwen3-plus-kvcache.ipynb](https://github.com/rasbt/LLMs-from-scratch/blob/main/ch05/11_qwen3/standalone-qwen3-plus-kvcache.ipynb): Same as above but with KV cache for better inference efficiency
3. [standalone-qwen3-moe.ipynb](https://github.com/rasbt/LLMs-from-scratch/blob/main/ch05/11_qwen3/standalone-qwen3-moe.ipynb): Like the first notebook but the Mixture-of-Experts (MoE) variant
4. [standalone-qwen3-moe-plus-kvcache.ipynb](https://github.com/rasbt/LLMs-from-scratch/blob/main/ch05/11_qwen3/standalone-qwen3-moe-plus-kvcache.ipynb): Same as above but with KV cache for better inference efficiency
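
To illustrate why the KV-cache notebooks (items 2 and 4 above) are more efficient at inference time, here is a minimal, pure-Python sketch. It is not the code from the notebooks: the scalar "weights" `W_Q`, `W_K`, `W_V` and the names `project`, `attend`, `KVCache`, `step_without_cache`, and `step_with_cache` are hypothetical and chosen only for illustration, and everything else a real model needs (PyTorch tensors, multiple heads, positional encoding, batching) is left out. The idea it shows is that the keys and values of earlier tokens do not change during generation, so they can be stored and reused instead of being recomputed at every step.

```python
import math

# Hypothetical toy "weights": each projection just scales the input vector.
W_Q, W_K, W_V = 0.5, 1.5, 2.0


def project(x, w):
    """Toy stand-in for a linear layer: scale every element of x by w."""
    return [w * xi for xi in x]


def attend(q, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [
        sum(w / total * v[j] for w, v in zip(weights, values))
        for j in range(d)
    ]


def step_without_cache(prefix):
    """Recompute keys and values for the whole prefix at every step."""
    keys = [project(t, W_K) for t in prefix]
    values = [project(t, W_V) for t in prefix]
    q = project(prefix[-1], W_Q)
    return attend(q, keys, values), 2 * len(prefix)  # (output, K/V projections)


class KVCache:
    """Stores the keys and values of all tokens seen so far."""

    def __init__(self):
        self.keys, self.values = [], []


def step_with_cache(token, cache):
    """Project only the newest token and reuse the cached keys and values."""
    cache.keys.append(project(token, W_K))
    cache.values.append(project(token, W_V))
    q = project(token, W_Q)
    return attend(q, cache.keys, cache.values), 2  # (output, K/V projections)


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
cache = KVCache()
work_plain = work_cached = 0

for i in range(1, len(tokens) + 1):
    out_plain, n_plain = step_without_cache(tokens[:i])
    out_cached, n_cached = step_with_cache(tokens[i - 1], cache)
    # Both variants must yield the same attention output.
    assert all(math.isclose(a, b) for a, b in zip(out_plain, out_cached))
    work_plain += n_plain
    work_cached += n_cached

print(work_plain, work_cached)  # prints "20 8"
```

Both variants produce the same attention output at every step; only the number of key/value projections differs (20 versus 8 for these four toy tokens), and the gap widens as the sequence gets longer.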

Alternatively, I also organized the code into a Python package (including unit tests and CI), which you can run as described below.

### Using Qwen3 0.6B via the `llms-from-scratch` package