Update README.md
README.md
@@ -40,7 +40,7 @@ Since my role is not as a working developer, but as an solutions architect helpi
 
 ### Continued pre-training
 
-The dataset used for training is as follows.
+The dataset used for training is as follows. To prevent catastrophic forgetting, I included some English corpus as training data.
 
 - Wikipedia Korean dataset (https://huggingface.co/datasets/wikimedia/wikipedia)
 - Massive Korean synthetic dataset (https://huggingface.co/datasets/maywell/korean_textbooks)