Improve model card with metadata and link to code (#1)
- Improve model card with metadata and link to code (95acfd7576aa8c3da68d7d5cdbd006d9a9aa6f33)
Co-authored-by: Niels Rogge <[email protected]>
README.md CHANGED
@@ -1,6 +1,13 @@
 ---
 license: mit
+library_name: fasttext
+pipeline_tag: text-classification
 ---
-
-the SciQ task, discussed in the main text of the Perplexity
-
+
+This is the fastText pretraining data filter targeting the SciQ task, discussed in the main text of the Perplexity Correlations paper: https://arxiv.org/abs/2409.05816
+
+This package can be used to get LLM pretraining data sampling distributions using simple statistical methods. The compute requirements are minimal, and you don't need to train any LLMs yourself.
+
+Essentially, this approach encourages training on domains where lower loss is very correlated with higher downstream performance. We can use existing and freely available LLMs to do this.
+
+Code: https://github.com/TristanThrush/perplexity-correlations
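The updated card does not include a usage snippet, so here is a minimal sketch of how a fastText quality filter like this one is typically loaded and applied. It is an illustration only: the repo id, filename, and label names below are placeholders, not taken from the card or the paper.

```python
# Hypothetical sketch of applying a fastText pretraining-data filter like this one.
# The repo id, filename, and label names are placeholders, not from the model card.
import fasttext
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="TristanThrush/sciq-fasttext-filter",  # placeholder repo id
    filename="model.bin",                          # placeholder filename
)
model = fasttext.load_model(model_path)

# fastText's predict() expects a single line of text, so strip newlines first.
doc = "Photosynthesis converts light energy into chemical energy in plants."
labels, scores = model.predict(doc.replace("\n", " "), k=2)
print(labels, scores)  # e.g. (('__label__include', '__label__exclude'), array([...]))
```

Scores from a filter like this can then be used to up- or down-weight documents or domains when constructing the pretraining data sampling distribution described in the card.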