sail's Collections

🧬 RegMix: Data Mixture as Regression

Updated Jul 26, 2024

Automatic data mixture method for large language model pre-training

  • RegMix

    Running • 6

    Generate regression predictions from CSV data

  • RegMix: Data Mixture as Regression for Language Model Pre-training

    Paper • 2407.01492 • Published Jul 1, 2024 • 41

  • sail/data-mixture-human-1b

    Text Generation • Updated Jul 11, 2024 • 3 • 3

  • sail/data-mixture-pile-cc-1b

    Text Generation • Updated Jul 11, 2024 • 3 • 3

  • sail/data-mixture-regmix-1b

    Text Generation • Updated Jul 11, 2024 • 7 • 2

  • sail/data-mixture-doremi-1b

    Text Generation • Updated Jul 11, 2024 • 4 • 2

  • sail/data-mixture-random-1b

    Text Generation • Updated Jul 11, 2024 • 5 • 4

  • sail/regmix-data-sample

    Viewer • Updated Jul 11, 2024 • 698k • 261 • 2

  • sail/regmix-data

    Viewer • Updated Sep 12, 2024 • 13.7M • 9.71k • 4