💡 DICE - a sail Collection

Updated Jul 28, 2024

Self-alignment with DPO Implicit Rewards

Bootstrapping Language Models with DPO Implicit Rewards
Paper • 2406.09760 • Published Jun 14, 2024 • 41

sail/Llama-3-Base-8B-DICE-Iter1
Text Generation • 8B • Updated Mar 11 • 3 • 2

sail/Llama-3-Base-8B-DICE-Iter2
Text Generation • 8B • Updated Mar 11 • 4 • 3

sail/Zephyr-7B-DICE-Iter1
Text Generation • 7B • Updated Mar 11 • 5

sail/Zephyr-7B-DICE-Iter2
Text Generation • 7B • Updated Mar 11 • 3 • 2