arianaazarbal/Qwen3-8B-train-away-lying-lr1e5-temp10-penalize • Reinforcement Learning • Updated Jul 17
arianaazarbal/Qwen3-8B-train-away-lying-lr1e4-temp10-penalize • Reinforcement Learning • Updated Jul 16
arianaazarbal/Qwen3-8B-train-away-lying-lr1e5-temp10-hindsight • Reinforcement Learning • Updated Jul 16
arianaazarbal/Qwen3-8B-train-away-lying-lr1e4-temp10-hindsight • Reinforcement Learning • Updated Jul 16
arianaazarbal/Qwen3-8B-train-away-lying-lr1e5-temp10-penalize-neutral-neutral • Reinforcement Learning • Updated Jul 17