Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment
Written on January 16, 2025