School of Reward Hacks: Hacking harmless tasks generalizes to misaligned behavior in LLMs

Paper Content
