Turns out, aligning LLMs to be "helpful" via human feedback actually teaches them to bullshit.