Turns out, aligning LLMs to be "helpful" via human feedback actually teaches them to bullshit.