artificial

RLHF safety training enforces what AI can say about itself, not what it can do: experimental evidence

Submitted by /u/Odd_Rule_3745 on February 11, 2026