artificial RLHF safety training enforces what AI can say about itself, not what it can do — experimental evidence /u/Odd_Rule_3745 February 11, 2026 February 11, 2026 submitted by /u/Odd_Rule_3745 [link] [comments]