New paper proposes AI alignment "bees" — classifier species that monitor LLMs continuously, can’t be jailbroken, and produce both value and correction
TL;DR: LLMs inherit human failure modes from training data. Current alignment (RLHF, Constitutional AI) faces circularity — biased humans correcting biased models. We propose small classifiers ("bees") running 24/7 as alignment monitors. They…