Mustafa Suleyman says fine-tuning and post-training AI models is now done by AI itself; reinforcement learning from human feedback (RLHF) is becoming reinforcement learning from AI feedback (RLAIF)