RVPO: Risk-Sensitive Alignment via Variance Regularization – Apple Machine Learning Research