ETVA: Evaluation of Text-to-Video Alignment via Fine-grained Question Generation and Answering – Apple Machine Learning Research