I'm seeing numerous reposts of Sora's text-to-video samples, which are impressive in their own right and showcase what is undoubtedly a massive leap forward for generative video models. However, the full range of the model's capabilities, outlined in the technical report, is truly remarkable.