What exactly is AI-generated video?

I was wondering what AI video is exactly and how it works. I'm aware it sounds silly at first, but from a development standpoint, I can't wrap my head around it.

Let's imagine I have the prompt "a man is jogging on a beach".

What does an AI video generator do exactly?

If video is just a sequence of images in time, does the AI generate the first image at random, then use that image as a reference for the next one and just add slight changes to the position of the jogging man?

Is that how the AI keeps clothing, skin tone, and so on consistent across the scene?
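
To make my guess concrete, here is a toy sketch of the frame-by-frame idea I'm picturing. Everything in it is made up for illustration: the names `generate_first_frame`, `next_frame`, and `generate_video` are hypothetical, and the "frame" is just a 1-D list where a single cell stands in for the jogger. I'm not claiming any real generator works this way; it's only my mental model written down as code.

```python
import random

WIDTH = 16  # hypothetical tiny 1-D "frame"; exactly one cell holds the jogger


def generate_first_frame(rng):
    """Place the jogger at a random spot in the left half (the 'random first image')."""
    frame = [0] * WIDTH
    frame[rng.randrange(WIDTH // 2)] = 1
    return frame


def next_frame(prev):
    """Use the previous frame as a reference and move the jogger one cell right."""
    pos = prev.index(1)
    frame = [0] * WIDTH
    frame[min(pos + 1, WIDTH - 1)] = 1  # clamp at the right edge
    return frame


def generate_video(num_frames, seed=0):
    """Build the clip one frame at a time, each derived from the one before it."""
    rng = random.Random(seed)
    frames = [generate_first_frame(rng)]
    for _ in range(num_frames - 1):
        frames.append(next_frame(frames[-1]))
    return frames


if __name__ == "__main__":
    for frame in generate_video(5):
        print("".join("J" if cell else "." for cell in frame))
```

In this picture, consistency comes for free because each frame is a small edit of the previous one. What I can't tell is whether real models do anything like this or something entirely different.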

I'm happy to read through ALL of the sources you provide, so please share them if you can!

Thanks a lot!

submitted by /u/Apprehensive_Bag9364