Here is an example of how one can use a text prompt to generate a series of frames that are then stitched together into a video.
The prompt I used was: “a man walking in the parking lot with a miniature poodle”. The final generated video is shown below.
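The stitching step itself is simple. Here is a minimal sketch of one way to do it in Python; the frame file names, the output name, and the 12 fps frame rate are my own assumptions, not details from my actual run.

```python
# Stitch a folder of generated frames into an MP4.
# Assumes frames were saved as frames/frame_000.png, frames/frame_001.png, ...
import glob
import imageio.v2 as imageio

frame_paths = sorted(glob.glob("frames/frame_*.png"))

with imageio.get_writer("walk_with_poodle.mp4", fps=12) as writer:
    for path in frame_paths:
        writer.append_data(imageio.imread(path))
```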
What is interesting is how it morphs from one frame to the next, and in some cases the human starts out looking more like a poodle. It reminds me of the old days of morphing we did in C and C++ (computer science theory).
For this I am playing with the latest build of #StableDiffusion and used a max of 100 frames, with 30 samplings and 200 inference steps for each frame.
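For anyone who wants to try something similar, here is a rough sketch of the frame-generation loop using the Hugging Face diffusers library and the runwayml/stable-diffusion-v1-5 checkpoint; both of those are my assumptions, since I have not listed the exact build or checkpoint here, and the "30 samplings per frame" setting has no direct equivalent in this sketch, so it only keeps the 100-frame / 200-inference-step part.

```python
# Generate a sequence of frames from a single text prompt.
# Assumes the diffusers library, a CUDA GPU, and the SD 1.5 checkpoint.
import torch
from diffusers import StableDiffusionPipeline

prompt = "a man walking in the parking lot with a miniature poodle"

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

for i in range(100):  # max of 100 frames
    # A different seed per frame gives the frame-to-frame variation that
    # produces the morphing effect when the frames are played back.
    generator = torch.Generator("cuda").manual_seed(i)
    image = pipe(
        prompt,
        num_inference_steps=200,
        generator=generator,
    ).images[0]
    image.save(f"frames/frame_{i:03d}.png")
```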
The video below shows how each of those frames is generated, and it is quite fascinating.