Can Nano Banana actually make entire animations? Spoiler alert: yes, and they’re mind-blowing

Google recently unveiled a groundbreaking AI image generation technology that has the potential to reshape entire industries. While the broader implications are still unfolding, one particular feature stands out:

the ability to make precise, localized changes to an image without altering any other part of it.

This means you can, for example, change the background color of a picture without a person in the foreground suddenly growing a sixth finger or a random monkey appearing in the scene. This level of control opens up a world of creative possibilities.

While people have already found various uses for this, I was particularly interested in challenging this model to do one thing:

The challenge: Can we create consistent animations as a series of images?

From Static Images to Dynamic Scenes

Imagine the potential if we could apply this precise editing to a sequence of images. We could:

  • Illustrate complex behaviors with unprecedented accuracy.
  • Simulate dynamic systems for educational or marketing purposes.
  • Create detailed visual guides.
  • And much, much more.

But Wait... Don't We Already Have This?

At this point, you're probably thinking, "Wait a second. Don't we already have amazing video and image generators?"

Well, yes, but they come with some serious hang-ups that make this kind of progressive animation a real headache.

The Problem with Video Generators:

  • They're like using a sledgehammer when you need a scalpel. Making one tiny change forces you to re-render the entire, costly video.
  • They lack the surgical precision needed to fine-tune small details without disturbing the rest of the scene, although they have become much better and more consistent over the past few months.

The Problem with Standard Image Generators:

  • They suffer from a crippling consistency problem. Ask for a sequence of images, and you'll get a chaotic mess where details change randomly from one frame to the next.
  • Image editors like Flux Kontext produce quite good results, but they almost never limit themselves to purely local changes (unlike Nano Banana).

Thanks to Nano Banana and a small tweak, we can now create animations that are more realistic and consistent than ever before.

How it works

We use Nano Banana in the following way to achieve this:

  1. First, the user provides a description of the scene they want to animate.
  2. Then, the AI generates a suggestion for the dynamics of the image, planning out the sequence of changes.
  3. Finally, Nano Banana generates each frame. It doesn't just get the target for the next frame; it also sees what has been generated so far. This keeps the entire sequence consistent and lets it build the scene progressively, one image at a time.
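
The loop described in these steps can be sketched roughly as follows. This is a minimal illustration, not the actual implementation behind the examples in this post: the `edit` callable is a hypothetical stand-in for whatever image model client you use, and `build_instruction` and `animate` are names made up for this sketch. The one idea it is meant to show is that every call receives all previously generated frames in addition to the instruction for the next one.

```python
from typing import Callable, List

# Hypothetical signature for an image-editing backend (an assumption of this
# sketch): it takes a text instruction plus the frames generated so far and
# returns the bytes of the next frame. Plug in a real model client here.
EditFn = Callable[[str, List[bytes]], bytes]


def build_instruction(scene: str, phase: str, index: int, total: int) -> str:
    """Compose the per-frame instruction from the scene and the planned phase."""
    return (
        f"Scene: {scene}. Frame {index + 1} of {total}: {phase}. "
        "Change only what this step requires and keep everything else identical."
    )


def animate(scene: str, phases: List[str], edit: EditFn) -> List[bytes]:
    """Generate one frame per planned phase, feeding the history back each time."""
    frames: List[bytes] = []
    for i, phase in enumerate(phases):
        instruction = build_instruction(scene, phase, i, len(phases))
        # Pass a copy of the history so the backend cannot mutate our list.
        next_frame = edit(instruction, list(frames))
        frames.append(next_frame)
    return frames
```

For long sequences you would probably want to cap how many earlier frames are passed along, since any real model has a limit on how many input images it accepts.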

Sounds good? Here’s what it looks like in action.

Examples

Here is an animation of a person doing jumping jacks, created using the prompt "Woman doing jumping jacks in 5 steps".

Jumping Jack

Here is another one showing the evolution of the beautiful city of Paris over the last 200 years in five frames. The sequence can include more frames if you like.

Paris: A 200-Year Transformation in 5 Frames

Here is the full animation:

Making Matcha

Here is the generated prompt for the matcha animation. The last couple of steps were unfortunately a bit misaligned, which is why the final two frames look a bit strange, but the rest of the sequence is quite close to the goal.

{
  "target": "The full preparation sequence of a matcha drink, beginning with dry powder and culminating in a perfectly frothed bowl of tea.",
  "progressionType": "linear",
  "phases": [
    "Scooping and adding matcha powder to the bowl",
    "Pouring hot water into the bowl",
    "Gently whisking to dissolve the powder",
    "Vigorously whisking to create a frothy foam",
    "Final presentation of the frothed matcha"
  ],
  "topic": "Making matcha drink - 5 steps"
}
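
As a rough sketch of how a plan like this could be turned into one prompt per frame: the field names (`target`, `progressionType`, `phases`, `topic`) come from the JSON above, but the validation rules and the prompt wording are assumptions made for illustration, and `load_plan` and `frame_prompts` are hypothetical helper names.

```python
import json
from typing import Dict, List

# Keys seen in the generated plan above; treating all four as required
# is an assumption of this sketch.
REQUIRED_KEYS = ("target", "progressionType", "phases", "topic")


def load_plan(text: str) -> Dict:
    """Parse a plan from JSON text and check that it has the expected shape."""
    plan = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in plan]
    if missing:
        raise ValueError(f"plan is missing keys: {missing}")
    if not isinstance(plan["phases"], list) or not plan["phases"]:
        raise ValueError("plan needs a non-empty list of phases")
    return plan


def frame_prompts(plan: Dict) -> List[str]:
    """Build one text prompt per phase, repeating the overall goal in each."""
    total = len(plan["phases"])
    return [
        f"{plan['topic']} (step {i} of {total}): {phase}. Overall goal: {plan['target']}"
        for i, phase in enumerate(plan["phases"], start=1)
    ]
```

Repeating the overall target in every prompt is one possible way to keep later frames anchored to the goal. Checking each frame against its phase before moving on might help catch the kind of misalignment seen in the last two matcha frames.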

There are many other scenarios, but covering them would go beyond the scope of this post.

Now, Let's Take This to the Next Level

This is just scratching the surface. Imagine what becomes possible when you can integrate your own content:

  • Generate animations starting with your own images (which you can already upload).
  • Create an initial prompt using your own documents.
  • Animate your own products.
  • Automatically generate descriptions for your images.

Imagine the possibilities:

  • For education: explaining complex behavior.
  • For demonstrations: showing how a car is built progressively.
  • For content creation: making entire posts just from a series of images.

You get the point.

With these capabilities and a few tweaks, we can start building visual content that is much more reliable and closer to reality than ever before. And with Nano Banana, we're just getting a taste. Google has opened the field, and models like Seedream are already joining the game. The future is going to be crazy!

Happy creation!
