Using AI for Concept Visualisation

From Clay to Motion: Learning to Control AI Video with Nano Banana and VEO

Photos of the original sculpture, “The Wandering Spirit”, which I made as a reminder of my transformative hiking experiences at my art retreat in Frigiliana

The sculpture from Nano Banana, modeled as a bronze statue

A few years ago, I made a small clay sculpture. Raw, angular, hand-shaped. Recently, I used it as a test object to explore something very current: how to gain more control over AI-generated video.

The experiment became a practical dialogue between two tools: Nano Banana for image synthesis and VEO for video.


Step 1 — Teaching the model the object

I started by photographing the sculpture from multiple angles. These images became the “ground truth.”

Using Nano Banana, I prompted the model to reinterpret the clay figure as a bronze statue. This worked surprisingly well. The materiality translated, the form stayed recognizable, and the sculpture gained a coherent metallic presence.

At this stage, the model clearly understood the object.


Step 2 — Asking for motion (and hitting a limit)

What I wanted next was simple in principle:

A clean 360° spin around the statue.

This is where VEO struggled.

VEO is strong at cinematic motion, atmosphere, and narrative flow — but not at geometrically correct rotation around a specific object. The result was a morphing spin. The statue subtly changed shape as the camera moved. Limbs shifted. Proportions drifted.

It looked like motion, but not like a camera orbiting a stable 3D form.

This revealed an important insight:

VEO does not understand the object as a stable 3D volume.

It understands it as a sequence of plausible images.


Step 3 — Using Nano Banana as a “view generator”

To solve this, I went back to Nano Banana.

Instead of asking for motion, I asked for missing views:

  • Slight left
  • Slight right
  • Rear angles
  • Quarter turns

Nano Banana generated these as still images. When laid out together, they resembled a pose sheet — like reference frames for a 3D model.

Individually imperfect. Collectively powerful.
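As a rough illustration, the per-view prompts could be generated programmatically rather than written one by one. This is a hypothetical sketch — the prompt wording, the angle list, and the helper name are my own assumptions, not anything from Nano Banana's interface:

```python
# Hypothetical sketch: generating consistent per-angle prompts for a
# "view generator" image model. Wording and angles are illustrative.

ANGLES = [
    ("slight left", -15),
    ("slight right", 15),
    ("quarter turn left", -90),
    ("quarter turn right", 90),
    ("rear view", 180),
]

def view_prompt(label: str, degrees: int) -> str:
    """Build one prompt asking for the same object from a new angle."""
    return (
        "The same bronze statue, identical geometry, material and lighting, "
        f"camera rotated {degrees} degrees around the vertical axis ({label})."
    )

prompts = [view_prompt(label, deg) for label, deg in ANGLES]
```

Keeping every prompt identical except for the angle is the point: it nudges the model toward treating the views as one stable object rather than five loose variations.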


Step 4 — Animating the pose sheet in VEO

I then sequenced these generated angles into VEO as frames for animation.

Now VEO had something it could work with:

not an abstract object, but a series of explicitly specified viewpoints.

The result still isn’t perfect. The spin is not geometrically clean. Some angles morph. But it is far closer to a real orbit than the first attempt.

Because now, the video is guided by structure, not imagination.
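The sequencing step can be sketched in the same spirit. Assuming each generated still is tagged with its yaw angle (a convention of mine, not anything VEO requires, and the filenames are invented), ordering them into an orbit is just a sort on angle:

```python
# Hypothetical sketch: ordering generated stills into a clean orbit
# sequence before handing them to a video model as keyframes.
# Filenames and angle tags are illustrative conventions only.

views = [
    ("statue_front.png", 0),
    ("statue_right_quarter.png", 90),
    ("statue_slight_left.png", 345),
    ("statue_rear.png", 180),
    ("statue_left_quarter.png", 270),
    ("statue_slight_right.png", 15),
]

def orbit_order(stills):
    """Sort (filename, yaw_degrees) pairs into a 0-360 degree orbit."""
    return [name for name, yaw in sorted(stills, key=lambda v: v[1] % 360)]

sequence = orbit_order(views)
# sequence walks the camera once around the statue:
# front -> slight right -> right quarter -> rear -> left quarter -> slight left
```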


Key learning

The breakthrough was realizing this:

To control AI video, you must first teach the model the object from many angles — almost like building a fake 3D model out of 2D views.

Nano Banana became a synthetic photographer.

VEO became the camera operator.


Next experiment

The next test is obvious:

I will photograph the sculpture from many more real angles and feed those into the process.

The hypothesis is that with enough 2D references, the system will begin to approximate a CAD-like 3D understanding — predicting unseen views more reliably and letting VEO animate with less morphing.
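If I do capture many more real angles, an evenly spaced capture plan would keep the gaps between views uniform. A minimal sketch — the number of views is an arbitrary choice of mine, not a known requirement of either tool:

```python
# Hypothetical sketch: an evenly spaced capture plan for photographing
# the sculpture from all sides. n_views is an illustrative choice.

def capture_plan(n_views: int) -> list[float]:
    """Return n_views yaw angles, evenly spaced around 360 degrees."""
    return [i * 360.0 / n_views for i in range(n_views)]

angles = capture_plan(12)  # one photo every 30 degrees
```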


What this is really about

This is not about a sculpture.

It’s about learning how to structure inputs so AI tools stop guessing and start following.

Less prompting.

More teaching.
