Multi-Object Sketch Animation by Scene Decomposition and Motion Planning

Jingyu Liu, Zijie Xin, Yuhan Fu, Ruixiang Zhao, Bangxiang Lan, Xirong Li^†

Renmin University of China

ICCV 2025

Abstract

Sketch animation, which brings static sketches to life by generating dynamic video sequences, has found widespread applications in GIF design, cartoon production, and daily entertainment. While current methods perform well on single-object sketch animation, they struggle in multi-object scenarios. By analyzing their failures, we identify two major challenges in transitioning from single-object to multi-object sketch animation: object-aware motion modeling and complex motion optimization. For multi-object sketch animation, we propose MoSketch, which is based on iterative optimization through Score Distillation Sampling (SDS) and thus animates a multi-object sketch in a training-data-free manner. To tackle the two challenges with a divide-and-conquer strategy, MoSketch has four novel modules: LLM-based scene decomposition, LLM-based motion planning, multi-grained motion refinement, and compositional SDS. Extensive qualitative and quantitative experiments demonstrate the superiority of our method over existing sketch animation approaches. MoSketch takes a pioneering step towards multi-object sketch animation, opening new avenues for future research and applications.

Gallery

How does it work?

We propose MoSketch, which is based on iterative optimization through Score Distillation Sampling (SDS) and thus animates a multi-object sketch in a training-data-free manner. To tackle the two challenges (object-aware motion modeling and complex motion optimization) with a divide-and-conquer strategy, we propose four modules: LLM-based scene decomposition, LLM-based motion planning, multi-grained motion refinement, and compositional SDS. LLM-based scene decomposition is the foundation of the other three modules: it identifies objects, obtains their locations, and decomposes complex motions into simpler components. Building on it, LLM-based motion planning and multi-grained motion refinement achieve object-aware motion modeling that accounts for relative motions, interactions, and physical constraints among objects. Compositional SDS ensures that the complex motions of multiple objects are effectively guided during the iterative optimization.
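To make the object-aware part more concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes that scene decomposition yields one bounding box per object, that each stroke control point is assigned to the first box containing it, and that a frame is produced by adding a coarse per-object shift (standing in for the motion plan) and a small per-point offset (standing in for the refinement). All names (`ObjectBox`, `assign_points`, `compose_motion`) and the box-containment rule are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class ObjectBox:
    """Hypothetical scene-decomposition output: a named bounding box."""
    name: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, p: Point) -> bool:
        return self.x_min <= p[0] <= self.x_max and self.y_min <= p[1] <= self.y_max


def assign_points(points: List[Point], boxes: List[ObjectBox]) -> List[Optional[str]]:
    """Label each control point with the first box containing it (assumed rule)."""
    return [next((b.name for b in boxes if b.contains(p)), None) for p in points]


def compose_motion(
    points: List[Point],
    labels: List[Optional[str]],
    coarse_plan: Dict[str, List[Point]],
    point_offsets: Dict[Tuple[int, int], Point],
    frame: int,
) -> List[Point]:
    """Frame position = original point + coarse object shift + per-point offset."""
    moved = []
    for i, (p, label) in enumerate(zip(points, labels)):
        path = coarse_plan.get(label) if label is not None else None
        dx, dy = path[frame] if path is not None else (0.0, 0.0)
        ox, oy = point_offsets.get((i, frame), (0.0, 0.0))
        moved.append((p[0] + dx + ox, p[1] + dy + oy))
    return moved


# Toy scene: a cyclist overtaking a bus, plus one unassigned background point.
boxes = [ObjectBox("cyclist", 0, 0, 10, 10), ObjectBox("bus", 20, 0, 40, 10)]
points = [(2.0, 3.0), (25.0, 5.0), (50.0, 50.0)]
labels = assign_points(points, boxes)

# Coarse plan: per-object shift at frames 0 and 1; the cyclist moves faster.
coarse_plan = {"cyclist": [(0.0, 0.0), (4.0, 0.0)], "bus": [(0.0, 0.0), (1.0, 0.0)]}
# Fine refinement: point 0 gets an extra vertical offset at frame 1.
point_offsets = {(0, 1): (0.0, 0.5)}

frame1 = compose_motion(points, labels, coarse_plan, point_offsets, frame=1)
```

In this toy setup, points that fall outside every box stay static, and a point that lands in the wrong box inherits the wrong object's motion, so the quality of the assignment directly affects the animation.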

Comparisons to Prior Work

We compare MoSketch to four baselines: two text-guided image-to-video generation methods (CogVideoX and DynamiCrafter) and two text-guided sketch animation methods (FlipSketch and Live-Sketch). Because they lack specialized training on sketch data, CogVideoX and DynamiCrafter generate messy results. FlipSketch stays in the sketch domain thanks to fine-tuning on sketch data, but fails to preserve visual appearance because it represents sketches as raster images. Live-Sketch preserves visual appearance thanks to its vector representation of sketches, but struggles to model complex motion in multi-object animation. MoSketch uses four modules to handle complex motion modeling, leading to vivid and realistic multi-object sketch animation.
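The raster-versus-vector distinction can be illustrated with a small sketch. It assumes, as is common in vector sketch animation, that a stroke is a cubic Bezier curve defined by four control points and that animating a stroke means displacing those control points per frame. The function names are illustrative and not taken from any of the compared systems.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cubic_bezier(ctrl: List[Point], t: float) -> Point:
    """Evaluate a cubic Bezier curve at t in [0, 1] from four control points."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = ctrl
    s = 1.0 - t
    # Bernstein basis weights for a cubic curve.
    b0, b1, b2, b3 = s ** 3, 3 * s * s * t, 3 * s * t * t, t ** 3
    return (
        b0 * x0 + b1 * x1 + b2 * x2 + b3 * x3,
        b0 * y0 + b1 * y1 + b2 * y2 + b3 * y3,
    )


def translate(ctrl: List[Point], dx: float, dy: float) -> List[Point]:
    """Shift every control point by the same displacement."""
    return [(x + dx, y + dy) for x, y in ctrl]


stroke = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
moved = translate(stroke, 5.0, 1.0)

mid_before = cubic_bezier(stroke, 0.5)
mid_after = cubic_bezier(moved, 0.5)
```

Because the Bernstein weights sum to one, shifting all four control points by the same amount shifts every point on the curve by that amount, so the stroke keeps its shape. A raster method has no such guarantee, since it regenerates pixels in every frame.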

[Video comparison grid. Columns: Input, CogVideoX, DynamiCrafter, FlipSketch, Live-Sketch, Ours.]

Varying the Instructions

Input

"The cyclist overtakes the bus."

"The bus and the cyclist travel forward."

"The bus follows the cyclist."

Input

"A UFO is flying into the sky."

"Viewed from the front, a UFO over the ground is getting closer."

"Viewed from the back, a UFO over the ground is getting farther away."

Input

"A person leads the horse gently by the reins, while the rider sits calmly, observing the surroundings as they move forward."

"A person is walking ahead, while a rider riding a horse fast and overtaking the person."

"A person is walking ahead, while a rider dismounting from a standing horse."

Input

"A person descends in sequence from a hovering helicopter, utilizing a single rope for a coordinated landing."

"A person climbed onto the helicopter along the rope, utilizing a single rope for climbing."

Input

"Two satellites prepare for a precise docking maneuver amidst the vastness of space."

"The satellites maintain their positions, floating separately in the vast expanse of space."

Input

"A curious cat jumps up to a table, reaching toward a bowl of food on the table with curiosity."

"A curious cat watches a bowl of food on the table with curiosity."

Results on Freehand Sketches

Results on Single-Object Sketches

Ablation

[Ablation video grid. Columns: Input, No Motion Plan, No Object Refinement, No Point Refinement, No Object-aware Network, No CSDS, Ours.]

Limitations

Incorrect point assignment

(Godzilla’s tail is incorrectly assigned to “city”)

[Panels: Input, Point Assignment, Output.]

Godzilla and the other monster face off, the rubble of the city forming a chaotic battlefield between them.

Incorrect motion planning

(the goalkeeper should move towards the football)

[Panels: Input, Coarse Object Motion, Output.]

A soccer player kicks a football toward the goal, while the goalkeeper dives in an attempt to make a save.

Failure to generate the specified motion

(the specified motion 'fight' could not be generated)

[Panels: Input, Coarse Object Motion, Output.]

The superman charges through the air with determination to fight with Hulk, who counters with his overwhelming strength in the ensuing clash.

BibTeX

@InProceedings{liu2025multi,
  title={Multi-Object Sketch Animation by Scene Decomposition and Motion Planning},
  author={Liu, Jingyu and Xin, Zijie and Fu, Yuhan and Zhao, Ruixiang and Lan, Bangxiang and Li, Xirong},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2025}
}

Gallery

A jet takes off from the aircraft carrier [..]

The drummer pounds the cymbals, [..]

A stealthy cat crouches low, its eye[..]

Viewed from the front, an airplane [..]

The player soars through the air [..]

Two playful cats stand on their hind legs, [..]

A shell bursts from the cannon, [..]

A climber dangles mid-air, navigating[..]

A curious cat jumps up to a table, reach[..]

Viewed from the top, the car in the back[..]

A climber ascends a rock wall, dynamical[..]

The person throws a frisbee through [..]

Viewed from the back, a car navigate[..]

A woman and a man sitting at a dining [..]

The dog prepares to enjoy a delightful[..]

The road curves ahead as a jeep viewed fr[..]

A man gently feeds a woman across a [..]

The dog reaches up to the table, [..]

A lone car maneuvers with steep cliffs [..]

A man pulls himself up on a horizontal [..]

The dog races up the stairs in pursuit [..]

Two cyclists maneuver a curvy road, [..]

The scene depicts two individuals engag[..]

The seal on a stage prepares to leap [..]

The motorcycle will jump over [..]

The boxer throws a powerful punch [..]

The seal seats on a stage, juggling [..]

A person holding a spoonful of food clo[..]

A woman throws a frisbee with force [..]

The dolphin is mid-leap, heading [..]

An excavator digging and scooping soil, [..]

A soccer player executes an [..]

The dolphin is jumping through the hoop [..]

A large crane lowers a container on[..]

A soccer player skillfully dribbles [..]

The eagle is in pursuit of the small[..]

Two people rappel down a rope from [..]

Three athletes in action during a hurdle [..]

The larger fish opens its mouth wide [..]

Ice cubes splash into a glass of liquid, [..]

Three children enjoying a jump rope activity: [..]

The giraffe is approaching the [..]

A bottle is gracefully pouring liquid in[..]

A worker climbing an inclined ladd[..]

A goat grazes peacefully on the [..]

Rollercoaster cart at the peak of a [..]

A bride and groom stand facing [..]

Godzilla and the other monster face off, [..]

The satellite shifts position as the [..]

A person carefully stacking box[..]

A horse-drawn carriage moves stead[..]

The space shuttle begins its ascent, [..]

A person sits alone, holding a fork [..]

A person leads the horse gently by [..]

A tank is firing shells towards a [..]

Superman flies swiftly through the air, [..]

The horse gracefully soars over the obs[..]

A vintage steam locomotive with [..]

The workers in unison: one shovels debris, [..]

The tiger leaps through the air, claws [..]
