intro

Abstract

Sketch animation, which brings static sketches to life by generating dynamic video sequences, has found widespread applications in GIF design, cartoon production, and daily entertainment. While current sketch animation methods perform well on single-object sketches, they struggle in multi-object scenarios. By analyzing their failures, we identify two major challenges in moving from single-object to multi-object sketch animation: object-aware motion modeling and complex motion optimization. For multi-object sketch animation, we propose MoSketch, which is based on iterative optimization through Score Distillation Sampling (SDS) and therefore animates a multi-object sketch in a training-data-free manner. To tackle the two challenges with a divide-and-conquer strategy, MoSketch has four novel modules: LLM-based scene decomposition, LLM-based motion planning, multi-grained motion refinement, and compositional SDS. Extensive qualitative and quantitative experiments demonstrate the superiority of our method over existing sketch animation approaches. MoSketch takes a pioneering step towards multi-object sketch animation, opening new avenues for future research and applications.
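As background on SDS, the standard gradient introduced in DreamFusion (Poole et al., 2022) is reproduced below. Here θ stands for the parameters being optimized (for sketch animation, assumed to be the motion of the sketch's stroke control points), g renders them into frames x, ε̂_φ is the noise prediction of a frozen pretrained diffusion model conditioned on the text prompt y, and w(t) is a timestep weighting. SDS omits the Jacobian of the diffusion network, so no gradient flows through the model itself and no training data is required. This is only the generic form; how MoSketch's compositional SDS combines several such terms is defined in the paper and is not reproduced here.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\big(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\big)\,\frac{\partial x}{\partial \theta} \right],
  \qquad x = g(\theta), \quad x_t = \alpha_t x + \sigma_t \epsilon, \quad \epsilon \sim \mathcal{N}(0, I)
```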

How does it work?


method
We propose MoSketch, which is based on iterative optimization through Score Distillation Sampling (SDS) and therefore animates a multi-object sketch in a training-data-free manner. To tackle the two challenges (object-aware motion modeling and complex motion optimization) with a divide-and-conquer strategy, we propose four modules: LLM-based scene decomposition, LLM-based motion planning, multi-grained motion refinement, and compositional SDS. LLM-based scene decomposition is the foundation of the other three modules: it identifies the objects, obtains their locations, and decomposes complex motions into simpler components. Building on it, LLM-based motion planning and multi-grained motion refinement achieve object-aware motion modeling that accounts for relative motions, interactions, and physical constraints among objects. Compositional SDS ensures that the complex motions of multiple objects are effectively guided during the iterative optimization.
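The split between coarse, object-level motion and fine, point-level motion can be illustrated with a minimal sketch. It assumes the sketch is a list of 2D control points, that scene decomposition yields one object label per point, that motion planning yields one translation per object per frame, and that refinement adds a small offset per point per frame. The names `compose_motion`, `object_translation`, and `point_offsets` are hypothetical, and MoSketch's actual motion parameterization may differ (for example, by also including rotation or scaling).

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def compose_motion(
    points: List[Point],
    labels: List[str],
    object_translation: Dict[str, List[Point]],
    point_offsets: List[List[Point]],
) -> List[List[Point]]:
    """Return per-frame point positions (hypothetical multi-grained composition).

    points:                   control points of the static input sketch
    labels:                   object name per point (assumed scene-decomposition output)
    object_translation[o][f]: coarse translation of object o at frame f (assumed plan)
    point_offsets[f][i]:      fine offset of point i at frame f (assumed refinement)
    """
    if len(labels) != len(points):
        raise ValueError("need exactly one object label per point")
    frames: List[List[Point]] = []
    for f, offsets in enumerate(point_offsets):
        frame: List[Point] = []
        for i, (x, y) in enumerate(points):
            tx, ty = object_translation[labels[i]][f]  # shared by the whole object
            dx, dy = offsets[i]                        # local, per-point detail
            frame.append((x + tx + dx, y + ty + dy))
        frames.append(frame)
    return frames


# Toy example: a two-point "person" descends one unit while the "helicopter" hovers.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
labels = ["person", "person", "helicopter"]
object_translation = {
    "person": [(0.0, 0.0), (0.0, -1.0)],
    "helicopter": [(0.0, 0.0), (0.0, 0.0)],
}
point_offsets = [
    [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)],
    [(0.1, 0.0), (0.0, 0.0), (0.0, 0.0)],  # small local wiggle on the first point
]
frames = compose_motion(points, labels, object_translation, point_offsets)
print(frames[1])  # [(0.1, -1.0), (1.0, -1.0), (5.0, 5.0)]
```

In this sketch, every point of an object shares the planned translation, so objects move coherently as units, while the per-point offsets are left to capture local deformation such as limb movement.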

Comparisons to Prior Work


We compare MoSketch with four baselines: two text-guided image-to-video generation methods (CogVideoX and DynamiCrafter) and two text-guided sketch animation methods (FlipSketch and Live-Sketch). Because they are not trained on sketch data, CogVideoX and DynamiCrafter produce messy results. FlipSketch, fine-tuned on sketch data, keeps its output in the sketch domain, but it fails to preserve the visual appearance of the input sketch because it represents sketches as raster images. Live-Sketch preserves visual appearance thanks to its vector representation of sketches, but it struggles to model complex motion in multi-object animation. MoSketch's four modules are designed to handle complex motion modeling, leading to vivid and realistic multi-object sketch animation.

Varying the Instructions





Input: helicopter
"A person descends in sequence from a hovering helicopter, utilizing a single rope for a coordinated landing."
"A person climbed onto the helicopter along the rope, utilizing a single rope for climbing."

Input: satellite
"Two satellites prepare for a precise docking maneuver amidst the vastness of space."
"The satellites maintain their positions, floating separately in the vast expanse of space."

Input: cat
"A curious cat jumps up to a table, reaching toward a bowl of food on the table with curiosity."
"A curious cat watches a bowl of food on the table with curiosity."

Results on Freehand Sketches


Results on Single-Object Sketches


Ablation


Limitations


Incorrect point assignment

(Godzilla’s tail is incorrectly assigned to “city”)
Godzilla and the other monster face off, the rubble of the city forming a chaotic battlefield between them.
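This kind of failure can be reproduced with a toy assignment rule. The sketch below assumes that stroke points are assigned to objects by location, using bounding boxes predicted during scene decomposition, with the smallest containing box winning in overlaps. That rule and the function name `assign_points` are illustrative assumptions, not necessarily MoSketch's exact procedure. A tail point that falls outside the predicted "Godzilla" box but inside the "city" box is labeled "city".

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def assign_points(points: List[Point], boxes: Dict[str, Box]) -> List[Optional[str]]:
    """Label each point with the smallest box that contains it (None if no box does)."""
    labels: List[Optional[str]] = []
    for x, y in points:
        best: Optional[str] = None
        best_area = float("inf")
        for name, (x0, y0, x1, y1) in boxes.items():
            if x0 <= x <= x1 and y0 <= y <= y1:
                area = (x1 - x0) * (y1 - y0)
                if area < best_area:  # prefer the tighter box where boxes overlap
                    best, best_area = name, area
        labels.append(best)
    return labels


# Toy scene: the predicted "Godzilla" box is too tight and misses the tail.
boxes = {"Godzilla": (0.0, 0.0, 4.0, 6.0), "city": (0.0, 5.0, 10.0, 10.0)}
points = [
    (2.0, 2.0),   # torso: only inside the Godzilla box
    (3.0, 5.5),   # overlap region: the tighter Godzilla box wins
    (5.0, 7.0),   # tail tip: outside the Godzilla box, inside the city box
    (12.0, 1.0),  # outside every box
]
labels = assign_points(points, boxes)
print(labels)  # ['Godzilla', 'Godzilla', 'city', None]
```

Under this assumed rule, the error originates upstream: once the predicted box excludes the tail, no tie-breaking strategy can recover the correct label, so the tail moves with the city rather than with Godzilla.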

Incorrect motion planning

(the goalkeeper should move towards the football)
A soccer player kicks a football toward the goal, while the goalkeeper dives in an attempt to make a save.

Failed to generate specified motion

(the specified motion 'fight' could not be generated)
The superman charges through the air with determination to fight with Hulk, who counters with his overwhelming strength in the ensuing clash.

BibTeX

@InProceedings{liu2025multi,
          title={Multi-Object Sketch Animation by Scene Decomposition and Motion Planning},
          author={Liu, Jingyu and Xin, Zijie and Fu, Yuhan and Zhao, Ruixiang and Lan, Bangxiang and Li, Xirong},
          booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
          year={2025}
      }