Abstract

We propose a framework that can deform an object in a 2D image as it exists in 3D space. Most existing methods for 3D-aware image manipulation are limited to (1) changing only the global scene information or depth, or (2) manipulating objects of specific categories. In this paper, we present a 3D-aware image deformation method with minimal restrictions on shape category and deformation type. While our framework leverages 2D-to-3D reconstruction, we argue that reconstruction alone is not sufficient for realistic deformations because it is vulnerable to topological errors. We therefore take a supervised learning-based approach to predict the shape Laplacian of the underlying volume of a 3D reconstruction represented as a point cloud. Given the deformation energy calculated using the predicted shape Laplacian and user-defined deformation handles (e.g., keypoints), we obtain bounded biharmonic weights to model plausible handle-based image deformation. In the experiments, we present results on deforming 2D character and clothed human images. We also show quantitatively that our approach produces more accurate deformation weights than alternative methods (i.e., mesh reconstruction and point cloud Laplacian methods).

Video

Approach

We introduce a neural network that predicts the shape Laplacian of the underlying volume of a 3D point cloud reconstructed from a 2D image, without directly converting the point cloud to a volume. Since the deformation energy can be discretized with the standard linear FEM Laplacian LM⁻¹L (where L is a symmetric cotangent Laplacian matrix and M is a diagonal lumped mass matrix), we design our network to learn the matrices L and M⁻¹ from supervision obtained from a ground truth 3D mesh. The elements of the inverse mass matrix M⁻¹ are predicted for each individual point, while the elements of the cotangent Laplacian matrix L are predicted from pairs of input points. We use a symmetric feature aggregation function for such pairs, together with a weight module, to constrain the output matrix L to be symmetric and sparse. At test time, we recover the deformation energy from the predicted L and M⁻¹ to compute bounded biharmonic weights with user-specified deformation handles. Since our method learns the shape Laplacian instead of the handle-dependent deformation weights, it can generalize well to arbitrary handle configurations.
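The test-time step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and makes several assumptions: `chain_laplacian` is a hypothetical stand-in for the predicted L (a graph Laplacian of a point chain rather than a learned cotangent Laplacian), the box-constrained problem is solved with a simple projected gradient loop rather than a dedicated QP solver, and partition of unity is enforced by normalizing the weights afterwards instead of as a constraint.

```python
import numpy as np

def chain_laplacian(n):
    """Hypothetical stand-in for a predicted L: symmetric graph Laplacian of an n-point chain."""
    L = np.zeros((n, n))
    for i in range(n - 1):
        L[i, i] += 1.0
        L[i + 1, i + 1] += 1.0
        L[i, i + 1] -= 1.0
        L[i + 1, i] -= 1.0
    return L

def bounded_biharmonic_weights(L, m_inv, handles, iters=5000):
    """For each handle, minimize 0.5 * w^T (L M^-1 L) w subject to 0 <= w <= 1,
    with w fixed to 1 at its own handle and 0 at the other handles."""
    n = L.shape[0]
    Q = L @ np.diag(m_inv) @ L                      # discretized deformation energy
    free = np.array([i for i in range(n) if i not in handles])
    Q_ff = Q[np.ix_(free, free)]                    # free-free block
    Q_fh = Q[np.ix_(free, handles)]                 # free-handle coupling
    step = 1.0 / np.linalg.eigvalsh(Q_ff).max()     # step size that keeps the iteration stable
    W = np.zeros((n, len(handles)))
    for j in range(len(handles)):
        w_h = np.zeros(len(handles))
        w_h[j] = 1.0                                # one-hot values at the handles
        b = Q_fh @ w_h
        w_f = np.full(len(free), 1.0 / len(handles))
        for _ in range(iters):
            grad = Q_ff @ w_f + b                   # gradient of the energy w.r.t. free weights
            w_f = np.clip(w_f - step * grad, 0.0, 1.0)  # gradient step, then project onto [0, 1]
        W[handles, j] = w_h
        W[free, j] = w_f
    W /= W.sum(axis=1, keepdims=True)               # assumption: normalize for partition of unity
    return W

# Usage: six points on a line, handles at both ends, unit inverse masses.
points = np.stack([np.arange(6.0), np.zeros(6)], axis=1)
W = bounded_biharmonic_weights(chain_laplacian(6), np.ones(6), handles=[0, 5])
# Move the second handle upward; each point follows according to its weights.
handle_offsets = np.array([[0.0, 0.0], [0.0, 1.0]])
deformed = points + W @ handle_offsets
```

Because the weights depend only on the energy matrix and the chosen handle indices, changing `handles` requires re-solving for the weights but not re-predicting L or M⁻¹, which is the property the paragraph above relies on for generalizing to arbitrary handle configurations.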

Interactive Demo

Drag the keypoints to deform the character image in a 3D-aware manner. Different images can be selected from the side menu. A full-screen version is available here.

Paper and Supplementary Material

Jihyun Lee*, Minhyuk Sung*, Hyunjin Kim, Tae-Kyun Kim.
(*: equal contributions)
Pop-Out Motion: 3D-Aware Image Deformation via Learning the Shape Laplacian.
In CVPR, 2022.

[Bibtex]

Acknowledgements

We would like to thank Duygu Ceylan for helpful discussions. This work is supported in part by a KAIA grant (22CTAP-C163793-02) funded by the Korea government (MOLIT) and an NST grant (CRC21011) funded by the Korea government (MSIT). M. Sung also acknowledges support from an NRF grant (2021R1F1A1045604) funded by the Korea government (MSIT), the Technology Innovation Program (20016615) funded by the Korea government (MOTIE), and grants from the Adobe and KT corporations.