ShapeFusion: A 3D diffusion model for localized shape editing

Imperial College London, United Kingdom


Abstract

In the realm of 3D computer vision, parametric models have emerged as a ground-breaking methodology for the creation of realistic and expressive 3D avatars. Traditionally, they rely on Principal Component Analysis (PCA), given its ability to decompose data into an orthonormal space that maximally captures shape variations. However, due to the orthogonality constraints and the global nature of PCA's decomposition, these models struggle to perform localized and disentangled editing of 3D shapes, which severely limits their use in applications requiring fine control, such as face sculpting. In this paper, we leverage diffusion models to enable diverse and fully localized edits on 3D meshes, while completely preserving the un-edited regions. We propose an effective diffusion masking training strategy that, by design, facilitates localized manipulation of any shape region, without being limited to predefined regions or to sparse sets of predefined control vertices. Following our framework, a user can explicitly set their manipulation region of choice and define an arbitrary set of vertices as handles to edit a 3D mesh. Compared to the current state of the art, our method produces more interpretable shape manipulations than methods that rely on latent code manipulation, achieves greater localization and generation diversity, and offers faster inference than optimization-based approaches.

Overview

We propose a masked 3D diffusion model for localized attribute manipulation and editing. During the forward diffusion step, noise is gradually added to random regions of the mesh, indicated by a mask M. In the denoising step, a hierarchical network based on mesh convolutions learns a prior distribution of each attribute directly in the vertex space.
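
The masked forward step can be written compactly. The PyTorch sketch below shows how noise is applied only where the mask M is active; the linear noise schedule and the names (T, betas, q_sample_masked) are illustrative assumptions, not the authors' exact implementation.

import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)               # assumed linear noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)  # cumulative product of alphas

def q_sample_masked(x0, t, mask):
    """Noise only the masked vertices of a mesh.

    x0:   (V, 3) clean vertex positions
    t:    scalar timestep in [0, T)
    mask: (V, 1) binary mask, 1 = vertex is diffused, 0 = vertex stays clean
    """
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t]
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise  # standard DDPM forward step
    # Un-masked vertices keep their original coordinates throughout diffusion.
    return mask * x_t + (1.0 - mask) * x0, noise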

Method

In order to train a localized model, we define a masked forward diffusion process that gradually adds noise to specific areas of the mesh, as defined by a mask M. During training, we define the masked vertices as the k-hop geodesic neighborhood of a randomly selected anchor point. The remaining vertices, including the anchor point, remain unaffected. Using this masked diffusion process, we guarantee local editing as well as full control over the generative process without employing an explicit conditional model. In contrast to previous methods for disentangled manipulation, our approach not only ensures fully localized editing but also enables direct manipulation of any point and region of the mesh, as sketched below.
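
As a rough illustration of the training-time masking, the sketch below builds the k-hop neighborhood of a random anchor by breadth-first search over the vertex adjacency graph (a common proxy for the geodesic neighborhood). The helper names faces_to_adjacency and khop_mask are assumptions for illustration, and the schematic training step at the end reuses q_sample_masked from the sketch above.

import random
import torch

def faces_to_adjacency(faces, num_vertices):
    """Build a vertex adjacency list from triangle faces of shape (F, 3)."""
    adj = [set() for _ in range(num_vertices)]
    for a, b, c in faces.tolist():
        adj[a].update((b, c)); adj[b].update((a, c)); adj[c].update((a, b))
    return adj

def khop_mask(adj, anchor, k):
    """Return a (V, 1) mask that is 1 on the k-hop neighborhood of `anchor`.

    Following the text above, the anchor itself (and everything outside the
    neighborhood) is left un-noised, so it is excluded from the mask.
    """
    frontier, visited = {anchor}, {anchor}
    for _ in range(k):
        frontier = {n for v in frontier for n in adj[v]} - visited
        visited |= frontier
    mask = torch.zeros(len(adj), 1)
    mask[list(visited - {anchor})] = 1.0
    return mask

# Schematic training step: random anchor and hop count, noise the region,
# supervise the denoiser only on the masked vertices.
# anchor = random.randrange(num_vertices); k = random.randint(1, k_max)
# mask = khop_mask(adj, anchor, k)
# x_t, noise = q_sample_masked(x0, t, mask)
# loss = ((model(x_t, t, mask) - noise) * mask).pow(2).mean()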

Results

In contrast to previous approaches, ShapeFusion can achieve highly localized manipulations explicitly defined by the user. Our masked diffusion approach enables edits that preserve the un-masked regions and the shape identity. In addition to shape manipulations, ShapeFusion can be used to manipulate localized expressions that follow the Facial Action Coding System (FACS), introducing a powerful tool for editing regions of arbitrary size. As shown in the following figure, ShapeFusion can not only edit global extreme expressions, similar to global parametric models, but can also generalize to out-of-distribution localized expressions, such as a smirk.
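
At inference time, the same masking mechanism yields these localized edits: only the user-selected region is denoised, while the preserved vertices, and any handle vertices the user pins, are re-imposed at every reverse step. The sketch below uses a plain DDPM ancestral sampler with the same assumed names (model, betas, alphas_cumprod) as in the earlier sketches; it is a hedged illustration, not the exact sampler of the paper.

import torch

@torch.no_grad()
def edit_region(model, x0, mask, handles=None, T=1000):
    """x0: (V, 3) input mesh; mask: (V, 1) region to regenerate.
    handles: optional dict {vertex_index: (3,) target position} kept fixed."""
    x = torch.randn_like(x0)                      # start the masked region from noise
    x = mask * x + (1 - mask) * x0                # un-masked vertices stay untouched
    for t in reversed(range(T)):
        a_bar = alphas_cumprod[t]
        eps = model(x, torch.tensor([t]), mask)   # predicted noise on the mesh
        mean = (x - betas[t] / (1 - a_bar).sqrt() * eps) / (1 - betas[t]).sqrt()
        x = mean + betas[t].sqrt() * torch.randn_like(x) if t > 0 else mean
        x = mask * x + (1 - mask) * x0            # clamp the preserved region each step
        if handles:                               # pin user-chosen handle vertices
            for v, pos in handles.items():
                x[v] = pos
    return x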

A practical property of the proposed method is its ability to seamlessly swap distinct facial regions and components between different identities.

In addition, ShapeFusion retains the generative properties of 3DMMs and can be easily adapted to reconstruction and fitting tasks.


BibTeX

@article{potamias2024shapefusion,
      title={ShapeFusion: A 3D diffusion model for localized shape editing},
      author={Potamias, Rolandos Alexandros and Tarasiou, Michail and Ploumpis, Stylianos and Zafeiriou, Stefanos},
      journal={arXiv preprint arXiv:2403.19773},
      year={2024}
    }