Visual Computing and Learning

PQ-NET: A Generative Part Seq2Seq Network for 3D Shapes

IEEE Conference on Computer Vision and Pattern Recognition 2020

"The characterization of object perception provided by recognition-by-components (RBC) bears a close resemblance to some current views as to how speech is perceived."
— Irving Biederman [5]


We introduce PQ-NET, a deep neural network which represents and generates 3D shapes via sequential part assembly. The input to our network is a 3D shape segmented into parts, where each part is first encoded into a feature representation using a part autoencoder. The core component of PQ-NET is a sequence-to-sequence or Seq2Seq autoencoder which encodes a sequence of part features into a latent vector of fixed size, and the decoder reconstructs the 3D shape, one part at a time, resulting in a sequential assembly. The latent space formed by the Seq2Seq encoder encodes both part structure and fine part geometry. The decoder can be adapted to perform several generative tasks including shape autoencoding, interpolation, novel shape generation, and single-view 3D reconstruction, where the generated shapes are all composed of meaningful parts.
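To make the encode-then-assemble pipeline concrete, here is a minimal sketch of a sequential part autoencoder in NumPy. It is not the authors' implementation (the paper's Seq2Seq module is recurrent, but the cell type, dimensions, and random weights below are all illustrative assumptions): a toy RNN encoder folds a variable-length sequence of per-part feature vectors into one fixed-size latent, and a toy decoder unrolls that latent back into one part code per step.

```python
import numpy as np

# Illustrative sketch only -- not PQ-NET's actual architecture.
# PART_DIM / HIDDEN_DIM and all weights are hypothetical.
rng = np.random.default_rng(0)
PART_DIM, HIDDEN_DIM = 64, 128

# Encoder weights (hypothetical)
W_xh = rng.normal(scale=0.1, size=(PART_DIM, HIDDEN_DIM))
W_hh = rng.normal(scale=0.1, size=(HIDDEN_DIM, HIDDEN_DIM))

def encode(part_features):
    """Fold a variable-length sequence of part codes into a fixed-size latent."""
    h = np.zeros(HIDDEN_DIM)
    for x in part_features:          # one recurrent step per shape part
        h = np.tanh(x @ W_xh + h @ W_hh)
    return h                         # fixed-size shape latent

# Decoder weights (hypothetical): emit one part code per step.
W_zh = rng.normal(scale=0.1, size=(HIDDEN_DIM, HIDDEN_DIM))
W_hy = rng.normal(scale=0.1, size=(HIDDEN_DIM, PART_DIM))

def decode(latent, n_parts):
    """Unroll the latent into a sequence of part codes, one part at a time."""
    h, parts = latent, []
    for _ in range(n_parts):
        h = np.tanh(h @ W_zh)
        parts.append(h @ W_hy)       # each code would feed a part geometry decoder
    return parts

parts_in = [rng.normal(size=PART_DIM) for _ in range(4)]  # 4 encoded parts
z = encode(parts_in)
parts_out = decode(z, n_parts=4)
print(z.shape, len(parts_out), parts_out[0].shape)
```

The key property the sketch shows is that shapes with different part counts all map to the same fixed-size latent, which is what lets the downstream generative tasks (interpolation, sampling, reconstruction) operate in a single latent space.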

Fig 1: Our network, PQ-NET, learns 3D shape representations as a sequential part assembly. It can be adapted to generative tasks such as random 3D shape generation, single-view 3D reconstruction (from RGB or depth images), and shape completion.

Fig 4: Visual results for shape auto-encoding. Output meshes are obtained using the same marching cubes setup.

Fig 5: 3D shape generation results, with comparison to results obtained by IM-NET and StructureNet.

Fig 6: Latent space interpolation results. The interpolated sequence not only exhibits smooth geometry morphing but also preserves the shape structure.

Fig 7: Visual comparison of structured 3D shape reconstruction from single depth image on three categories: chair, table, lamp.

Fig 8: Visual comparison of randomly generated 3D primitives. 3D-PRNN suffers from unrealistic, duplicated, or missing parts, while our model yields more plausible results.

Fig 9: Single-view reconstruction results. Our results are from a single model trained across all three categories. Note that our method also recovers the shape structure.

Fig 10: Part order denoising results. Our method can unscramble random input orders into a consistent output order, facilitating part correspondence. Note that the color correspondence is for illustration only and is not part of our network's output.