PointNet

1. Introduction

3D Representation

  • Voxel: A voxel grid looks like the 2D vision world with just one more dimension, so 2D networks such as CNNs can be adapted by extending the 2D data representation to 3D. However, it has critical problems: the grid is axis-aligned (Manhattan world), and memory grows cubically with resolution.
  • Point cloud: A useful representation of the 3D vision world. It is the kind of data produced by LiDAR sensors and depth cameras, and it is fast and easy to use. However, point cloud data is unordered and unstructured, and there is no connectivity between points.
  • Mesh: A natural representation, but it needs a template and suffers from self-intersection problems, as in the picture below.
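
The cubic memory problem of voxels can be made concrete with a little arithmetic. This is only a sketch: it assumes a dense cubic grid that stores one 4-byte float per voxel, and the function name `dense_voxel_bytes` and the resolutions are chosen for illustration.

```python
def dense_voxel_bytes(resolution, bytes_per_voxel=4):
    """Memory for a dense cubic voxel grid (assumption: one float32 per voxel)."""
    return resolution ** 3 * bytes_per_voxel


# Print the memory footprint for a few illustrative resolutions.
for resolution in (32, 64, 128, 256):
    mib = dense_voxel_bytes(resolution) / 2 ** 20
    print(f"{resolution}^3 grid: {mib:.3f} MiB")
```

Doubling the resolution multiplies the memory by 8, which is why dense volumetric methods are usually limited to coarse grids.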

Point cloud representation problems in detail

number1. Unstructured data : no grid, irregular distribution

number2. Invariance to Permutation : if the order of the points changes, the input matrix changes too, but the output should stay the same

number3. Different number of points

number4. Varying density of points

number5. Interaction among points

number6. Missing data and occlusion

number7. Invariance to Transformation : the output should be robust to rotation and translation

Deep learning based 3D classification method

  • Multi-view based method : Good performance, but needs many images of a single object taken from different views (MVCNN, MHBN, View-GCN)

  • Volumetric based method : Good performance, but poor computing and memory efficiency (3D CNNs such as VoxNet, 3D ShapeNets, OctNet)

  • Point cloud based method

    • Pointwise MLP method : Process each point independently with several shared MLPs, then aggregate a global feature using a symmetric aggregation function (PointNet 2016, PointNet++ 2017)

    • Convolution based method : Compared with kernels defined on 2D grid structures, 3D conv kernels are hard to design because point clouds are irregular. Methods are separated by kernel type:

    [1] 3D continuous convolution method : Takes a local subset of points around a given point as its input (FPS in PointNet++)

    [2] 3D discrete convolution method : Transforms the non-uniform point cloud into a uniform grid, then defines convolution kernels on that grid

  • Graph based method : Treat each point as a vertex of a graph
  • Hierarchical data structure based method
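
FPS (farthest point sampling), mentioned above for PointNet++, picks a subset of points that covers the cloud evenly. Below is a minimal NumPy sketch, not the PointNet++ implementation: the function name, the fixed `start` index, and the toy points are assumptions for illustration, and it expects `k` to be no larger than the number of points.

```python
import numpy as np


def farthest_point_sampling(points, k, start=0):
    """Return indices of k points chosen greedily to be far from each other."""
    n = points.shape[0]
    selected = np.empty(k, dtype=np.int64)
    # Squared distance from every point to its nearest already-selected point.
    min_dist = np.full(n, np.inf)
    current = start
    for i in range(k):
        selected[i] = current
        diff = points - points[current]
        dist = np.einsum("ij,ij->i", diff, diff)  # squared Euclidean distances
        min_dist = np.minimum(min_dist, dist)
        # Next pick: the point farthest from the selected set.
        current = int(np.argmax(min_dist))
    return selected


# Toy cloud: four points on the x-axis.
toy = np.array([[0.0, 0, 0], [1.0, 0, 0], [2.0, 0, 0], [10.0, 0, 0]])
print(farthest_point_sampling(toy, 3))
```

Starting from the first point, the sketch picks the far end of the line next and then the point that fills the largest remaining gap.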

Data file type

  • .bin via KITTI Velodyne LiDAR sensor

  • .ply via CARLA simulator

  • .off via ScanNetV2
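
As a sketch of how the .bin case is usually read: a KITTI Velodyne scan is a flat array of float32 values, four per point (x, y, z, reflectance). The helper name `load_kitti_bin` and the synthetic file written below are assumptions so that the example runs without the dataset.

```python
import os
import tempfile

import numpy as np


def load_kitti_bin(path):
    """Read a KITTI Velodyne scan as an (N, 4) array: x, y, z, reflectance."""
    return np.fromfile(path, dtype=np.float32).reshape(-1, 4)


# Write a tiny synthetic scan in the same layout, then read it back.
scan = np.array([[1.0, 2.0, 3.0, 0.5],
                 [4.0, 5.0, 6.0, 0.1]], dtype=np.float32)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "synthetic.bin")  # hypothetical file name
    scan.tofile(path)
    loaded = load_kitti_bin(path)

xyz = loaded[:, :3]  # drop reflectance when only coordinates are needed
print(loaded.shape, xyz.shape)
```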

Symmetric Function for Unordered Input

Overcomes number2. Invariance to Permutation (Matrix Order). There are three strategies:

(1) Sort the input into a canonical order

(2) Treat the input as sequential data, as with an RNN

(3) Use a simple symmetric function such as a max pooling layer to aggregate the information from each point

cf. symmetric function : a function that gives the same output regardless of the order of its inputs
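
Strategy (3) can be checked numerically. The sketch below applies one shared layer to every point and then max-pools over the points. The random weights `W` and `b`, the 16-d feature size, and the toy cloud are assumptions for illustration, not the real PointNet layers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shared weights: the same (3 -> 16) layer is applied to every point.
W = rng.normal(size=(3, 16))
b = rng.normal(size=16)


def global_feature(points):
    """Shared per-point layer followed by max pooling over the point axis."""
    per_point = np.maximum(points @ W + b, 0.0)  # (N, 16) after ReLU
    return per_point.max(axis=0)                 # (16,), symmetric in the points


points = rng.normal(size=(128, 3))        # toy cloud of 128 points
shuffled = points[rng.permutation(128)]   # same points, different row order

# The pooled feature should match even though the input matrices differ.
print(np.allclose(global_feature(points), global_feature(shuffled)))
print(np.array_equal(points, shuffled))
```

The first check compares the pooled features and the second compares the raw matrices: the row order changes the input matrix, but max pooling discards that order.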

Local and Global Information Aggregation (Segmentation)

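(1) Global Information : the aggregated global feature is enough for Classification (2) Local Information : Segmentation also needs per-point local features, so they are combined with the global feature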
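
In the PointNet paper, the segmentation branch appends the global feature to every per-point feature before predicting per-point labels. A minimal sketch of that concatenation follows. The helper name and the feature sizes are assumptions; in particular the global feature here is just a max over the 64-d local features, whereas the paper uses a 1024-d global feature.

```python
import numpy as np


def concat_local_global(point_features, global_feature):
    """Append the same global feature vector to every per-point feature row."""
    n = point_features.shape[0]
    tiled = np.broadcast_to(global_feature, (n, global_feature.shape[0]))
    return np.concatenate([point_features, tiled], axis=1)


rng = np.random.default_rng(0)
point_features = rng.normal(size=(1024, 64))  # assumed: 64-d local feature per point
global_feature = point_features.max(axis=0)   # stand-in global feature (64-d here)

combined = concat_local_global(point_features, global_feature)
print(combined.shape)  # one row per point: local part followed by global part
```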

Joint Alignment Network (T-net)

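Robustness to rigid transformations : T-net predicts a transformation matrix and applies it to the input points to align them to a canonical space before feature extraction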
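
A small sketch of the two pieces around T-net: applying a 3x3 matrix to the cloud, and the regularizer from the PointNet paper that keeps a predicted matrix A close to orthogonal, L_reg = ||I - A A^T||_F^2. The mini-network that predicts the matrix is not shown. A fixed rotation stands in for its output, and the function names are assumptions.

```python
import numpy as np


def apply_transform(points, matrix):
    """Apply a 3x3 transform to every point of an (N, 3) cloud."""
    return points @ matrix


def orthogonality_penalty(matrix):
    """Squared Frobenius norm of (I - A A^T); zero when A is orthogonal."""
    identity = np.eye(matrix.shape[0])
    return float(np.sum((identity - matrix @ matrix.T) ** 2))


# Stand-in for a T-net prediction: a 30 degree rotation about the z-axis.
theta = np.pi / 6
rotation_z = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta),  np.cos(theta), 0.0],
    [0.0,            0.0,           1.0],
])

rng = np.random.default_rng(0)
points = rng.normal(size=(256, 3))
aligned = apply_transform(points, rotation_z)

print(orthogonality_penalty(rotation_z))        # close to zero for a rotation
print(orthogonality_penalty(2.0 * rotation_z))  # larger for a scaled matrix
```

An orthogonal matrix preserves distances between points, so the penalty discourages transforms that would distort the shape.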

3. Network Structure

Whole Network

Classification Part

Segmentation Part
