PointNet
1. Introduction
3D Representation
- Voxel: Voxels look like 2D pixels with one extra dimension, so a 2D network such as a CNN can be adapted by extending the 2D data representation to 3D. However, voxels have critical problems such as a Manhattan-world (axis-aligned) bias and cubic memory growth.
- Point cloud: A useful representation of the 3D world, and the kind of data produced by LiDAR sensors and depth cameras. It is fast and easy to use. However, point cloud data is unordered and unstructured, with no connectivity between points.
- Mesh: A natural representation, but it needs a template and suffers from self-intersection problems, as in the picture below.
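The cubic memory problem of voxels can be made concrete with a minimal sketch. `voxelize` is a hypothetical helper written for this note (not a library function), and it assumes the points are already normalized to the unit cube and stored in a dense occupancy grid.

```python
import numpy as np

def voxelize(points, resolution):
    """Convert an (N, 3) point cloud in [0, 1)^3 into a dense occupancy grid."""
    grid = np.zeros((resolution,) * 3, dtype=bool)
    # Map each coordinate to a voxel index, clamping to the last cell.
    idx = np.clip((points * resolution).astype(int), 0, resolution - 1)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid

rng = np.random.default_rng(0)
points = rng.random((1024, 3))  # 1024 random points in the unit cube

for resolution in (32, 64, 128):
    grid = voxelize(points, resolution)
    # Cell count grows as resolution**3, while at most 1024 cells are occupied.
    print(resolution, grid.size, int(grid.sum()))
```

Doubling the resolution multiplies the number of cells by eight, even though the number of occupied cells can never exceed the number of points.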
Point cloud representation problems in detail
1. Unstructured data: no grid, irregular distribution
2. Invariance to permutation: if the order of the points changes, the input matrix also changes, but the output should not
3. Different number of points
4. Varying density of points
5. Interaction among points
6. Missing data and occlusion
7. Invariance to transformation: the output should be robust to rotation and translation
Deep learning based 3D classification methods
- Multi-view based method: Good performance, but needs many images or views of a single object (MVCNN, MHBN, View-GCN)
- Volumetric based method: Good performance, but poor computing and memory efficiency (3D CNNs such as VoxNet, 3D ShapeNets, OctNet)
- Point cloud based method
  - Pointwise MLP method: Handle each point independently with several shared MLPs, then aggregate a global feature with a symmetric aggregation function (PointNet 2016, PointNet++ 2017)
  - Convolution based method: Compared with kernels defined on 2D grid structures, 3D convolution kernels are hard to design because point clouds are irregular [separated by kernel type]
    [1] 3D continuous convolution method: Take a local subset of points around a certain point as the input (FPS in PointNet++)
    [2] 3D discrete convolution method: After transforming the non-uniform point cloud into a uniform grid, define convolution kernels on each grid cell
  - Graph based method: Consider each point as a vertex of a graph
  - Hierarchical data structure based method
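The idea of taking a local subset of points around chosen centers can be sketched as follows. This is a minimal NumPy illustration of farthest point sampling (FPS) followed by a radius search, not the PointNet++ implementation; the function names, the `start_index` parameter, and the radius value are assumptions made for this example.

```python
import numpy as np

def farthest_point_sampling(points, num_samples, start_index=0):
    """Pick num_samples indices from an (N, 3) array so the picks are spread out."""
    n = points.shape[0]
    selected = np.zeros(num_samples, dtype=int)
    # Squared distance from every point to its nearest already-selected point.
    min_dist = np.full(n, np.inf)
    current = start_index
    for i in range(num_samples):
        selected[i] = current
        dist = np.sum((points - points[current]) ** 2, axis=1)
        min_dist = np.minimum(min_dist, dist)
        # Next pick: the point farthest from everything selected so far.
        current = int(np.argmax(min_dist))
    return selected

def ball_query(points, center, radius):
    """Return indices of points within radius of center (a local neighborhood)."""
    return np.nonzero(np.linalg.norm(points - center, axis=1) <= radius)[0]

rng = np.random.default_rng(0)
cloud = rng.random((500, 3))
centroids = farthest_point_sampling(cloud, 16)
neighborhood = ball_query(cloud, cloud[centroids[0]], radius=0.2)
```

Each centroid together with its neighborhood forms one local region that a small network could then process.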
Data file type
2. Related Works
Symmetric Function for Unordered Input
Goal: overcome problem 2, invariance to permutation (the row order of the input matrix). Three strategies:
(1) Sort the input into a canonical order
(2) Treat the input as sequential data, as an RNN would
(3) Use a simple symmetric function such as a max pooling layer to aggregate the information from each point
cf. symmetric function: a function whose output is the same regardless of the order of its inputs
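Strategy (3) can be illustrated with a small NumPy sketch: a shared MLP followed by max pooling. The weights are random and untrained, and the layer sizes (3 → 64 → 128) are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shared MLP weights: 3 -> 64 -> 128 (illustrative sizes).
W1, b1 = rng.normal(size=(3, 64)), np.zeros(64)
W2, b2 = rng.normal(size=(64, 128)), np.zeros(128)

def shared_mlp(points):
    """Apply the same two-layer MLP to every point of an (N, 3) array."""
    hidden = np.maximum(points @ W1 + b1, 0.0)  # ReLU
    return np.maximum(hidden @ W2 + b2, 0.0)    # (N, 128) per-point features

def global_feature(points):
    """Aggregate per-point features with max pooling, a symmetric function."""
    return shared_mlp(points).max(axis=0)       # (128,)

cloud = rng.random((1024, 3))
shuffled = cloud[rng.permutation(len(cloud))]

# Row order changes the input matrix but not the pooled feature.
print(np.array_equal(cloud, shuffled))
print(np.allclose(global_feature(cloud), global_feature(shuffled)))
```

With the fixed seed this should print `False` and then `True`. Because the pooling runs over the point axis, the same function also accepts clouds with a different number of points and still returns a 128-dimensional feature.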
Local and Global Information Aggregation (Segmentation)
(1) Global information: sufficient for classification
(2) Local information: needed for segmentation, where each point's feature is combined with the global feature
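One way to combine the two kinds of information for segmentation is to tile the global feature and concatenate it onto every per-point feature. The sketch below assumes this concatenation scheme; `segmentation_features` is a hypothetical helper, the arrays are random stand-ins for learned features, and the sizes 64 and 1024 are illustrative.

```python
import numpy as np

def segmentation_features(local_feat, global_feat):
    """Concatenate each point's local feature with the shared global feature.

    local_feat:  (N, C_local) per-point features
    global_feat: (C_global,) feature describing the whole cloud
    Returns an (N, C_local + C_global) array.
    """
    n = local_feat.shape[0]
    tiled = np.tile(global_feat, (n, 1))  # repeat the global feature per point
    return np.concatenate([local_feat, tiled], axis=1)

rng = np.random.default_rng(0)
local = rng.random((1024, 64))                # stand-in per-point features
global_feat = rng.random((1024, 1024)).max(axis=0)  # stand-in max-pooled feature
combined = segmentation_features(local, global_feat)
print(combined.shape)  # (1024, 1088)
```

Every row now carries both what is specific to that point and a summary of the whole shape, which a per-point classifier can use to assign segmentation labels.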
Joint Alignment Network (T-net)
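Robustness to transformation: the T-Net predicts a transformation matrix that aligns the input to a canonical space before feature extraction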
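The structure of such an alignment step can be sketched roughly as below. This is not the architecture from the paper: the weights are untrained, the single 64-unit layer is an assumption, and the final layer is zero-initialized so that the predicted transform starts as the identity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, untrained weights; sizes are illustrative only.
W_point = rng.normal(scale=0.1, size=(3, 64))  # shared per-point layer
W_out = np.zeros((64, 9))                      # final layer, zero-initialized

def t_net(points):
    """Predict a 3x3 transform from an (N, 3) cloud (shared layer + max pool)."""
    per_point = np.maximum(points @ W_point, 0.0)  # (N, 64)
    pooled = per_point.max(axis=0)                 # (64,), order-invariant
    # Predict an offset from the identity so training starts from "no change".
    return np.eye(3) + (pooled @ W_out).reshape(3, 3)

def align(points):
    """Apply the predicted transform to every point."""
    return points @ t_net(points)

cloud = rng.random((1024, 3))
aligned = align(cloud)
```

In a trained network `W_point` and `W_out` would be learned jointly with the rest of the model, so the predicted matrix would move each input toward a consistent pose; here, with zero output weights, `aligned` simply equals `cloud`.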