PV-RCNN

PV-RCNN[1] is a 3D object detection framework that integrates a 3D voxel CNN with PointNet-based set abstraction to learn more discriminative point cloud features. The main contributions of this paper are a two-stage strategy consisting of voxel-to-keypoint 3D scene encoding and keypoint-to-grid RoI feature abstraction. Below are some notes from reading and implementing it.

Contents


Paper & Code & note


Paper: PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection (CVPR 2020)
Code: PyTorch
Note: Mendeley

Paper


Abstract

PV-RCNN_Abstract.png

  1. They present PointVoxel-RCNN (PV-RCNN) for accurate 3D object detection from point clouds.
  2. It summarizes the 3D scene learned by a 3D voxel CNN into a small set of keypoints via a novel voxel set abstraction (VSA) module.
  3. RoI-grid pooling is proposed to abstract proposal-specific features from the keypoints to the RoI-grid points; the RoI-grid feature points encode much richer context information.
  4. It surpasses state-of-the-art 3D detection methods.

Problem Description

PV-RCNN_PD.png

  1. The grid-based methods generally transform the irregular point clouds into regular representations such as 3D voxels, which makes them more computationally efficient.
  2. The point-based methods directly extract discriminative features from the raw point clouds for 3D detection, so they can achieve a larger, more flexible receptive field.

Problem Solution

PV-RCNN_PS.png

  1. They integrate these two types of methods. The voxel-based operations efficiently encode multi-scale feature representations, while the PointNet-based set abstraction operation preserves accurate location information with a flexible receptive field.
  2. A voxel CNN with 3D sparse convolutions is adopted for voxel-wise feature learning and accurate proposal generation.
  3. A small set of keypoints is selected by farthest point sampling (FPS) to summarize the overall 3D information from the voxel-wise features.
  4. PointNet-based set abstraction is used to summarize multi-scale point cloud information.
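Step 3 above relies on farthest point sampling. A minimal NumPy sketch of greedy FPS is given below (the paper's implementation runs on GPU; the keypoint count, e.g. around 2k for KITTI, is a hyperparameter):

```python
import numpy as np

def farthest_point_sampling(points, n_samples):
    """Greedy FPS: repeatedly pick the point farthest from the chosen set.

    points: (N, 3) array of xyz coordinates.
    Returns the indices of the n_samples selected keypoints.
    """
    n = points.shape[0]
    indices = np.zeros(n_samples, dtype=np.int64)
    dist = np.full(n, np.inf)   # distance of each point to the chosen set
    farthest = 0                # start from an arbitrary point
    for i in range(n_samples):
        indices[i] = farthest
        d = np.sum((points - points[farthest]) ** 2, axis=1)
        dist = np.minimum(dist, d)          # distance to nearest chosen point
        farthest = int(np.argmax(dist))     # next pick: farthest remaining point
    return indices
```

Starting from an arbitrary point gives slightly different (but equally valid) samplings; the key property is that the keypoints end up spread uniformly over the scene.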

Conceptual Understanding

PV-RCNN_framework.png
PV-RCNN_overall.png

  1. 3D Sparse Convolution: the raw point clouds are voxelized and input to a sparse convolutional network to learn multi-scale semantic features and generate 3D object proposals.
  2. Voxel Set Abstraction: the learned voxel-wise feature volumes at multiple neural layers are summarized into a small set of keypoints.
  3. RoI-grid Pooling: the keypoint features are aggregated to the RoI-grid points.

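The Voxel Set Abstraction step (2 above) can be sketched as follows. This is a deliberately simplified NumPy version: it max-pools raw voxel features within a single radius around each keypoint, whereas the paper additionally concatenates relative coordinates, applies an MLP before pooling, and repeats this at multiple radii and multiple sparse-CNN levels.

```python
import numpy as np

def voxel_set_abstraction(keypoints, voxel_centers, voxel_feats, radius):
    """Group voxel features around each keypoint and max-pool (PointNet-style).

    keypoints:     (K, 3) keypoint coordinates (e.g. from FPS).
    voxel_centers: (M, 3) coordinates of non-empty voxels at one CNN level.
    voxel_feats:   (M, C) voxel-wise features at that level.
    Returns (K, C) keypoint features; zeros where no voxel falls in the ball.
    """
    K, C = keypoints.shape[0], voxel_feats.shape[1]
    out = np.zeros((K, C))
    for k in range(K):
        d = np.linalg.norm(voxel_centers - keypoints[k], axis=1)
        mask = d < radius                     # ball query around the keypoint
        if mask.any():
            out[k] = voxel_feats[mask].max(axis=0)  # max-pool the local set
    return out
```

In the full model, the per-level outputs are concatenated (together with raw-point and BEV features) to form each keypoint's multi-scale descriptor.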
Core Conception

Predicted Keypoint Weighting

PV-RCNN_PKW.png

RoI-grid Pooling

PV-RCNN_RoI-grid.png
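RoI-grid pooling samples a regular grid of points inside each proposal box and aggregates nearby keypoint features onto them. A simplified sketch, assuming an axis-aligned box (the paper handles box rotation), a single-radius max-pool in place of multi-scale set abstraction, and a 6x6x6 grid as in the paper:

```python
import numpy as np

def roi_grid_pooling(box_center, box_size, keypoints, keypoint_feats,
                     grid_size=6, radius=1.0):
    """Pool keypoint features onto a grid_size^3 grid inside an RoI.

    box_center: (3,) box center; box_size: (3,) box dimensions (axis-aligned).
    keypoints: (K, 3); keypoint_feats: (K, C).
    Returns (grid_size**3, C) grid-point features.
    """
    # Uniform grid-point centers in the box's local frame, in (-0.5, 0.5).
    lin = (np.arange(grid_size) + 0.5) / grid_size - 0.5
    gx, gy, gz = np.meshgrid(lin, lin, lin, indexing="ij")
    grid = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)
    grid_pts = np.asarray(box_center) + grid * np.asarray(box_size)

    C = keypoint_feats.shape[1]
    pooled = np.zeros((grid_size ** 3, C))
    for i, g in enumerate(grid_pts):
        mask = np.linalg.norm(keypoints - g, axis=1) < radius  # ball query
        if mask.any():
            pooled[i] = keypoint_feats[mask].max(axis=0)
    return pooled
```

Because the ball query can reach beyond the box boundary, grid points near the border still capture surrounding context, which is the source of the "richer context information" claimed in the abstract.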

Experiments

PV-RCNN_KITTI.png
PV-RCNN_val.png
PV-RCNN_WaymoOpen.png

Code


  1. The complete code can be found in PV-RCNN[2].
  2. Another implementation can be found in vision3d[3].

[Updating]

Note


  1. Provide more accurate detections by exploiting point cloud features.
  2. Integrate it into a multiple object tracking framework.

References


[1] Shi S, Guo C, Jiang L, et al. PV-RCNN: Point-voxel feature set abstraction for 3D object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 10529-10538.
[2] PV-RCNN. https://github.com/sshaoshuai/PV-RCNN
[3] vision3d. https://github.com/jhultman/vision3d

