PV-RCNN[1] is a 3D object detection framework that integrates a 3D voxel CNN and PointNet-based set abstraction to learn more discriminative point cloud features. The main contribution of this paper is a two-stage strategy consisting of voxel-to-keypoint 3D scene encoding and keypoint-to-grid RoI feature abstraction. Below are some notes on reading and implementing it.
Paper & Code & Note
Paper: PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection (CVPR 2020 paper)
Code: PyTorch
Note: Mendeley
Paper
Abstract
- They present PointVoxel-RCNN (PV-RCNN) for accurate 3D object detection from point clouds.
- It summarizes the 3D scene with a 3D voxel CNN into a small set of keypoints via a novel voxel set abstraction (VSA) module.
- RoI-grid pooling is proposed to abstract proposal-specific features from the keypoints to the RoI-grid points; the RoI-grid feature points encode much richer context information.
- It surpasses previous state-of-the-art 3D detection methods on the KITTI and Waymo Open datasets.
Problem Description
- The grid-based methods generally transform the irregular point clouds into regular representations such as 3D voxels; they are more computationally efficient.
- The point-based methods directly extract discriminative features from raw point clouds for 3D detection; they can achieve a larger receptive field.
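To make the grid-based idea concrete, here is a minimal pure-Python sketch of voxelization: points are bucketed by integer voxel index and each voxel keeps the mean of its points. The function name `voxelize`, the `voxel_size` parameter, and the mean-point feature are assumptions for illustration, not the code used by PV-RCNN.

```python
from collections import defaultdict


def voxelize(points, voxel_size):
    """Group 3D points into cubic voxels; return {voxel_index: mean point}."""
    buckets = defaultdict(list)
    for x, y, z in points:
        # Floor division maps each coordinate to an integer voxel index.
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        buckets[key].append((x, y, z))

    voxels = {}
    for key, pts in buckets.items():
        n = len(pts)
        # Illustrative voxel feature: the mean of the points inside the voxel.
        voxels[key] = tuple(sum(p[i] for p in pts) / n for i in range(3))
    return voxels


points = [(0.1, 0.2, 0.3), (0.4, 0.1, 0.2), (1.2, 0.1, 0.0), (-0.3, 0.0, 0.0)]
voxels = voxelize(points, voxel_size=1.0)
```

The first two points fall into the same voxel and are merged, which shows both the efficiency gain (fewer, regular cells) and why fine-grained position information can be lost in a grid representation.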
Problem Solution
- They integrate these two types of methods. The voxel-based operation efficiently encodes multi-scale feature representations, while the PointNet-based set abstraction operation preserves accurate location information with a flexible receptive field.
- The voxel CNN with 3D sparse convolution is adopted for voxel-wise feature learning and accurate proposal generation.
- A small set of keypoints is selected by furthest point sampling (FPS) to summarize the overall 3D information from the voxel-wise features.
- PointNet-based set abstraction is used for summarizing multi-scale point cloud information.
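Furthest point sampling can be sketched in a few lines of pure Python. This is a simple O(N·K) greedy version for illustration; the function name and the `start_index` parameter are assumptions (practical implementations typically run as a CUDA op and may pick the first point at random).

```python
def furthest_point_sampling(points, num_samples, start_index=0):
    """Return indices of `num_samples` points chosen greedily by FPS."""
    if not 0 < num_samples <= len(points):
        raise ValueError("num_samples must be in [1, len(points)]")

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    selected = [start_index]
    # min_dist[i]: squared distance from point i to its nearest selected point.
    min_dist = [sq_dist(p, points[start_index]) for p in points]

    while len(selected) < num_samples:
        # Choose the point that is furthest from the already selected set.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(nxt)
        # Update nearest-selected distances with the newly added point.
        for i, p in enumerate(points):
            d = sq_dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 5.0, 0.0)]
keypoint_indices = furthest_point_sampling(points, num_samples=3)
```

Because each new keypoint maximizes its distance to the ones already chosen, the selected set spreads over the whole scene instead of clustering where the points are dense.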
Conceptual Understanding
- 3D Sparse Convolution: takes the raw point clouds as input to learn multi-scale semantic features and generate 3D object proposals.
- Voxel Set Abstraction: the learned voxel-wise feature volumes at multiple neural layers are summarized into a small set of keypoints.
- RoI-grid Pooling: the keypoint features are aggregated to the RoI-grid points.
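The last two steps share one pattern: for each query location, gather the features of nearby points and pool them. The sketch below shows that pattern for RoI-grid pooling, under strong simplifying assumptions: the RoI is an axis-aligned box (real proposals are rotated), the grid is 2x2x2, a single radius is used, and no learned MLP is applied before the max-pool. The helper names `roi_grid_points` and `ball_max_pool` are hypothetical.

```python
def roi_grid_points(center, size, grid=2):
    """Uniformly spaced cell centres inside an axis-aligned box (assumption)."""
    pts = []
    for i in range(grid):
        for j in range(grid):
            for k in range(grid):
                pts.append(tuple(
                    c - s / 2 + (idx + 0.5) * s / grid
                    for c, s, idx in zip(center, size, (i, j, k))
                ))
    return pts


def ball_max_pool(centers, points, feats, radius):
    """For each centre, channel-wise max over features of points within radius."""
    dim = len(feats[0])
    r2 = radius * radius
    pooled = []
    for c in centers:
        neigh = [f for p, f in zip(points, feats)
                 if sum((a - b) ** 2 for a, b in zip(c, p)) <= r2]
        if neigh:
            pooled.append([max(f[d] for f in neigh) for d in range(dim)])
        else:
            # Empty neighbourhood: fall back to a zero feature (assumption).
            pooled.append([0.0] * dim)
    return pooled


# Toy keypoints with 2-channel features.
keypoints = [(0.5, 0.5, 0.5), (1.5, 1.5, 1.5), (0.6, 0.4, 0.5)]
keypoint_feats = [[1.0, 0.0], [0.0, 2.0], [0.5, 3.0]]

grid_pts = roi_grid_points(center=(1.0, 1.0, 1.0), size=(2.0, 2.0, 2.0), grid=2)
grid_feats = ball_max_pool(grid_pts, keypoints, keypoint_feats, radius=0.3)
```

The same `ball_max_pool` call also illustrates voxel set abstraction if the keypoints are passed as `centers` and the non-empty voxel positions with their features as `points` and `feats`.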
Core Conception
Predicted Keypoint Weighting
RoI-grid Pooling
Experiments
Code
[Updating]
Note
- Provide more accurate detections by using point cloud features.
- Integrate it into a multiple object tracking framework.
References
[1] Shi S, Guo C, Jiang L, et al. PV-RCNN: Point-voxel feature set abstraction for 3D object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 10529-10538.
[2] PV-RCNN. https://github.com/sshaoshuai/PV-RCNN
[3] vision3d. https://github.com/jhultman/vision3d