PV-RCNN [1] is a 3D object detection framework that integrates a 3D voxel CNN and PointNet-based set abstraction to learn more discriminative point cloud features. The main contribution of the paper is a two-stage strategy consisting of voxel-to-keypoint 3D scene encoding and keypoint-to-grid RoI feature abstraction. Below are some notes from reading and implementing it.
Paper & Code & note
Paper: PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection (CVPR 2020)
Code: PyTorch
Note: Mendeley
Paper
Abstract

- They present PointVoxel-RCNN (PV-RCNN) for accurate 3D object detection from point clouds.
- It summarizes the 3D scene with a 3D voxel CNN into a small set of keypoints via a novel voxel set abstraction (VSA) module.
- RoI-grid pooling is proposed to abstract proposal-specific features from the keypoints to the RoI-grid points; the RoI-grid feature points encode much richer context information.
- It surpasses state-of-the-art 3D detection methods.
Problem Description

- The grid-based methods generally transform the irregular point clouds into regular representations such as 3D voxels; they are more computationally efficient (a minimal voxelization sketch follows this list).
- The point-based methods directly extract discriminative features from raw point clouds for 3D detection; they can achieve a larger receptive field.
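To make the grid-based representation concrete, here is a minimal voxelization sketch in PyTorch; the function name, voxel size, and point cloud range below are illustrative choices, not taken from the paper's code.

```python
import torch

def voxelize(points, voxel_size=(0.05, 0.05, 0.1), pc_range=(0, -40, -3, 70.4, 40, 1)):
    """Quantize an (N, 3+) point cloud into integer voxel coordinates.

    Returns the unique voxel coordinates and, for each voxel, the mean of the
    points that fall inside it (a common, simple voxel feature).
    """
    pts = points[:, :3]
    vsize = torch.tensor(voxel_size, dtype=pts.dtype)
    origin = torch.tensor(pc_range[:3], dtype=pts.dtype)
    coords = torch.floor((pts - origin) / vsize).long()   # (N, 3) integer grid indices

    # Group points that share a voxel index and average them.
    uniq, inverse = torch.unique(coords, dim=0, return_inverse=True)
    feats = torch.zeros(uniq.shape[0], points.shape[1], dtype=points.dtype)
    feats.index_add_(0, inverse, points)
    counts = torch.bincount(inverse, minlength=uniq.shape[0]).unsqueeze(1).to(points.dtype)
    return uniq, feats / counts

# Example: 1000 random LiDAR-like points with (x, y, z, intensity).
points = torch.rand(1000, 4) * torch.tensor([70.0, 80.0, 4.0, 1.0]) - torch.tensor([0.0, 40.0, 3.0, 0.0])
voxel_coords, voxel_feats = voxelize(points)
print(voxel_coords.shape, voxel_feats.shape)
```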
Problem Solution

- They integrate these two types of methods: the voxel-based operation efficiently encodes multi-scale feature representations, while the PointNet-based set abstraction operation preserves accurate location information with a flexible receptive field.
- A voxel CNN with 3D sparse convolution is adopted for voxel-wise feature learning and accurate proposal generation.
- A small set of keypoints is selected by farthest point sampling (FPS) to summarize the overall 3D information from the voxel-wise features (see the FPS sketch after this list).
- PointNet-based set abstraction is used for summarizing multi-scale point cloud information.
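Below is a minimal, unoptimized sketch of farthest point sampling for keypoint selection (the 2048-keypoint setting matches the paper's KITTI configuration); the released code uses a CUDA implementation instead.

```python
import torch

def farthest_point_sampling(points, num_samples):
    """Iteratively pick the point farthest from the already-selected set.

    points: (N, 3) tensor, num_samples: int -> (num_samples,) indices.
    """
    n = points.shape[0]
    selected = torch.zeros(num_samples, dtype=torch.long)
    # Distance from every point to the nearest selected point so far.
    dist = torch.full((n,), float('inf'))
    # Start from an arbitrary point (index 0 here; a random start is also common).
    farthest = torch.tensor(0)
    for i in range(num_samples):
        selected[i] = farthest
        d = torch.sum((points - points[farthest]) ** 2, dim=1)
        dist = torch.minimum(dist, d)   # distance to the closest selected point
        farthest = torch.argmax(dist)   # next keypoint: maximally far from the set
    return selected

# Example: keep 2048 keypoints from a 16384-point cloud.
cloud = torch.rand(16384, 3)
keypoint_idx = farthest_point_sampling(cloud, 2048)
keypoints = cloud[keypoint_idx]
```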
Conceptual Understanding


- 3D Sparse Convolution: takes the raw point clouds as input to learn multi-scale semantic features and generate 3D object proposals.
- Voxel Set Abstraction: the learned voxel-wise feature volumes at multiple neural layers are summarized into a small set of keypoints (see the sketch after this list).
- RoI-grid Pooling: the keypoint features are aggregated to the RoI-grid points.
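The following is a rough single-scale sketch of the set abstraction performed around each keypoint: gather voxel features within a radius, run them through a shared MLP, and max-pool over the neighborhood. The radius, channel sizes, and single feature level are illustrative simplifications of the multi-scale, multi-layer Voxel Set Abstraction in the paper.

```python
import torch
import torch.nn as nn

class SimpleSetAbstraction(nn.Module):
    """Single-scale PointNet-style set abstraction around keypoints."""

    def __init__(self, in_channels, out_channels, radius=0.8, max_neighbors=16):
        super().__init__()
        self.radius = radius
        self.max_neighbors = max_neighbors
        # +3 for the relative xyz offset concatenated to each neighbor feature.
        self.mlp = nn.Sequential(
            nn.Linear(in_channels + 3, out_channels),
            nn.ReLU(),
            nn.Linear(out_channels, out_channels),
        )

    def forward(self, keypoints, voxel_centers, voxel_feats):
        """keypoints: (K, 3); voxel_centers: (V, 3); voxel_feats: (V, C) -> (K, out_channels)."""
        out = []
        for kp in keypoints:
            d = torch.norm(voxel_centers - kp, dim=1)
            idx = torch.nonzero(d < self.radius).squeeze(1)[: self.max_neighbors]
            if idx.numel() == 0:                       # empty neighborhood -> zero feature
                out.append(torch.zeros(self.mlp[-1].out_features))
                continue
            rel = voxel_centers[idx] - kp              # relative coordinates
            grouped = torch.cat([rel, voxel_feats[idx]], dim=1)
            out.append(self.mlp(grouped).max(dim=0).values)   # max-pool over the neighborhood
        return torch.stack(out)

# Example with random data standing in for one sparse-conv feature level.
vsa = SimpleSetAbstraction(in_channels=32, out_channels=64)
kp_feats = vsa(torch.rand(2048, 3) * 10, torch.rand(5000, 3) * 10, torch.rand(5000, 32))
print(kp_feats.shape)  # torch.Size([2048, 64])
```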
Core Concepts
Predicted Keypoint Weighting
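This module down-weights keypoints that fall on the background: a small MLP predicts a foreground confidence for each keypoint (supervised with focal loss against point-in-box segmentation labels), and the keypoint features are scaled by that confidence. A minimal sketch, with layer sizes chosen only for illustration:

```python
import torch
import torch.nn as nn

class PredictedKeypointWeighting(nn.Module):
    """Scale keypoint features by a predicted foreground confidence."""

    def __init__(self, channels):
        super().__init__()
        self.foreground_mlp = nn.Sequential(
            nn.Linear(channels, channels),
            nn.ReLU(),
            nn.Linear(channels, 1),
            nn.Sigmoid(),                   # confidence in [0, 1]
        )

    def forward(self, keypoint_feats):
        """keypoint_feats: (K, C) -> re-weighted (K, C) features and raw weights (K, 1)."""
        weights = self.foreground_mlp(keypoint_feats)
        return keypoint_feats * weights, weights

# Example: re-weight 2048 keypoint features of 128 channels.
pkw = PredictedKeypointWeighting(channels=128)
weighted_feats, fg_weights = pkw(torch.rand(2048, 128))

# During training the weights would be supervised with a focal loss against
# point-in-box foreground labels; that loss is omitted in this sketch.
```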

RoI-grid Pooling
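RoI-grid pooling spreads a regular grid of points (6×6×6 in the paper) inside each 3D proposal and aggregates nearby keypoint features onto every grid point with a set-abstraction step. The sketch below simplifies to axis-aligned boxes, a single radius, and plain max-pooling; the paper handles rotated boxes, multiple radii, and an MLP before pooling.

```python
import torch

def roi_grid_points(box, grid_size=6):
    """Uniform grid points inside an axis-aligned box (cx, cy, cz, dx, dy, dz)."""
    center, dims = box[:3], box[3:6]
    # Normalized offsets in (-0.5, 0.5) along each axis.
    lin = (torch.arange(grid_size, dtype=box.dtype) + 0.5) / grid_size - 0.5
    gx, gy, gz = torch.meshgrid(lin, lin, lin, indexing="ij")
    offsets = torch.stack([gx, gy, gz], dim=-1).reshape(-1, 3)   # (grid_size**3, 3)
    return center + offsets * dims

def roi_grid_pool(box, keypoints, keypoint_feats, radius=1.0, grid_size=6):
    """Max-pool keypoint features within `radius` of every grid point.

    keypoints: (K, 3); keypoint_feats: (K, C) -> (grid_size**3, C) proposal feature.
    """
    grid = roi_grid_points(box, grid_size)        # (G, 3)
    d = torch.cdist(grid, keypoints)              # (G, K) pairwise distances
    mask = d < radius
    pooled = torch.zeros(grid.shape[0], keypoint_feats.shape[1])
    for g in range(grid.shape[0]):
        neighbors = keypoint_feats[mask[g]]
        if neighbors.numel() > 0:
            pooled[g] = neighbors.max(dim=0).values
    return pooled       # flattened downstream and fed to the refinement head

# Example: pool features for one proposal from 2048 keypoints with 128-dim features.
box = torch.tensor([10.0, 2.0, -1.0, 3.9, 1.6, 1.5])   # KITTI car-sized box
proposal_feat = roi_grid_pool(box, torch.rand(2048, 3) * 20, torch.rand(2048, 128))
print(proposal_feat.shape)  # torch.Size([216, 128])
```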

Experiments



Code
[Updating]
Note
- Provide more accurate detections via point cloud features.
- Integrate it into a multiple object tracking framework.
References
[1] Shi S, Guo C, Jiang L, et al. PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 10529-10538.
[2] PV-RCNN. https://github.com/sshaoshuai/PV-RCNN
[3] vision3d. https://github.com/jhultman/vision3d
