Faster R-CNN [1] detects objects in images, outputting bounding boxes and class scores. This post collects some notes on reading the paper and implementing it.
Paper & Code & Note
Paper: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (NIPS 2015 paper)
Code: PyTorch
Note: Faster R-CNN
Paper
Abstract
As the abstract of the paper states, the work proposes a method called Faster R-CNN, which introduces a Region Proposal Network (RPN) and then merges the RPN and Fast R-CNN into one detector.
- It introduces an RPN. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. In the recently popular terminology of neural networks with "attention" mechanisms, the RPN tells the unified network where to look by generating proposals.
- It merges the RPN and Fast R-CNN into a single network. The unified network detects objects by sharing their convolutional features, enabling nearly cost-free region proposals.
Problem Description
It states the purpose of Faster R-CNN and reviews existing methods for solving this problem.
Problem Solution
It introduces a network called RPN, including how it works and what role it plays.
Conceptual Understanding
It describes the whole architecture of Faster R-CNN, including how it works and what each module outputs.
Core Conception
It covers the most important concepts of the Faster R-CNN modules, explaining in turn the Conv layers (conv + relu + pooling), RPN (feature maps -> proposals), RoI pooling (feature maps + proposals -> proposal feature maps), and Classification (proposal feature maps -> bbox + cls).
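To make the RoI pooling step (feature maps + proposals -> proposal feature maps) more concrete, here is a minimal sketch in plain Python. It assumes a single-channel feature map stored as a list of rows and an RoI already expressed in feature-map cells; the function name `roi_max_pool` and the floor/ceil binning are illustrative choices, not the code of any particular implementation.

```python
def roi_max_pool(feature_map, roi, out_size=2):
    """Max-pool one RoI into a fixed out_size x out_size grid.

    feature_map: list of rows (one channel only, an assumption for brevity).
    roi: (x1, y1, x2, y2) in feature-map cells, inclusive, inside the map.
    """
    x1, y1, x2, y2 = roi
    roi_w = x2 - x1 + 1
    roi_h = y2 - y1 + 1
    pooled = []
    for i in range(out_size):
        # Rows of bin i: floor for the start, ceil for the end,
        # and at least one row so that no bin is empty.
        r0 = y1 + (i * roi_h) // out_size
        r1 = max(r0 + 1, y1 + ((i + 1) * roi_h + out_size - 1) // out_size)
        row = []
        for j in range(out_size):
            c0 = x1 + (j * roi_w) // out_size
            c1 = max(c0 + 1, x1 + ((j + 1) * roi_w + out_size - 1) // out_size)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


fmap = [[0, 1, 2, 3],
        [4, 5, 6, 7],
        [8, 9, 10, 11],
        [12, 13, 14, 15]]
full = roi_max_pool(fmap, (0, 0, 3, 3))  # 4x4 region -> 2x2 output
part = roi_max_pool(fmap, (1, 1, 3, 3))  # 3x3 region -> 2x2 output
```

Whatever the size of the proposal, the output grid has the same shape, which is what lets the classification head use fully connected layers afterwards.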
The network architecture is shown below.
Details of implementation
RPN
- anchors: it selects k = 9 anchor boxes (3 scales * 3 aspect ratios) at each sliding-window position, outputting 2k scores and 4k coordinates.
- classification + regression: it takes the RPN outputs as input, classifying anchors as positive or negative and regressing bounding-box offsets.
- proposal layer: it applies pre_nms_topN, ignores cross-boundary boxes, runs NMS, and keeps topN to generate proposals.
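The anchor and proposal-layer steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the base size of 16 with scales (8, 16, 32) reproduces the 128, 256 and 512 pixel anchors of the paper, while the default `pre_nms_top_n`, `post_nms_top_n` and the function names are hypothetical values chosen for the example.

```python
def make_anchors(base=16, ratios=(0.5, 1.0, 2.0), scales=(8, 16, 32)):
    """Return k = len(ratios) * len(scales) anchors (x1, y1, x2, y2) for one cell."""
    cx = cy = (base - 1) / 2.0
    anchors = []
    for r in ratios:                  # r is height / width
        for s in scales:
            area = (base * s) ** 2
            w = (area / r) ** 0.5
            h = w * r
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, thresh):
    """Greedy NMS: keep a box only if it overlaps no kept box by more than thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep


def proposal_layer(boxes, scores, img_w, img_h,
                   pre_nms_top_n=6000, post_nms_top_n=300, nms_thresh=0.7):
    # 1. pre_nms_topN: keep only the highest-scoring boxes.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    order = order[:pre_nms_top_n]
    # 2. Ignore boxes that cross the image boundary.
    inside = [i for i in order
              if boxes[i][0] >= 0 and boxes[i][1] >= 0
              and boxes[i][2] <= img_w and boxes[i][3] <= img_h]
    kept_boxes = [boxes[i] for i in inside]
    kept_scores = [scores[i] for i in inside]
    # 3. NMS, then 4. topN of what survives.
    keep = nms(kept_boxes, kept_scores, nms_thresh)
    return [kept_boxes[i] for i in keep[:post_nms_top_n]]


anchors = make_anchors()  # k = 9, so the RPN emits 18 scores and 36 coordinates per position
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60), (-5, 0, 5, 10)]
scores = [0.9, 0.8, 0.7, 0.95]
proposals = proposal_layer(boxes, scores, 100, 100, nms_thresh=0.5)
```

In the usage lines, the highest-scoring box is dropped because it crosses the left image border, and the second box is suppressed by NMS because it overlaps the first one too much.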
Experiments
- loss function: the loss is the sum of a classification loss and a regression loss.
- training: it chose alternating training, that is, RPN -> Fast R-CNN -> RPN2 -> unified network (RPN + Fast R-CNN).
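As a rough illustration of the loss described above, the sketch below combines a log loss over objectness with a smooth L1 loss over box offsets, where only positive anchors contribute to the regression term and the two parts are balanced by a weight lambda (10 in the paper). The function names, the argument layout, and the default normalisers (both set to the number of sampled anchors) are simplifying assumptions for this example.

```python
import math


def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for |x| >= 1."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def rpn_loss(probs, labels, t_pred, t_true, lam=10.0, n_cls=None, n_reg=None):
    """Classification + lam * regression loss over sampled anchors.

    probs:  predicted objectness probability per anchor, strictly in (0, 1).
    labels: 1 for positive anchors, 0 for negative anchors.
    t_pred, t_true: predicted and target box offsets, one 4-tuple per anchor.
    n_cls, n_reg: normalisers; defaulting to len(probs) is an assumption here.
    """
    n_cls = len(probs) if n_cls is None else n_cls
    n_reg = len(probs) if n_reg is None else n_reg
    cls = sum(-math.log(p if y == 1 else 1.0 - p)
              for p, y in zip(probs, labels)) / n_cls
    # The label acts as a switch: negative anchors add nothing to the regression term.
    reg = sum(y * sum(smooth_l1(a - b) for a, b in zip(tp, tt))
              for y, tp, tt in zip(labels, t_pred, t_true)) / n_reg
    return cls + lam * reg


loss = rpn_loss(
    probs=[0.5, 0.5],
    labels=[1, 0],
    t_pred=[(0, 0, 0, 0), (1, 1, 1, 1)],
    t_true=[(0, 0, 0, 0), (0, 0, 0, 0)],
)
```

In the usage lines the second anchor has a large offset error, but it is a negative anchor, so only the classification term remains.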
Code
The complete code can be found here; it builds on faster-rcnn.pytorch [2].
Datasets
The default datasets include PASCAL_VOC and COCO. To train on my own data, it must first be converted to VOC or COCO format.
The details of the data format are as follows.
`PASCAL_VOC`: |
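One possible way to convert custom labels is to write a VOC-style XML annotation per image. The sketch below uses only the Python standard library and emits a small subset of the fields found in real VOC annotations (filename, size, and one `object` entry with a `bndbox` per labelled box). The function name `make_voc_annotation`, the sample file name, and the input layout are hypothetical.

```python
import xml.etree.ElementTree as ET


def make_voc_annotation(filename, width, height, objects, depth=3):
    """Build a VOC-style annotation string.

    objects: list of (class_name, (xmin, ymin, xmax, ymax)) in pixels.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", depth)):
        ET.SubElement(size, tag).text = str(value)
    for name, (xmin, ymin, xmax, ymax) in objects:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        ET.SubElement(obj, "difficult").text = "0"
        box = ET.SubElement(obj, "bndbox")
        for tag, value in (("xmin", xmin), ("ymin", ymin),
                           ("xmax", xmax), ("ymax", ymax)):
            ET.SubElement(box, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")


xml_text = make_voc_annotation("img_0001.jpg", 640, 480,
                               [("cat", (10, 20, 110, 220))])
parsed = ET.fromstring(xml_text)
```

The class names written here have to match the class list configured for training, otherwise the dataset loader will not recognise them.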
Program improvement
- Modified files to be compatible with my own machine.
- Changed the datasets and classes so that the model trains on my custom data.
Note
More details on Faster R-CNN concepts such as anchors and the loss can be found in [3].
References
[1] Ren, Shaoqing, et al. "Faster R-CNN: Towards real-time object detection with region proposal networks." Advances in Neural Information Processing Systems. 2015.
[2] faster-rcnn.pytorch. https://github.com/jwyang/faster-rcnn.pytorch
[3] Shang Bai. "A paper understanding Faster R-CNN." https://zhuanlan.zhihu.com/p/31426458