Faster R-CNN

2019-10-19 2021-05-10


Faster R-CNN [1] detects objects in images, outputting a bounding box and class scores for each detection. These are my notes on reading the paper and implementing it.

Paper & Code & Note

Paper: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (NIPS 2015)
Code: PyTorch
Note: Faster R-CNN

Paper

Abstract

As the abstract states, the paper proposes Faster R-CNN, which introduces a Region Proposal Network (RPN) and merges the RPN with Fast R-CNN to detect objects.

It introduces the RPN, a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. In the recently popular terminology of neural networks with “attention” mechanisms, the RPN tells the detection network where to look.

It merges the RPN and Fast R-CNN into a single network. The unified network shares convolutional features between the two, which makes region proposals nearly cost-free.

Problem Description

This part covers the purpose of Faster R-CNN and the existing methods for solving the same problem.

Problem Solution

This part introduces the RPN: how it works and what role it plays.

Conceptual Understanding

This part describes the whole architecture of Faster R-CNN, including how it works and what each module outputs.

Core Concepts

This part covers the most important concepts of the Faster R-CNN modules in turn: the conv layers (conv + relu + pooling), the RPN (feature maps -> proposals), RoI pooling (feature maps + proposals -> proposal feature maps), and classification (proposal feature maps -> bbox + cls).
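To make the RoI pooling step (feature maps + proposals -> proposal feature maps) concrete, here is a minimal pure-Python sketch on a single-channel feature map. The names `roi_max_pool` and `ceil_div`, the inclusive cell coordinates and the 2x2 output size are my own assumptions for illustration. Real implementations work on multi-channel tensors and first scale each proposal from image coordinates to feature-map coordinates.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


def roi_max_pool(feature, roi, out_h=2, out_w=2):
    """Max-pool one region of a 2-D feature map into a fixed out_h x out_w grid.

    feature: list of rows (H x W), a single channel.
    roi: (x1, y1, x2, y2) in feature-map cells, both corners inclusive.
    """
    x1, y1, x2, y2 = roi
    roi_h = y2 - y1 + 1
    roi_w = x2 - x1 + 1
    pooled = []
    for i in range(out_h):
        # Bin i covers rows [r0, r1); floor/ceil keeps every bin non-empty.
        r0 = y1 + (i * roi_h) // out_h
        r1 = y1 + ceil_div((i + 1) * roi_h, out_h)
        row = []
        for j in range(out_w):
            c0 = x1 + (j * roi_w) // out_w
            c1 = x1 + ceil_div((j + 1) * roi_w, out_w)
            row.append(max(feature[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


feature = [[r * 4 + c for c in range(4)] for r in range(4)]
print(roi_max_pool(feature, (0, 0, 3, 3)))  # [[5, 7], [13, 15]]
```

Whatever the size of the proposal, the output grid has the same shape, which is what lets the later fully connected layers accept proposals of any size.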

The overall network architecture is shown below.

Details of implementation

RPN

anchors: at each sliding-window position it places k = 9 anchor boxes (3 scales * 3 aspect ratios), so the RPN outputs 2k scores and 4k coordinates per position.
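The anchor layout can be sketched in a few lines of plain Python. The scales (128, 256, 512) and ratios (1:2, 1:1, 2:1) are the ones reported in the paper. The function names, the feature stride of 16 and the choice to centre anchors in the middle of each cell are assumptions of this sketch, not the repository's exact code.

```python
import math


def make_base_anchors(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return k = len(scales) * len(ratios) anchors centred on (0, 0).

    Each anchor is (x1, y1, x2, y2). An anchor of a given scale has area
    scale * scale; ratio is height / width.
    """
    anchors = []
    for scale in scales:
        area = float(scale * scale)
        for ratio in ratios:
            w = math.sqrt(area / ratio)
            h = w * ratio
            anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return anchors


def shift_anchors(base_anchors, feat_h, feat_w, stride=16):
    """Replicate the base anchors at every feature-map position."""
    all_anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Centre of this feature-map cell in image coordinates.
            cx = col * stride + stride / 2
            cy = row * stride + stride / 2
            for x1, y1, x2, y2 in base_anchors:
                all_anchors.append((x1 + cx, y1 + cy, x2 + cx, y2 + cy))
    return all_anchors


base = make_base_anchors()
k = len(base)
anchors = shift_anchors(base, feat_h=2, feat_w=3)
print(k, 2 * k, 4 * k, len(anchors))  # 9 18 36 54
```

For a 2 x 3 feature map this gives 2 * 3 * 9 = 54 anchors, and the RPN head predicts 2k = 18 scores and 4k = 36 coordinates at each of the 6 positions.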

classification + regression: it takes the RPN outputs as input, uses the classification scores to pick positive anchors and uses the bbox regression offsets to refine them.
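Both heads can be illustrated for a single anchor. The box parameterization below (centre offsets scaled by the anchor size, log-space width and height) follows the paper; the function names and the (x1, y1, x2, y2) box layout are assumptions of this sketch.

```python
import math


def objectness(score_bg, score_fg):
    """Softmax over the two scores of one anchor; returns P(foreground)."""
    m = max(score_bg, score_fg)  # subtract the max for numerical stability
    e_bg = math.exp(score_bg - m)
    e_fg = math.exp(score_fg - m)
    return e_fg / (e_bg + e_fg)


def decode_box(anchor, deltas):
    """Apply regression offsets (tx, ty, tw, th) to an anchor (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = anchor
    tx, ty, tw, th = deltas
    wa, ha = x2 - x1, y2 - y1
    xa, ya = x1 + wa / 2, y1 + ha / 2
    # x = xa + tx * wa, y = ya + ty * ha, w = wa * exp(tw), h = ha * exp(th)
    x = xa + tx * wa
    y = ya + ty * ha
    w = wa * math.exp(tw)
    h = ha * math.exp(th)
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)


print(objectness(0.0, 0.0))                        # 0.5
print(decode_box((0, 0, 10, 10), (0, 0, 0, 0)))    # (0.0, 0.0, 10.0, 10.0)
```

Zero offsets return the anchor unchanged, so the regression head only has to learn a small correction relative to each anchor.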

proposal layer: it generates proposals by keeping the pre_nms_topN highest-scoring boxes, ignoring cross-boundary boxes, applying NMS and then keeping the topN boxes that remain.
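The order of these steps can be sketched as follows. This is a simplified pure-Python version, not the repository's code: the function names and the default values of `pre_nms_top_n` and `post_nms_top_n` are illustrative assumptions, and cross-boundary boxes are clipped to the image here (the paper's test-time behaviour) rather than dropped. The NMS IoU threshold of 0.7 is the value reported in the paper.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def clip_box(box, img_w, img_h):
    """Clip a box so that it lies inside the image."""
    x1, y1, x2, y2 = box
    return (min(max(x1, 0), img_w), min(max(y1, 0), img_h),
            min(max(x2, 0), img_w), min(max(y2, 0), img_h))


def nms(boxes, scores, thresh):
    """Greedy NMS: keep a box only if its IoU with every kept box is <= thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep


def propose(boxes, scores, img_w, img_h,
            pre_nms_top_n=6000, nms_thresh=0.7, post_nms_top_n=300):
    # 1. keep the pre_nms_top_n highest-scoring boxes
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    order = order[:pre_nms_top_n]
    # 2. handle cross-boundary boxes (clipped in this sketch)
    top_boxes = [clip_box(boxes[i], img_w, img_h) for i in order]
    top_scores = [scores[i] for i in order]
    # 3. NMS, then 4. keep the post_nms_top_n survivors
    keep = nms(top_boxes, top_scores, nms_thresh)[:post_nms_top_n]
    return [top_boxes[i] for i in keep], [top_scores[i] for i in keep]
```

Sorting before NMS matters: the first cut bounds the quadratic cost of NMS, and the second cut fixes how many proposals reach the RoI pooling layer.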

Experiments

loss function: the loss is the sum of a classification loss and a regression loss.
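As a rough illustration of how the two terms combine for the RPN, the sketch below uses a log loss over object / not-object and a smooth L1 loss that is counted only for positive anchors, which is the form given in the paper. The function names and argument layout are my own, and the normalizers `n_cls` and `n_reg` default to the number of anchors passed in, which is a simplifying assumption. The paper reports a balancing weight of 10.

```python
import math


def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for |x| >= 1."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def rpn_loss(probs, labels, pred_deltas, target_deltas,
             lam=10.0, n_cls=None, n_reg=None):
    """Classification loss / n_cls + lam * regression loss / n_reg.

    probs: predicted P(object) per anchor.
    labels: 1 for positive anchors, 0 for negative anchors.
    pred_deltas, target_deltas: (tx, ty, tw, th) per anchor.
    """
    if n_cls is None:
        n_cls = len(labels)
    if n_reg is None:
        n_reg = len(labels)
    eps = 1e-12  # avoids log(0)
    cls_loss = 0.0
    reg_loss = 0.0
    for p, y, t, t_star in zip(probs, labels, pred_deltas, target_deltas):
        cls_loss += -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
        if y == 1:  # regression only counts for positive anchors
            reg_loss += sum(smooth_l1(a - b) for a, b in zip(t, t_star))
    return cls_loss / n_cls + lam * reg_loss / n_reg


zeros = (0.0, 0.0, 0.0, 0.0)
print(smooth_l1(0.5), smooth_l1(2.0))  # 0.125 1.5
```

Negative anchors contribute nothing to the regression term, so their predicted offsets can be arbitrary without affecting the loss.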

training: it uses alternating training, that is, RPN -> Fast R-CNN -> RPN2 -> unified network (RPN + Fast R-CNN).

Code

The complete code can be found here; it is based on faster-rcnn.pytorch [2].

Datasets

The default datasets are PASCAL_VOC and COCO. My own data has to be converted to VOC or COCO format first.
The directory layout of the VOC format is as follows.

`PASCAL_VOC`:
|- VOCdevkit2007
    |- VOC2007
        |- Annotations
            |- .xml
        |- ImageSets
            |- Main
                |- trainval.txt
                |- test.txt
        |- JPEGImages
            |- .jpg
|- VOCdevkit2012
    |- VOC2012
        |- ...
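For converting custom data, each image needs one XML file under Annotations. Below is a minimal sketch that writes and reads back a VOC-style annotation with the Python standard library. The tag names (annotation, filename, size, object, name, difficult, bndbox, xmin, ymin, xmax, ymax) follow the PASCAL VOC annotation format; the helper names and the example values are hypothetical, and which optional fields a particular data loader requires is something to check against its code.

```python
import xml.etree.ElementTree as ET


def make_voc_annotation(filename, width, height, objects, depth=3):
    """Build a minimal VOC-style annotation and return it as an XML string.

    objects: list of (class_name, (xmin, ymin, xmax, ymax)) in pixels.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", depth)):
        ET.SubElement(size, tag).text = str(value)
    for name, (xmin, ymin, xmax, ymax) in objects:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        ET.SubElement(obj, "difficult").text = "0"
        box = ET.SubElement(obj, "bndbox")
        for tag, value in (("xmin", xmin), ("ymin", ymin),
                           ("xmax", xmax), ("ymax", ymax)):
            ET.SubElement(box, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")


def read_voc_objects(xml_text):
    """Return [(class_name, (xmin, ymin, xmax, ymax)), ...] from an annotation."""
    root = ET.fromstring(xml_text)
    result = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        coords = tuple(int(box.find(tag).text)
                       for tag in ("xmin", "ymin", "xmax", "ymax"))
        result.append((obj.find("name").text, coords))
    return result


# Hypothetical example image and box, for illustration only.
xml_text = make_voc_annotation("000001.jpg", 500, 375,
                               [("dog", (48, 240, 195, 371))])
print(read_voc_objects(xml_text))  # [('dog', (48, 240, 195, 371))]
```

The image names listed in ImageSets/Main/trainval.txt and test.txt are then the file stems (for example 000001) shared by the .jpg and .xml files.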

Program improvement

Modified some files so that the code runs on my own machine.

Changed the dataset and class definitions to train on custom data.

Note

More details of the Faster R-CNN concepts, such as anchors and the loss, can be found in [3].

References

[1] Ren, Shaoqing, et al. “Faster R-CNN: Towards real-time object detection with region proposal networks.” Advances in Neural Information Processing Systems. 2015.
[2] faster-rcnn.pytorch. https://github.com/jwyang/faster-rcnn.pytorch
[3] Shang Bai. “A paper understanding Faster R-CNN.” https://zhuanlan.zhihu.com/p/31426458

Title：Faster R-CNN
Author：Gojay
Link：https://gojay.top/2019/10/19/Faster-R-CNN/
Date：2019-10-19
Copyright：All articles in this blog are licensed under CC BY-NC-SA 4.0 unless stating additionally.
