SiamMask

SiamMask [1] detects and segments an object in every frame of a video. It is initialised with a single bounding box and outputs a binary segmentation mask and a rotated bounding box for each frame. These notes cover some details of reading and implementing it.

Paper & Code & Note


Paper: Fast Online Object Tracking and Segmentation: A Unifying Approach (CVPR 2019)
Code: PyTorch
Note: SiamMask

Paper


Abstract

SiamMask_Abstract.png
As the abstract states, the paper proposes a method dubbed SiamMask, which addresses both VOT (visual object tracking) and semi-supervised VOS (video object segmentation). It improves the offline training of fully-convolutional Siamese trackers by augmenting their loss with a binary segmentation task.

  1. It relies solely on a single bounding-box initialisation and produces class-agnostic object masks and rotated bounding boxes.
  2. Its results establish SiamMask as a new state of the art among real-time trackers.
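
The binary segmentation task mentioned above can be illustrated with a per-pixel binary logistic loss. This is a minimal sketch, not the authors' code: the function name `mask_loss` and the nested-list inputs are assumptions made for illustration, and it scores a single candidate window (as far as the paper describes, only positive windows contribute to the mask loss).

```python
import math

def mask_loss(logits, labels):
    """Mean binary logistic loss over one predicted mask.

    logits: 2-D list of raw mask scores m_ij (before any sigmoid)
    labels: 2-D list of ground-truth labels c_ij in {+1, -1}
    (hypothetical helper for illustration only)
    """
    total = 0.0
    count = 0
    for logit_row, label_row in zip(logits, labels):
        for m, c in zip(logit_row, label_row):
            # log(1 + exp(-c * m)), rewritten so exp() never overflows
            z = -c * m
            total += max(z, 0.0) + math.log1p(math.exp(-abs(z)))
            count += 1
    return total / count
```

A confident, correct prediction gives a loss close to zero, while an undecided pixel (logit 0) costs log 2.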

Problem Description

SiamMask_PD.png

It shows the task SiamMask focuses on and why this problem needs tackling.

Problem Solution

SiamMask_PS.png

It shows the improvements to initialisation and outputs that lead to better accuracy.

Conceptual Understanding

SiamMask_Schematic.png
SiamMask_CU.png

It describes the whole architecture of SiamMask in its three-branch and two-branch variants, each of which adds a mask branch to the original Siamese network.
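
The two variants differ mainly in which tracking losses are combined with the mask loss. As a hedged summary of the multi-task objectives (symbol names follow the paper as best recalled, and should be checked against it):

```latex
\begin{align}
\mathcal{L}_{2B} &= \lambda_1 \cdot \mathcal{L}_{\mathrm{mask}} + \lambda_2 \cdot \mathcal{L}_{\mathrm{sim}} \\
\mathcal{L}_{3B} &= \lambda_1 \cdot \mathcal{L}_{\mathrm{mask}} + \lambda_2 \cdot \mathcal{L}_{\mathrm{score}} + \lambda_3 \cdot \mathcal{L}_{\mathrm{box}}
\end{align}
```

Here the similarity loss is the one used by SiamFC, and the score and box losses are the classification and box-regression losses of SiamRPN. To the best of my reading, the paper sets the mask weight to 32 and the other two weights to 1.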

Details of implementation

SiamMask_Implementation.png

  1. network architecture: it consists of a backbone, heads and a mask refinement module.
  2. training: it is split into three parts, namely FC, RPN and segmentation.
  3. inference: the network is evaluated once per frame, and the output mask is taken at the location with the maximum score.
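
The inference step can be sketched as follows: pick the candidate location with the highest classification score, apply a per-pixel sigmoid to its mask, and binarize at 0.5. This is a simplified illustration under stated assumptions, not the repository's code: `select_mask` and `min_max_box` are hypothetical helper names, inputs are plain nested lists, and the box helper returns only the simplest axis-aligned box rather than a rotated one.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def select_mask(scores, mask_logits, threshold=0.5):
    """Pick the mask at the best-scoring location and binarize it.

    scores:      list of classification scores, one per candidate location
    mask_logits: list of 2-D lists, the raw mask predicted at each location
    """
    best = max(range(len(scores)), key=scores.__getitem__)
    binary = [[1 if sigmoid(m) > threshold else 0 for m in row]
              for row in mask_logits[best]]
    return best, binary

def min_max_box(binary):
    """Axis-aligned box (x0, y0, x1, y1) around all foreground pixels, or None."""
    ys = [y for y, row in enumerate(binary) if any(row)]
    xs = [x for row in binary for x, v in enumerate(row) if v]
    if not ys:
        return None
    return (min(xs), min(ys), max(xs), max(ys))
```

A rotated box would instead need a minimum-area bounding rectangle fitted to the mask, for which a geometry routine such as OpenCV's `cv2.minAreaRect` is a common choice.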

Architecture

More details can be found in the paper.

  1. backbone: it keeps the first four stages of ResNet and adds an adjust layer and depth-wise cross-correlation.
    SiamMask_Backbone.png
  2. head: the conv5 block in both variants contains a normalisation layer and a ReLU non-linearity, while conv6 consists only of a 1×1 convolutional layer.
    SiamMask_Head.png
  3. refinement: it merges low- and high-resolution features using multiple refinement modules made of upsampling layers and skip connections.
    SiamMask_Refinement.png
    SiamMask_Example.png
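
The depth-wise cross-correlation used in the backbone can be illustrated in a few lines: each channel of the search features is correlated only with the matching channel of the exemplar features, giving one response map per channel. This is a minimal pure-Python sketch for intuition, assuming nested-list feature maps and a hypothetical function name `depthwise_xcorr`; a real implementation would typically use a grouped convolution in a deep learning framework.

```python
def depthwise_xcorr(search, exemplar):
    """Correlate each search channel with the matching exemplar channel.

    search:   list of C channels, each an H x W nested list
    exemplar: list of C channels, each an h x w nested list (h <= H, w <= W)
    returns:  list of C response maps, each (H - h + 1) x (W - w + 1)
    """
    out = []
    for s_ch, k_ch in zip(search, exemplar):
        H, W = len(s_ch), len(s_ch[0])
        h, w = len(k_ch), len(k_ch[0])
        # Slide the exemplar channel over the search channel (no padding, stride 1)
        resp = [[sum(s_ch[y + dy][x + dx] * k_ch[dy][dx]
                     for dy in range(h) for dx in range(w))
                 for x in range(W - w + 1)]
                for y in range(H - h + 1)]
        out.append(resp)
    return out
```

Because channels are never mixed, the output keeps as many channels as the input, which is what lets each spatial position carry a multi-channel response for the later branches.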

Experiments

An ablation study shows the contribution of each component on VOT.
SiamMask_VOT2016.png
More experimental results are shown below.
SiamMask_Results.png

Code


The complete code can be found in [SiamMask][2].

Note


SiamMask_Improvement.png

Some free ideas that may orient future work.
More details on understanding this work, from the author, can be found in [3].

References


[1] Wang, Qiang, et al. “Fast online object tracking and segmentation: A unifying approach.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
[2] SiamMask. https://github.com/foolwood/SiamMask
[3] Qiang Wang. “Thinking about SiamMask.” https://zhuanlan.zhihu.com/p/58154634

