SiamMask

2019-11-26 2021-05-10



SiamMask [1] tracks and segments an object in every frame of a video. It is initialised with a single bounding box and outputs a binary segmentation mask and a rotated bounding box for the object. Below are some notes from reading and implementing it.

Paper & Code & Note

Paper: Fast Online Object Tracking and Segmentation: A Unifying Approach (CVPR 2019)
Code: PyTorch
Note: SiamMask

Paper

Abstract

As the abstract explains, the paper proposes a method dubbed SiamMask, which addresses both VOT (visual object tracking) and semi-supervised VOS (video object segmentation). It improves the offline training of fully-convolutional Siamese trackers by augmenting their loss with a binary segmentation task.

Once trained, it relies solely on a single bounding-box initialisation and produces class-agnostic object masks and rotated bounding boxes.

The results give solid evidence that SiamMask is a new state of the art among real-time trackers.

Problem Description

This part presents the task that SiamMask focuses on and why the problem needs tackling.

Problem Solution

This part shows the improvements in initialisation and in the output representation that lead to better accuracy.

Conceptual Understanding

This part describes the whole architecture of SiamMask in its three-branch and two-branch variants, each of which adds a mask branch to the original Siamese network.

Details of implementation

network architecture: it consists of a backbone, the heads and a mask refinement module.

training: the objective is split into three parts that are trained together, covering the SiamFC (similarity/score) part, the SiamRPN (box regression) part and the added segmentation part.
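The segmentation part of the objective can be sketched in a few lines. This is a minimal pure-Python sketch, assuming the binary logistic mask loss of the paper (averaged over the pixels of each candidate window and counted only for positive windows) and the weights λ1 = 32, λ2 = λ3 = 1 reported there for the three-branch variant. The function names `mask_loss` and `three_branch_loss` and the flat-list data layout are illustrative choices, not the official code.

```python
import math


def mask_loss(logits, labels, window_labels):
    """Binary logistic mask loss, counted only for positive candidate windows.

    logits[n]        : flat list of predicted mask logits for window n
    labels[n]        : flat list of ground-truth pixel labels in {-1, +1}
    window_labels[n] : window label y_n in {-1, +1}
    """
    total = 0.0
    for m, c, y in zip(logits, labels, window_labels):
        # (1 + y) / (2 * w * h) is 1 / (w * h) for y = +1 and 0 for y = -1
        weight = (1 + y) / (2 * len(m))
        total += weight * sum(
            math.log1p(math.exp(-ci * mi)) for mi, ci in zip(m, c)
        )
    return total


def three_branch_loss(l_mask, l_score, l_box, lam1=32.0, lam2=1.0, lam3=1.0):
    """Weighted sum of the mask, score and box losses (weights assumed from the paper)."""
    return lam1 * l_mask + lam2 * l_score + lam3 * l_box


if __name__ == "__main__":
    logits = [[0.0, 0.0, 0.0, 0.0], [5.0, -5.0, 5.0, -5.0]]
    labels = [[1, -1, 1, -1], [1, 1, 1, 1]]
    # The second window is negative, so it contributes nothing to the mask loss.
    print(mask_loss(logits, labels, [1, -1]))
```

For the two-branch variant the same idea applies with only two terms (mask loss plus the similarity loss).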

inference: the network is evaluated once per frame, and the output is taken at the location with the maximum score.
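The selection step at inference can be sketched as follows. This is a simplified illustration, assuming the score map is a 2-D grid and that each grid location carries a flat list of mask logits. The name `select_mask` and the `threshold` default of 0.5 are assumptions made for this example; the released code may use a different binarisation threshold and adds further post-processing.

```python
import math


def select_mask(scores, mask_logits, threshold=0.5):
    """Pick the mask predicted at the location with the highest score.

    scores            : 2-D list of classification scores
    mask_logits[i][j] : flat list of mask logits predicted at location (i, j)
    Returns the winning location and its binarised mask.
    """
    locations = [
        (i, j) for i in range(len(scores)) for j in range(len(scores[0]))
    ]
    best_i, best_j = max(locations, key=lambda ij: scores[ij[0]][ij[1]])
    # Sigmoid turns logits into probabilities before thresholding.
    probs = [1.0 / (1.0 + math.exp(-m)) for m in mask_logits[best_i][best_j]]
    return (best_i, best_j), [1 if p > threshold else 0 for p in probs]


if __name__ == "__main__":
    scores = [[0.1, 0.9], [0.3, 0.2]]
    mask_logits = [[[-1.0, -1.0], [2.0, -3.0]], [[0.0, 0.0], [0.0, 0.0]]]
    print(select_mask(scores, mask_logits))
```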

Architecture

More details can be found in the paper.

backbone: it keeps the first four stages of ResNet, adding an adjust layer and depth-wise cross-correlation.
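Depth-wise cross-correlation can be illustrated without any framework. The sketch below is a naive pure-Python version that assumes features are stored as nested lists of shape channels × height × width; `depthwise_xcorr` is a name chosen for this example. Each channel of the exemplar features is slid over the matching channel of the search features, so the output keeps one response map per channel.

```python
def depthwise_xcorr(search, exemplar):
    """Correlate each exemplar channel with the matching search channel.

    search   : C x Hs x Ws nested lists
    exemplar : C x He x We nested lists
    Returns  : C x (Hs - He + 1) x (Ws - We + 1) nested lists
    """
    out = []
    for s_ch, e_ch in zip(search, exemplar):
        hs, ws = len(s_ch), len(s_ch[0])
        he, we = len(e_ch), len(e_ch[0])
        response = [
            [
                sum(
                    s_ch[i + u][j + v] * e_ch[u][v]
                    for u in range(he)
                    for v in range(we)
                )
                for j in range(ws - we + 1)
            ]
            for i in range(hs - he + 1)
        ]
        out.append(response)
    return out


if __name__ == "__main__":
    search = [
        [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
        [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    ]
    exemplar = [[[1, 0], [0, 1]], [[1, 1], [1, 1]]]
    print(depthwise_xcorr(search, exemplar))
```

By the same size arithmetic, a 31×31 search map correlated with a 15×15 exemplar map would give a 17×17 response per channel. In practice this operation is usually implemented as a grouped convolution rather than with explicit loops.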

head: the conv5 block in both variants contains a normalisation layer and a ReLU non-linearity, while conv6 consists only of a 1×1 convolutional layer.
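To make the role of the 1×1 convolutions concrete, here is a rough pure-Python sketch of a conv5 → ReLU → conv6 head. It assumes, for simplicity, that conv5 is also a 1×1 convolution and it leaves out the normalisation layer. The helper names (`conv1x1`, `relu`, `head`) and the weight layout are hypothetical. A 1×1 convolution is just the same linear map over channels applied independently at every spatial location.

```python
def conv1x1(feature, weight, bias):
    """1x1 convolution on a C_in x H x W feature map.

    weight : C_out x C_in nested list, bias : list of length C_out
    """
    height, width = len(feature[0]), len(feature[0][0])
    return [
        [
            [
                b + sum(wk * feature[k][i][j] for k, wk in enumerate(w_row))
                for j in range(width)
            ]
            for i in range(height)
        ]
        for w_row, b in zip(weight, bias)
    ]


def relu(feature):
    """Element-wise ReLU on a C x H x W feature map."""
    return [[[max(0.0, x) for x in row] for row in ch] for ch in feature]


def head(feature, w5, b5, w6, b6):
    """conv5 (normalisation omitted in this sketch) -> ReLU -> conv6."""
    return conv1x1(relu(conv1x1(feature, w5, b5)), w6, b6)


if __name__ == "__main__":
    feature = [[[1.0, 2.0]], [[3.0, -4.0]]]   # 2 channels, 1 x 2 spatial size
    w5, b5 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
    w6, b6 = [[1.0, 1.0]], [0.5]
    print(head(feature, w5, b5, w6, b6))
```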

refinement: it merges low- and high-resolution features using multiple refinement modules made of upsampling layers and skip connections.
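The idea of one refinement step (upsample the coarse map, then merge it with a higher-resolution skip feature) can be sketched as below. This is a deliberately stripped-down illustration: it assumes nearest-neighbour 2× upsampling and plain element-wise addition on single-channel maps, whereas the real modules also contain learned convolutions and may order the steps differently. The names `upsample2x` and `refine` are made up for this example.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D map."""
    out = []
    for row in fmap:
        wide = [x for x in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def refine(coarse, skip):
    """Upsample the coarse map and add the higher-resolution skip feature."""
    up = upsample2x(coarse)
    return [[a + b for a, b in zip(r_up, r_skip)] for r_up, r_skip in zip(up, skip)]


if __name__ == "__main__":
    coarse = [[1, 2]]
    skip = [[10, 20, 30, 40], [50, 60, 70, 80]]
    print(refine(coarse, skip))
```

Stacking several such steps, each fed by a shallower backbone feature, doubles the resolution of the mask representation at every stage.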

Experiments

The ablation study shows how each component contributes to VOT performance.

More experimental results are shown below.

Code

The complete code can be found in [SiamMask][2].

Note

Some free ideas that may orient future work.
More details on understanding this work, from the author, can be found in [3].

References

[1] Wang, Qiang, et al. “Fast online object tracking and segmentation: A unifying approach.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
[2] SiamMask. https://github.com/foolwood/SiamMask
[3] Qiang Wang. “Thinking about SiamMask.” https://zhuanlan.zhihu.com/p/58154634

Title：SiamMask
Author：Gojay
Link：https://gojay.top/2019/11/26/SiamMask/
Date：2019-11-26
Copyright：All articles in this blog are licensed under CC BY-NC-SA 4.0 unless otherwise stated.

DL, Tracking, VOT