SiamMask [1] tracks and segments an object in every frame of a video: it is initialised with a single bounding box and outputs a binary segmentation mask and a rotated bounding box for the object. This post records some details from reading and implementing it.
Contents
Paper & Code & note
Paper: Fast Online Object Tracking and Segmentation: A Unifying Approach (CVPR 2019 paper)
Code: PyTorch
Note: SiamMask
Paper
Abstract
As stated in the abstract of the paper, the work proposes a method dubbed SiamMask, which addresses both VOT (visual object tracking) and semi-supervised VOS (video object segmentation). It improves the offline training of fully-convolutional Siamese trackers by augmenting their loss with a binary segmentation task.
- It relies solely on a single bounding box initialisation and produces class-agnostic object masks and rotated bounding boxes.
- The paper reports that SiamMask sets a new state of the art among real-time trackers.
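To make the "binary segmentation task" concrete, here is a minimal sketch of a per-window mask loss in plain Python. It assumes the binary logistic formulation from the paper [1], L_mask = (1 + y) / (2wh) · Σ log(1 + exp(-c·m)), where c is the ±1 pixel label, m the predicted mask logit, and y the ±1 label of the candidate window. The function and argument names are illustrative and are not taken from the SiamMask repository.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mask_loss(labels, logits, y):
    """Binary logistic loss over one h x w mask for a single candidate window.

    labels: 2-D list of +1 / -1 ground-truth pixel labels (assumed encoding)
    logits: 2-D list of raw predicted mask scores, same shape as labels
    y:      +1 for a positive candidate window, -1 for a negative one
    """
    h = len(labels)
    w = len(labels[0])
    total = 0.0
    for label_row, logit_row in zip(labels, logits):
        for c, m in zip(label_row, logit_row):
            total += softplus(-c * m)
    # The (1 + y) factor zeroes the loss for negative windows.
    return (1 + y) / (2 * w * h) * total
```

With this form, negative windows (y = -1) contribute nothing, so the mask branch is only trained on windows that actually contain the object.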
Problem Description
It shows the task that SiamMask focuses on and why this problem needs to be tackled.
Problem Solution
It shows the improvements to the initialisation and the outputs that raise accuracy.
Conceptual Understanding
It describes the overall architecture of SiamMask in its three-branch and two-branch variants, each of which adds a mask branch to the original Siamese network.
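As a reading aid, the two variants differ in which losses are summed during training. The block below is a sketch of the combined objectives as this note reads them from the paper [1]; the symbols follow the paper's naming, and the weights λ are hyperparameters whose values are not reproduced here.

```latex
% Two-branch variant: mask branch + similarity (score) branch
\mathcal{L}_{2B} = \lambda_1 \,\mathcal{L}_{mask} + \lambda_2 \,\mathcal{L}_{sim}

% Three-branch variant: mask branch + classification score branch + box regression branch
\mathcal{L}_{3B} = \lambda_1 \,\mathcal{L}_{mask} + \lambda_2 \,\mathcal{L}_{score} + \lambda_3 \,\mathcal{L}_{box}
```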
Details of implementation
- network architecture: it consists of a backbone, a head and a mask refinement module.
- training: training is divided into three parts, handled respectively: FC, RPN and segmentation.
- inference: the network is evaluated once per frame, and the output is selected at the location with the maximum score.
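The inference step above can be sketched as follows: among all candidate locations, pick the one with the highest score and binarise its mask. This is a minimal illustration in plain Python; the helper names and the 0.5 threshold are assumptions made for this note, not the repository's API.

```python
def select_mask(scores, masks):
    """Return (index, mask) of the candidate with the highest score.

    scores: list of floats, one per candidate location
    masks:  list of 2-D lists of mask probabilities, aligned with scores
    """
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, masks[best]


def binarize(mask, threshold=0.5):
    # Assumed threshold: pixels above it are labelled as object (1).
    return [[1 if p > threshold else 0 for p in row] for row in mask]
```

For example, with scores `[0.1, 0.9, 0.3]` the second candidate is chosen, and `binarize` turns its probabilities into the binary mask for the frame.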
Architecture
More details can be found in paper.
- backbone: it keeps the first four stages of ResNet, adding an adjust layer and depth-wise cross-correlation.
- head: the conv5 block in both variants contains a normalisation layer and a ReLU non-linearity, while conv6 consists only of a 1×1 convolutional layer.
- refinement: it merges low- and high-resolution features using multiple refinement modules made of upsampling layers and skip connections.
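The depth-wise cross-correlation mentioned for the backbone can be illustrated without any framework. The sketch below slides each channel of the exemplar features over the matching channel of the search features and keeps one response map per channel. It uses nested lists and explicit loops for clarity; the function name is hypothetical, and a real implementation would typically use a grouped convolution in PyTorch instead.

```python
def depthwise_xcorr(search, exemplar):
    """Channel-by-channel cross-correlation.

    search:   C x Hs x Ws nested lists (search-region features)
    exemplar: C x He x We nested lists (template features), He <= Hs, We <= Ws
    returns:  C x (Hs - He + 1) x (Ws - We + 1) response maps
    """
    out = []
    for s_ch, e_ch in zip(search, exemplar):
        hs, ws = len(s_ch), len(s_ch[0])
        he, we = len(e_ch), len(e_ch[0])
        response = []
        for i in range(hs - he + 1):
            row = []
            for j in range(ws - we + 1):
                acc = 0.0
                # Dot product between the exemplar and the window at (i, j).
                for u in range(he):
                    for v in range(we):
                        acc += s_ch[i + u][j + v] * e_ch[u][v]
                row.append(acc)
            response.append(row)
        out.append(response)
    return out
```

Unlike a plain cross-correlation that sums over channels into a single map, this keeps the channel dimension, so each spatial location of the output is a multi-channel vector that later branches can consume.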
Experiments
The ablation study shows the contribution of each component on VOT.
More experimental results are shown below.
Code
The complete code can be found in [SiamMask][2].
Note
Some free ideas that may orient future work.
More details on understanding this work, from the author, can be found in [3].
References
[1] Wang, Qiang, et al. “Fast online object tracking and segmentation: A unifying approach.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
[2] SiamMask. https://github.com/foolwood/SiamMask
[3] Qiang Wang. “Thinking about SiamMask.” https://zhuanlan.zhihu.com/p/58154634