TSDM

TSDM

TSDM[1] is a RGB-D tracker which use depth information to pretreatment and fuse information to pro-processing. It is composed of a Mask-generator(M-g), SiamRPN++ and a Depth-refiner(D-r). There are some details of reading and implementing it.

Contents


Paper & Code & note


Paper: TSDM: Tracking by SiamRPN++ with a Depth-refiner and a Mask-generator(arXiv 2020 paper)
Code: PyTorch
Note: Mendeley

Paper


Abstract

TSDM_Abstract.png

  1. Depth information provides informative cues for foreground-background separation and target bounding box regression.
  2. Few trackers have used depth information to play the important role aforementioned due to the lack of a suitable model.
  3. In this paper, a RGB-D tracker named TSDM is proposed, The M-g generates the background masks, and updates them as the target 3D position changes. The D-r optimizes the target bounding box estimated by SiamRPN++, based on the spatial depth distribution difference between the target and the surrounding background.
  4. It outperforms the state-of-the-art on the PTB and VOT.

Problem Description

TSDM_PD.png

  1. The main obstacle is that the tracker requires constant information (such as color), but the target depth distribution may change a lot when the target moves.

Problem Solution

TSDM_PS1.png
TSDM_PS2.png

  1. Depth mudules: M-g and D-r can overcome the obstacle above and make use of depth information effectively.
  2. Data augmentation: it helps retrain SiamRPN++ to work better with the M-g module.

Conceptual Understanding

TSDM_Pipeline.png

  1. Mask-generator: Input $X_d$ and $\overline{Dt_{i-1}}$ into M-g to get $M$ and $M_c$, then use $F_m(\cdot)$ to get $X_m$.
  2. SiamRPN++: Input $Z$ and $X_m$ into the core, then outputs the target bounding box $B_s$ ($W,H,C_x,C_y$).
  3. Depth-refiner: Cut out $R_c$ and $R_d$ from $X_c$ and $X_d$ by $B_s$ respectively. Then input $R_c$ and $R_d$ into D-r to get the refined target bounding box $B_d$ ($w,h,xr,yb$).

Core Conception

Mask-generator

  1. M-g generates two background mask images, $M$ is a 2-value image for clearing out the background of $X_c$, and $M_c$ is a color image for coloring the background of $X_c$.
  2. $M_c$ color selection: $M_c$ enhances the target background difference to make the target template matching easier.
  3. M-g stop-restart strategy: M-g should automatically stop to avoid masking the real target when a transient tracking drift happens.
  4. M-g simulated data augmentation: it used to generate enough training samples ($X_m$) to retrain the SiamRPN++.

SiamRPN++

  • It takes an image pair ($Z,X$) as input and outputs the target bounding box in the current frame, as: $f(Z,X)=\phi(Z)\ast\phi(X)$.
  • More details of SiamRPN++ can be found in previous blog [SiamRPN++][2].

Depth-refiner

  1. The bounding box estimated by the core contains the whole target, D-r improve the tracker performance just by cutting out no-target area.
  2. Information Fusion Network: It uses depth information to optmize the target state, and color information to overcomes the slight color-depth mismatch. The full architecture is as follows:

TSDM_Architecture.png

Experiments

TSDM_Result.png
TSDM_VOT.png
TSDM_PTB.png

Code


The complete code can be found in here with citing TSDM[3].
[Updating]

Note


How to use depth information on MOT tasks, detection or re-ID.

References


[1] ZHAO, Pengyao, et al. TSDM: Tracking by SiamRPN++ with a Depth-refiner and a Mask-generator. arXiv preprint arXiv:2005.04063, 2020.
[2] Gojay. “SiamRPN++.” https://gojay.top/2020/05/09/SiamRPN++/
[3] TSDM. https://github.com/lql-team/TSDM


  DLTrackingVOT

Comments

Your browser is out-of-date!

Update your browser to view this website correctly. Update my browser now

×