FairMOT [1] is a one-shot tracker that fuses object detection and re-identification (Re-ID) in a single network. The main contributions of this paper are anchor-free Re-ID feature extraction, multi-layer feature aggregation, and lower-dimensional Re-ID features. Below are some details from reading and implementing it.
Paper & Code & Note
Paper: A Simple Baseline for Multi-Object Tracking (arXiv 2020)
Code: PyTorch
Note: Mendeley
Paper
Abstract
- There has been remarkable progress on multi-object tracking with object detection and re-identification.
- Little attention has been paid to accomplishing the two tasks in a single network.
- In this work, the authors study the essential reasons why such single-network approaches fail, and accordingly present a simple baseline to address the problem.
- It outperforms the state of the art on public datasets.
Problem Description
- Two-step: First the detection model localizes the bounding boxes of objects, then the association model extracts Re-ID features and links them to tracks. However, these methods cannot run inference at video rate because the two networks do not share features.
- One-shot: These methods jointly detect objects and learn Re-ID features. However, tracking accuracy drops and the number of ID switches increases considerably.
Problem Solution
- Anchor-free: Anchor-based methods usually operate on a coarse grid, so there is a high chance that the features extracted at an anchor are not aligned with the object center.
- Multi-layer feature aggregation: Aggregating low-level and high-level features helps reduce identity switches.
- Lower-dimensional features: They reduce the risk of over-fitting to small datasets and improve tracking robustness.
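To make the misalignment argument concrete, the sketch below measures how far a true object center lies from the center of the grid cell whose feature would be sampled. It is a simplified geometric illustration, not code from the paper: stride 32 is an assumed stand-in for a coarse anchor-based detection layer, stride 4 is the output stride FairMOT uses, and the helper names `nearest_cell_center` and `misalignment` are hypothetical.

```python
import math

def nearest_cell_center(x, y, stride):
    """Center, in input-image pixels, of the stride x stride grid cell containing (x, y)."""
    cx = (math.floor(x / stride) + 0.5) * stride
    cy = (math.floor(y / stride) + 0.5) * stride
    return cx, cy

def misalignment(x, y, stride):
    """Distance between a true object center and the cell center where features are sampled."""
    cx, cy = nearest_cell_center(x, y, stride)
    return math.hypot(x - cx, y - cy)

# Hypothetical object center in input-image pixels.
center = (101.0, 77.0)
for stride in (32, 4):
    err = misalignment(*center, stride)
    bound = stride / math.sqrt(2)  # worst case: the center sits at a cell corner
    print(f"stride {stride:2d}: misalignment {err:.2f}px (worst case {bound:.2f}px)")
```

For this example center, the stride-32 cell center is about 11.4 px away while the stride-4 cell center is about 1.4 px away; in general the gap can reach stride / √2, which is why a finer grid keeps the sampled feature closer to the object.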
Conceptual Understanding
- Multi-layer feature aggregation: It follows Deep Layer Aggregation (DLA) to fuse features from multiple layers in order to handle objects of different scales.
- Anchor-free object detection: It estimates object centers on a high-resolution feature map.
- Pixel-wise re-identification: It learns low-dimensional Re-ID features to reduce computation time and improve robustness.
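A heavily simplified sketch of the idea behind multi-layer feature aggregation: coarse, semantically strong maps are upsampled to the resolution of a fine map and fused with it, so the output keeps both spatial detail and context. This is an illustration only, assuming single-channel maps, nearest-neighbour upsampling and a plain element-wise sum; the actual DLA backbone fuses features through learned aggregation nodes (convolutions), and `upsample_nearest` and `aggregate` are hypothetical names.

```python
def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a single-channel H x W map by an integer factor."""
    return [[fmap[i // factor][j // factor] for j in range(len(fmap[0]) * factor)]
            for i in range(len(fmap) * factor)]

def aggregate(feature_maps):
    """Fuse single-channel maps ordered from shallow/high-res to deep/low-res.

    Every map is upsampled to the resolution of the first (finest) map and the
    results are summed element-wise. Assumes each size divides the finest size.
    """
    out_h, out_w = len(feature_maps[0]), len(feature_maps[0][0])
    fused = [[0.0] * out_w for _ in range(out_h)]
    for fmap in feature_maps:
        factor = out_h // len(fmap)
        up = upsample_nearest(fmap, factor)
        for i in range(out_h):
            for j in range(out_w):
                fused[i][j] += up[i][j]
    return fused

# Hypothetical 3-level pyramid: 4x4 (fine detail), 2x2, 1x1 (coarse context).
fine = [[float((i + j) % 2) for j in range(4)] for i in range(4)]
mid = [[0.5, 0.0], [0.0, 0.5]]
coarse = [[2.0]]
fused = aggregate([fine, mid, coarse])
print(fused[0])
```

Every output cell now mixes the fine checkerboard pattern with the coarser responses that cover it; a learned aggregation node would weight these contributions instead of simply adding them.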
Core Concepts
Object Detection Branch
- Heatmap head: This head estimates the locations of the object centers.
- Center offset head: This head localizes the objects more precisely, compensating for the quantization error introduced by the down-sampled (stride 4) feature map.
- Box size head: This head estimates the height and width of the target bounding box at each center location.
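The three detection heads can be combined into boxes roughly as follows. This is a minimal sketch assuming a CenterNet-style decoder: heatmap peaks are kept with a 3 x 3 local-maximum test, the offset head refines each peak to sub-cell precision, and the size head supplies width and height. The output stride of 4, the score threshold of 0.4, sizes predicted in feature-map units, and the names `decode_detections` and `zeros` are assumptions for illustration.

```python
def decode_detections(heatmap, offset, size, stride=4, threshold=0.4):
    """Turn head outputs into boxes.

    heatmap: H x W center scores in [0, 1]
    offset:  (off_x, off_y), each H x W, sub-cell offsets in feature-map units
    size:    (box_w, box_h), each H x W, box size in feature-map units (assumption)
    Returns a list of (score, x1, y1, x2, y2) in input-image pixels.
    """
    height, width = len(heatmap), len(heatmap[0])
    off_x, off_y = offset
    box_w, box_h = size
    detections = []
    for i in range(height):
        for j in range(width):
            score = heatmap[i][j]
            if score < threshold:
                continue
            # Keep only local peaks: a cell survives if no 3 x 3 neighbour scores higher.
            neighbourhood = [heatmap[a][b]
                             for a in range(max(0, i - 1), min(height, i + 2))
                             for b in range(max(0, j - 1), min(width, j + 2))]
            if score < max(neighbourhood):
                continue
            # Refine the peak with the offset head, then map back to input pixels.
            cx = (j + off_x[i][j]) * stride
            cy = (i + off_y[i][j]) * stride
            w = box_w[i][j] * stride
            h = box_h[i][j] * stride
            detections.append((score, cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return detections


def zeros(h, w):
    return [[0.0] * w for _ in range(h)]

heatmap = zeros(4, 4)
heatmap[2][2] = 0.9  # confident object center
heatmap[2][1] = 0.5  # neighbouring response, removed by the peak test
off_x, off_y, box_w, box_h = [zeros(4, 4) for _ in range(4)]
off_x[2][2], off_y[2][2] = 0.5, 0.5
box_w[2][2], box_h[2][2] = 2.0, 4.0
dets = decode_detections(heatmap, (off_x, off_y), (box_w, box_h))
print(dets)
```

Here only one box survives, (6, 2, 14, 18) with score 0.9; the 0.5 response beside it passes the threshold but is dropped because a neighbour scores higher, which plays the role of non-maximum suppression without any anchors.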
Identity Embedding Branch
- The goal of the identity embedding branch is to generate features that can distinguish different objects.
- The resulting feature map is $E\in\mathbb{R}^{128\times W\times H}$; the distance between embeddings of different objects should be larger than the distance between embeddings of the same object.
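Because the embedding map is dense, each detected object can take its Re-ID feature from the cell at its estimated center. The sketch below assumes L2-normalised embeddings compared by cosine distance, a common choice for association; the helper names `embedding_at` and `cosine_distance` and the toy 3-channel map (instead of 128 channels) are hypothetical.

```python
import math

def embedding_at(emb_map, i, j):
    """Read the C-dim vector at cell (i, j) of a C x H x W map and L2-normalise it."""
    vec = [channel[i][j] for channel in emb_map]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
    return [v / norm for v in vec]

def cosine_distance(a, b):
    """1 - cosine similarity for two already L2-normalised vectors."""
    return 1.0 - sum(x * y for x, y in zip(a, b))

# Toy embedding map with C = 3, H = 1, W = 3.
emb_map = [
    [[3.0, 0.0, 6.0]],  # channel 0
    [[4.0, 1.0, 8.0]],  # channel 1
    [[0.0, 0.0, 0.0]],  # channel 2
]
a = embedding_at(emb_map, 0, 0)  # object centered at column 0
b = embedding_at(emb_map, 0, 2)  # same direction as a, larger magnitude
c = embedding_at(emb_map, 0, 1)  # a different direction
print(cosine_distance(a, b), cosine_distance(a, c))
```

After normalisation, a and b are identical, so their distance is about 0, while a and c are 0.2 apart; training aims to push embeddings of different identities apart in exactly this sense.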
Loss Functions
- Heatmap loss: The loss is defined as pixel-wise logistic regression with focal loss.
- Offset and size loss: L1 losses are applied to the two heads.
- Identity embedding loss: Object identity embedding is treated as a classification task, and a softmax loss is computed.
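The three losses can be sketched in plain Python as below. This is not the official implementation: the focal-loss form with exponents α = 2 and β = 4 follows the CornerNet/CenterNet variant and is an assumption here, as is averaging the L1 loss over the supervised values. The identity loss assumes a linear classifier has already produced one logit per training identity from the center embedding.

```python
import math

def heatmap_focal_loss(pred, target, alpha=2.0, beta=4.0, eps=1e-12):
    """Pixel-wise logistic regression with focal loss over an H x W heatmap.

    target is 1.0 exactly at object centers and decays (e.g. as a Gaussian)
    around them; the loss is normalised by the number of centers.
    """
    loss, num_centers = 0.0, 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p = min(max(p, eps), 1.0 - eps)  # keep log() finite
            if t == 1.0:
                loss -= (1.0 - p) ** alpha * math.log(p)
                num_centers += 1
            else:
                # (1 - t)^beta down-weights negatives that lie close to a center
                loss -= (1.0 - t) ** beta * p ** alpha * math.log(1.0 - p)
    return loss / max(num_centers, 1)

def l1_loss(pred, target):
    """Mean absolute error, used here for both the offset and the size head."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def id_softmax_loss(logits, identity):
    """Softmax cross-entropy of identity-classifier logits against the true identity index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[identity]

print(heatmap_focal_loss([[0.5, 0.2]], [[1.0, 0.0]]))
print(l1_loss([1.0, 2.0], [2.0, 0.0]))
print(id_softmax_loss([0.0, 0.0], 0))
```

With two equal logits the identity loss is log 2, i.e. the classifier cannot yet tell the identities apart; it shrinks as the logit of the true identity dominates.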
Online Tracking
Experiments
Code
The complete code can be found here (based on FairMOT [2]).
[Updating]
Note
- This method achieves state-of-the-art results under the private-detector protocol of the MOT Challenge, but so far this has only been demonstrated in benchmark experiments.
- Its gains come mostly from better detection; when it is used in real environments, the number of ID switches (IDS) is much higher than with previous methods.
- Since reducing IDS is important in real-world use, one option may be to improve the association module with depth information.
References
[1] Zhang Y, Wang C, Wang X, et al. A Simple Baseline for Multi-Object Tracking[J]. arXiv preprint arXiv:2004.01888, 2020.
[2] FairMOT. https://github.com/ifzhang/FairMOT