FairMOT[1] is a one-shot tracker that fuses object detection and re-identification in a single network. The main contributions of this paper are
anchor-free Re-ID feature extraction, multi-layer feature aggregation, and lower-dimensional Re-ID features. Below are some notes from reading and implementing it.
Paper & Code & note
Paper: A Simple Baseline for Multi-Object Tracking (arXiv 2020 paper)
Code: PyTorch
Note: Mendeley
Paper
Abstract

- There has been remarkable progress on multi-object tracking with object detection and re-identification.
- However, little attention has been paid to accomplishing the two tasks in a single network.
- In this work, the authors study the essential reasons why earlier single-network attempts fail, and accordingly present a simple baseline to address the problem.
- It outperforms the state-of-the-art on the public datasets.
Problem Description
- Two-step: First the detection model localizes the bounding boxes of objects, then the association model extracts Re-ID features and links them to tracks. However, these methods cannot perform inference at video rate because the two networks do not share features.
- One-shot: These methods jointly detect objects and learn Re-ID features. However, accuracy drops and ID switches increase considerably.
Problem Solution

- Anchor-Free: Anchor-based methods usually operate on a coarse grid, so there is a high chance that the features extracted at the anchor are not aligned with the object center.
- Multi-Layer Feature Aggregation: It helps reduce identity switches by aggregating low-level and high-level features.
- Lower-dimensional features: It helps reduce the risk of over-fitting to small data, and improves the tracking robustness.
Conceptual Understanding

- Multi-Layer Feature Aggregation: It follows Deep Layer Aggregation (DLA) to fuse features from multiple layers in order to deal with objects of different scales.
- Anchor-free object detection: It estimates the object centers on a high-resolution feature map.
- Pixel-wise re-identification: It learns low-dimensional Re-ID features to reduce the computation time and improve the robustness.
Core Concepts
Object Detection Branch
- Heatmap Head: This head is responsible for estimating the locations of the object centers.
- Center Offset Head: This head is responsible for localizing the objects more precisely.
- Box Size Head: This head is responsible for estimating the height and width of the target bounding box at each anchor location.
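To make the interplay of the three heads concrete, here is a minimal sketch of how their outputs could be combined into boxes. It is not the FairMOT implementation: the function name `decode_centers`, the `stride` of 4, the `threshold` value, the 3x3 peak test, and the convention that box sizes are already in input-image pixels are all assumptions made for illustration.

```python
from typing import List, Tuple

Grid = List[List[float]]  # a 2-D map indexed as grid[y][x]


def decode_centers(
    heatmap: Grid,
    offset_x: Grid,
    offset_y: Grid,
    width: Grid,
    height: Grid,
    stride: int = 4,         # assumed downsampling factor of the feature map
    threshold: float = 0.4,  # hypothetical score cut-off
) -> List[Tuple[float, float, float, float, float]]:
    """Return (x1, y1, x2, y2, score) boxes in input-image pixels."""
    rows, cols = len(heatmap), len(heatmap[0])
    boxes = []
    for y in range(rows):
        for x in range(cols):
            score = heatmap[y][x]
            if score < threshold:
                continue
            # Keep only cells that are the maximum of their 3x3 neighbourhood
            # (ties are kept); this is a simple stand-in for NMS.
            neighbours = [
                heatmap[ny][nx]
                for ny in range(max(0, y - 1), min(rows, y + 2))
                for nx in range(max(0, x - 1), min(cols, x + 2))
            ]
            if score < max(neighbours):
                continue
            # Heatmap head gives the coarse cell, offset head refines it.
            cx = (x + offset_x[y][x]) * stride
            cy = (y + offset_y[y][x]) * stride
            # Size head gives the box extent around the refined center.
            w, h = width[y][x], height[y][x]
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, score))
    return boxes
```

Each surviving heatmap peak yields one box, so the offset and size maps are only read at the peak locations.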
Identity Embedding Branch
- The goal of the identity embedding branch is to generate features that can distinguish different objects.
- The resulting feature map is $E \in \mathbb{R}^{128 \times W \times H}$; the distance between the embeddings of different objects should be larger than the distance between embeddings of the same object.
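As an illustration of how such a pixel-wise embedding map could be used, the sketch below reads the feature vector at an object center and compares two vectors with cosine distance. The helper names, the `feature_map[c][y][x]` indexing, the L2 normalization, and the choice of cosine distance are assumptions for this example, not details confirmed by the notes above.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # assumed layout: feature_map[c][y][x]


def embedding_at(feature_map: FeatureMap, x: int, y: int) -> List[float]:
    """Read the C-dimensional vector at pixel (x, y) and L2-normalize it."""
    vec = [channel[y][x] for channel in feature_map]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid division by zero
    return [v / norm for v in vec]


def cosine_distance(a: List[float], b: List[float]) -> float:
    """Cosine distance between two unit-length vectors (0 means identical direction)."""
    return 1.0 - sum(p * q for p, q in zip(a, b))


# Toy map with 2 channels instead of 128, one row, two pixels.
toy_map = [
    [[3.0, 0.0]],  # channel 0
    [[4.0, 5.0]],  # channel 1
]
first = embedding_at(toy_map, x=0, y=0)
second = embedding_at(toy_map, x=1, y=0)
print(cosine_distance(first, second))
```

With a trained network, a small distance would suggest the two centers belong to the same identity, and a large one that they belong to different identities.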
Loss Functions
- Heatmap Loss: The loss function is defined as pixel-wise logistic regression with focal loss.
- Offset and Size Loss: They enforce L1 losses for the two heads.
- Identity Embedding Loss: They treat object identity embedding as a classification task, then compute the softmax loss.
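The three loss terms above can be sketched in plain Python on flattened inputs. This is a simplified illustration, not the training code: the penalty-reduced form of the focal loss, the defaults `alpha=2` and `beta=4`, the normalization by the number of positive pixels, and the averaging in the L1 loss are assumptions borrowed from common center-based detectors.

```python
import math
from typing import Sequence


def heatmap_focal_loss(
    pred: Sequence[float],
    target: Sequence[float],
    alpha: float = 2.0,  # assumed focusing exponent
    beta: float = 4.0,   # assumed down-weighting of pixels near a center
    eps: float = 1e-12,
) -> float:
    """Pixel-wise focal loss over flattened heatmaps, averaged over positives."""
    loss, num_pos = 0.0, 0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() stays finite
        if t == 1.0:
            # Positive pixel (an object center): penalize low confidence.
            loss -= (1.0 - p) ** alpha * math.log(p)
            num_pos += 1
        else:
            # Negative pixel: penalize high confidence, less so near a center.
            loss -= (1.0 - t) ** beta * p ** alpha * math.log(1.0 - p)
    return loss / max(num_pos, 1)


def l1_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error, usable for both the offset and the size head."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / max(len(pred), 1)


def softmax_cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax loss for one embedding classified into identity `label`."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]
```

In this sketch the identity logits would come from a linear classifier on top of the embedding at each object center, with one class per identity in the training set.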
Online Tracking

Experiments


Code
The complete code can be found here, with reference to FairMOT[2].
[Updating]
Note
- This method achieves SOTA results under the private detector protocol on MOT Challenge, but some problems still show up in experiments.
- It mostly improves detection performance; when it is used in real environments, ID switches (IDS) increase considerably compared with previous methods.
- Reducing IDS is important in the real world; perhaps the association module could be improved using depth information.
References
[1] Zhang Y, Wang C, Wang X, et al. A Simple Baseline for Multi-Object Tracking[J]. arXiv preprint arXiv:2004.01888, 2020.
[2] FairMOT. https://github.com/ifzhang/FairMOT