Abstract
In this paper, we propose a unified three-layer hierarchical approach for solving the tracking problem in a setting with multiple non-overlapping cameras. Given a video and a set of detections (obtained by any person detector), we first solve within-camera tracking using the first two layers of our framework; then, in the third layer, we solve across-camera tracking by associating tracks of the same person in all cameras simultaneously. To this end, we propose fast-constrained dominant set clustering (FCDSC), a novel method that is several orders of magnitude faster (close to real time) than existing methods. FCDSC is a parameterized family of quadratic programs that generalizes the standard quadratic optimization problem. In our method, we first build a graph whose nodes represent short-tracklets, tracklets and tracks in the first, second and third layers of the framework, respectively. The edge weights reflect the similarity between nodes. FCDSC takes as input a constrained set, a subset of nodes from the graph that must be included in the extracted cluster. Given a constrained set, FCDSC generates compact clusters by selecting nodes from the graph that are highly similar to each other and to the elements of the constrained set. We have tested this approach on a very large and challenging dataset (namely, MOTchallenge DukeMTMC) and show that the proposed framework outperforms state-of-the-art approaches. Even though the main focus of this paper is on multi-target tracking in non-overlapping cameras, the proposed approach can also be applied to the video-based person re-identification problem. We show that when the re-identification problem is formulated as a clustering problem, FCDSC can be used in conjunction with state-of-the-art video-based re-identification algorithms to further improve their already good performance.
Our experiments demonstrate the general applicability of the proposed framework for multi-target multi-camera tracking and person re-identification tasks.
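To make the idea of a constrained set concrete, the following is a minimal sketch, not the authors' implementation. It assumes the usual constrained dominant set objective, maximizing x^T (A - alpha * I_S) x over the simplex, where I_S is a diagonal 0/1 matrix marking the nodes outside the constrained set. It also uses plain replicator dynamics instead of the faster dynamics that give FCDSC its speed. The function name, the choice of alpha, the thresholds, and the toy affinity matrix are all illustrative assumptions.

```python
import numpy as np

def constrained_dominant_set(A, constraint, alpha=None, iters=2000,
                             tol=1e-10, cutoff=1e-4):
    """Toy constrained cluster extraction (hypothetical helper).

    A          : symmetric non-negative affinity matrix with zero diagonal
    constraint : indices of nodes that should appear in the cluster
    Returns (support indices, membership vector x on the simplex).
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    outside = np.ones(n, dtype=bool)
    outside[np.asarray(list(constraint), dtype=int)] = False

    if alpha is None:
        # Assumption: pick alpha just above the largest eigenvalue of the
        # affinity block restricted to the unconstrained nodes.
        if outside.any():
            block = A[np.ix_(outside, outside)]
            alpha = max(np.linalg.eigvalsh(block).max(), 0.0) + 1e-3
        else:
            alpha = 0.0

    # Penalise the diagonal of unconstrained nodes, then add alpha to every
    # entry. On the simplex this shifts the objective by a constant, so the
    # maximisers are unchanged, and it keeps all payoffs non-negative.
    B = A - alpha * np.diag(outside.astype(float)) + alpha

    x = np.full(n, 1.0 / n)          # start from the barycentre
    for _ in range(iters):
        Bx = B @ x
        x_new = x * Bx / (x @ Bx)    # discrete-time replicator update
        done = np.abs(x_new - x).sum() < tol
        x = x_new
        if done:
            break
    return np.flatnonzero(x > cutoff), x

# Toy graph: two groups of three "tracklets" (0-2 and 3-5), strongly
# similar within a group (0.9) and weakly similar across groups (0.1).
A = np.full((6, 6), 0.1)
A[:3, :3] = 0.9
A[3:, 3:] = 0.9
np.fill_diagonal(A, 0.0)

support, x = constrained_dominant_set(A, constraint=[3])
print(support, np.round(x, 3))
```

In this toy setting, constraining node 3 steers the extraction toward the group that contains it; changing the constraint to node 0 selects the other group. Real affinities would come from appearance and motion cues between short-tracklets, tracklets, or tracks, depending on the layer.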
Notes
https://motchallenge.net/results/DukeMTMCT/ (standing 01/13/2018).
Acknowledgements
This research is based upon work supported in part by the U.S. Army Research Laboratory and the U.S. Army Research Office (ARO) under Contract/Grant No. W911NF-14-1-0294, and by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via IARPA R&D Contract No. D17PC00345. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon.
Additional information
Communicated by Bernt Schiele.
Yonatan Tariku Tesfaye and Eyasu Zemene have contributed equally to this work.
Cite this article
Tesfaye, Y.T., Zemene, E., Prati, A. et al. Multi-target Tracking in Multiple Non-overlapping Cameras Using Fast-Constrained Dominant Sets. Int J Comput Vis 127, 1303–1320 (2019). https://doi.org/10.1007/s11263-019-01180-6