Opto-Electronic Engineering, Volume 50, Issue 6, 230009 (2023)

Multiple object tracking with aligned spatial-temporal feature

Wen Cheng1,2,3, Zhongbi Chen2,*, Qingqing Li2, Meihui Li2, Jianlin Zhang2, and Yuxing Wei2
Author Affiliations
  • 1National Key Laboratory of Optical Field Manipulation Science and Technology, Chinese Academy of Sciences, Chengdu, Sichuan 610209, China
  • 2Institute of Optics and Electronics, Chinese Academy of Sciences, Chengdu, Sichuan 610209, China
  • 3School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
    References (35)

    [2] Bewley A, Ge Z Y, Ott L, et al. Simple online and realtime tracking[C]//2016 IEEE International Conference on Image Processing (ICIP), 2016: 3464–3468. https://doi.org/10.1109/ICIP.2016.7533003.

    [3] Wojke N, Bewley A, Paulus D. Simple online and realtime tracking with a deep association metric[C]//2017 IEEE International Conference on Image Processing (ICIP), 2017: 3645–3649. https://doi.org/10.1109/ICIP.2017.8296962.

    [5] Berclaz J, Fleuret F, Fua P. Robust people tracking with global trajectory optimization[C]//2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), 2006: 744–750. https://doi.org/10.1109/CVPR.2006.258.

    [6] Pirsiavash H, Ramanan D, Fowlkes C C. Globally-optimal greedy algorithms for tracking a variable number of objects[C]//Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, 2011: 1201–1208. https://doi.org/10.1109/CVPR.2011.5995604.

    [7] Brasó G, Leal-Taixé L. Learning a neural solver for multiple object tracking[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 6246–6256. https://doi.org/10.1109/CVPR42600.2020.00628.

    [8] Xu J R, Cao Y, Zhang Z, et al. Spatial-temporal relation networks for multi-object tracking[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019: 3987–3997. https://doi.org/10.1109/ICCV.2019.00409.

    [9] Wang Z D, Zheng L, Liu Y X, et al. Towards real-time multi-object tracking[C]//Proceedings of the 16th European Conference on Computer Vision, 2020: 107–122. https://doi.org/10.1007/978-3-030-58621-8_7.

    [11] Bergmann P, Meinhardt T, Leal-Taixé L. Tracking without bells and whistles[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019: 941–951. https://doi.org/10.1109/ICCV.2019.00103.

    [12] Zhou X Y, Koltun V, Krähenbühl P. Tracking objects as points[C]//Proceedings of the 16th European Conference on Computer Vision, 2020: 474–490. https://doi.org/10.1007/978-3-030-58548-8_28.

    [13] Carion N, Massa F, Synnaeve G, et al. End-to-end object detection with transformers[C]//Proceedings of the 16th European Conference on Computer Vision, 2020: 213–229. https://doi.org/10.1007/978-3-030-58452-8_13.

    [14] Sun P Z, Cao J K, Jiang Y, et al. TransTrack: Multiple object tracking with transformer[Z]. arXiv: 2012.15460, 2020. https://arxiv.org/abs/2012.15460.

    [15] Meinhardt T, Kirillov A, Leal-Taixé L, et al. TrackFormer: Multi-object tracking with transformers[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 8834–8844. https://doi.org/10.1109/CVPR52688.2022.00864.

    [16] Zeng F G, Dong B, Zhang Y A, et al. MOTR: End-to-end multiple-object tracking with transformer[C]//Proceedings of the 17th European Conference on Computer Vision, 2022: 659–675. https://doi.org/10.1007/978-3-031-19812-0_38.

    [17] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017: 6000–6010.

    [18] Ballas N, Yao L, Pal C, et al. Delving deeper into convolutional networks for learning video representations[C]//Proceedings of the 4th International Conference on Learning Representations, 2016.

    [19] Yu F W, Li W B, Li Q Q, et al. POI: Multiple object tracking with high performance detection and appearance feature[C]//Proceedings of the European Conference on Computer Vision, 2016: 36–42. https://doi.org/10.1007/978-3-319-48881-3_3.

    [21] Yu E, Li Z L, Han S D, et al. RelationTrack: Relation-aware multiple object tracking with decoupled representation[J]. IEEE Trans Multimedia, 2022. https://doi.org/10.1109/TMM.2022.3150169.

    [22] Wang Q, Zheng Y, Pan P, et al. Multiple object tracking with correlation learning[C]//Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 3875–3885. https://doi.org/10.1109/CVPR46437.2021.00387.

    [23] Tokmakov P, Li J, Burgard W, et al. Learning to track with object permanence[C]//Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021: 10840–10849. https://doi.org/10.1109/ICCV48922.2021.01068.

    [24] Welch G, Bishop G. An Introduction to the Kalman Filter[M]. Chapel Hill: University of North Carolina at Chapel Hill, 1995.

    [25] Kuhn H W. The Hungarian method for the assignment problem[J]. Naval Res Logist Q, 1955, 2(1–2): 83–97. https://doi.org/10.1002/nav.3800020109.

    [26] Peng J L, Wang C A, Wan F B, et al. Chained-tracker: Chaining paired attentive regression results for end-to-end joint multiple-object detection and tracking[C]//Proceedings of the 16th European Conference on Computer Vision, 2020: 145–161. https://doi.org/10.1007/978-3-030-58548-8_9.

    [27] Zheng L, Bie Z, Sun Y F, et al. MARS: A video benchmark for large-scale person re-identification[C]//Proceedings of the 14th European Conference on Computer Vision, 2016: 868–884. https://doi.org/10.1007/978-3-319-46466-4_52.

    [28] McLaughlin N, Del Rincon J M, Miller P. Recurrent convolutional network for video-based person re-identification[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016: 1325–1334. https://doi.org/10.1109/CVPR.2016.148.

    [29] Zhou Z, Huang Y, Wang W, et al. See the forest for the trees: Joint spatial and temporal recurrent neural networks for video-based person re-identification[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017: 6776–6785. https://doi.org/10.1109/CVPR.2017.717.

    [30] Fu Y, Wang X Y, Wei Y C, et al. STA: Spatial-temporal attention for large-scale video-based person re-identification[C]//Proceedings of the 33rd AAAI Conference on Artificial Intelligence, 2019: 8287–8294. https://doi.org/10.1609/aaai.v33i01.33018287.

    [31] Li J N, Zhang S L, Huang T J. Multi-scale 3D convolution network for video based person re-identification[C]//Proceedings of the 33rd AAAI Conference on Artificial Intelligence, 2019: 8618–8625. https://doi.org/10.1609/aaai.v33i01.33018618.

    [34] Zhu X Z, Xiong Y W, Dai J F, et al. Deep feature flow for video recognition[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017: 4141–4150. https://doi.org/10.1109/CVPR.2017.441.

    [35] Kang K, Ouyang W L, Li H S, et al. Object detection from video tubelets with convolutional neural networks[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016: 817–825. https://doi.org/10.1109/CVPR.2016.95.

    [36] Feichtenhofer C, Pinz A, Zisserman A. Detect to track and track to detect[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision, 2017: 3057–3065. https://doi.org/10.1109/ICCV.2017.330.

    [37] Xiao F Y, Lee Y J. Video object detection with an aligned spatial-temporal memory[C]//Proceedings of the 15th European Conference on Computer Vision, 2018: 494–510. https://doi.org/10.1007/978-3-030-01237-3_30.

    [38] Yu F, Wang D Q, Shelhamer E, et al. Deep layer aggregation[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018: 2403–2412. https://doi.org/10.1109/CVPR.2018.00255.

    [39] Pang B, Li Y Z, Zhang Y F, et al. TubeTK: Adopting tubes to track multi-object in a one-step training model[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 6307–6317. https://doi.org/10.1109/CVPR42600.2020.00634.

    [40] Wu J J, Cao J L, Song L C, et al. Track to detect and segment: An online multi-object tracker[C]//Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 12347–12356. https://doi.org/10.1109/CVPR46437.2021.01217.

    [41] Pang J M, Qiu L L, Li X, et al. Quasi-dense similarity learning for multiple object tracking[C]//Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 164–173. https://doi.org/10.1109/CVPR46437.2021.00023.

    Paper Information

    Category: Article

    Received: Jan. 12, 2023

    Accepted: Apr. 3, 2023

    Published Online: Aug. 9, 2023

    DOI: 10.12086/oee.2023.230009
