Acta Optica Sinica, Volume 43, Issue 15, 1510003 (2023)

Research Progress in Fundamental Architecture of Deep Learning-Based Single Object Tracking Method

Tingfa Xu1,2,*, Ying Wang1, Guokai Shi3, Tianhao Li1, and Jianan Li1,**
Author Affiliations
  • 1Key Laboratory of Photoelectronic Imaging Technology and System, Ministry of Education, School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China
  • 2Chongqing Innovation Center, Beijing Institute of Technology, Chongqing 401120, China
  • 3North Automatic Control Technology Institute, Taiyuan 030006, Shanxi, China
    Figures & Tables (9)
    Two basic architectures of single object tracking methods. (a) Siamese network-based two-stream tracking method; (b) transformer-based one-stream tracking method
    Landmark advances in deep learning-based single object tracking methods
    Feature fusion methods in single object tracking methods. (a) Naïve cross correlation-based fusion; (b) depth-wise cross convolution-based fusion; (c) pixel-wise correlation-based fusion; (d) cross-attention-based fusion; (e) concatenation-based fusion
    Tracking head in single object tracking methods. (a) Anchor-based head; (b) anchor-free head; (c) corner-based head
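The fusion operators named in the feature-fusion caption can be illustrated on toy feature maps. The NumPy sketch below is a simplified illustration, not the code of any cited tracker: the function names are hypothetical, batch dimensions are dropped, and the two attention variants use a single head with no learned query/key/value projections (real trackers add those, plus feed-forward layers).

```python
import numpy as np


def naive_xcorr(z, x):
    """Naive cross correlation (SiamFC-style idea).

    z: template features (C, hz, wz); x: search features (C, hx, wx).
    The whole template is one C-channel kernel slid over x, so channel
    products are summed into a single response map (hx-hz+1, wx-wz+1).
    """
    _, hz, wz = z.shape
    _, hx, wx = x.shape
    out = np.zeros((hx - hz + 1, wx - wz + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(z * x[:, i:i + hz, j:j + wz])
    return out


def depthwise_xcorr(z, x):
    """Depth-wise cross correlation (SiamRPN++-style idea).

    Each template channel is correlated only with the matching search
    channel, so all C channels are kept: output (C, hx-hz+1, wx-wz+1).
    """
    c, hz, wz = z.shape
    _, hx, wx = x.shape
    out = np.zeros((c, hx - hz + 1, wx - wz + 1))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            out[:, i, j] = np.sum(z * x[:, i:i + hz, j:j + wz], axis=(1, 2))
    return out


def pixelwise_corr(z, x):
    """Pixel-wise correlation: each template pixel (a C-dim vector) acts as a
    1x1 kernel, giving hz*wz output channels at full search resolution."""
    c = z.shape[0]
    kernels = z.reshape(c, -1)                    # (C, hz*wz)
    return np.einsum("ck,chw->khw", kernels, x)   # (hz*wz, hx, wx)


def to_tokens(f):
    """Flatten a (C, h, w) feature map into (h*w, C) tokens."""
    return f.reshape(f.shape[0], -1).T


def attention(q, k, v):
    """Single-head scaled dot-product attention, no learned projections."""
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v


def cross_attention_fusion(z, x):
    """Search tokens (queries) attend to template tokens (keys/values)."""
    tz = to_tokens(z)
    return attention(to_tokens(x), tz, tz)        # (hx*wx, C)


def concat_fusion(z, x):
    """Concatenate template and search tokens, then apply joint self-attention."""
    t = np.concatenate([to_tokens(z), to_tokens(x)], axis=0)
    return attention(t, t, t)                     # (hz*wz + hx*wx, C)
```

Summing the output of `depthwise_xcorr` over its channel axis reproduces `naive_xcorr`, which makes the design difference concrete: the depth-wise variant postpones the channel reduction to later layers instead of collapsing everything into one response map, while the attention-based variants replace sliding-window matching with token-to-token similarity.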
    • Table 1. Structural composition of representative deep learning-based single object tracking methods

      Tracker | Publication | Feature extraction | Feature fusion | Tracking head
      SiamFC [2] | ECCVW 2016 | AlexNet | Naïve cross correlation | -
      SiamRPN [39] | CVPR 2018 | AlexNet | Up-channel cross correlation | Anchor-based
      SiamRPN++ [34] | CVPR 2019 | ResNet | Depth-wise cross convolution | Anchor-based
      SiamFC++ [7] | AAAI 2020 | InceptionV3 | Depth-wise cross convolution | Anchor-free
      SiamCAR [51] | CVPR 2020 | ResNet | Depth-wise cross convolution | Anchor-free
      PCDHV [41] | ACCV 2021 | InceptionV3 | Pixel-wise cross convolution | Corner-based
      TransT [12] | CVPR 2021 | ResNet50 | Cross-attention-based fusion | Anchor-free
      STARK [44] | ICCV 2021 | ResNet | Concatenation-based fusion | Corner-based
      SwinTrack [28] | NeurIPS 2022 | Swin transformer | Concatenation-based fusion | Anchor-free
      MixFormer [53] | CVPR 2022 | One-stage feature extraction and fusion | Corner-based
      SimTrack [54] | ECCV 2022 | One-stage feature extraction and fusion | Corner-based
      OSTrack [14] | ECCV 2022 | One-stage feature extraction and fusion | Anchor-free
    • Table 2. Common single object tracking datasets

      Dataset | Sequences | Attributes | Average duration /s | Minimum frames | Maximum frames
      VOT-2018 [15] | 60 | 12 | 355.90 | 41 | 1500
      UAV123 [61] | 123 | 12 | 30.48 | 109 | 3085
      GOT-10k [16] | Total: 9935 (training: 9335; validation: 180; testing: 420) | 6 | 15.00 | 51 | 920
      LaSOT [17] | Total: 1400 (training: 1120; testing: 280) | 14 | 83.57 | 1000 | 11397
      TrackingNet [18] | Total: 30643 (training: 30132; testing: 511) | 15 | 16.70 | 96 | 2368
      TNL2K [62] | Total: 2000 (training: 1300; testing: 700) | 17 | 20.74 | 21 | 18488
    • Table 3. Video properties of common single object tracking datasets

      No. | Attribute | Description | GOT-10k | UAV123 | LaSOT | TrackingNet | TNL2K
      1 | POC | Partial occlusion
      2 | FOC | Full occlusion
      3 | SV | Scale variation
      4 | ARC | Aspect ratio change
      5 | FM | Fast motion
      6 | IV | Illumination variation
      7 | LR | Low resolution
      8 | OV | Out-of-view
      9 | CM | Camera motion
      10 | BC | Background clutter
      11 | VC | Viewpoint change
      12 | SOB | Similar object
      13 | DEF | Deformation
      14 | MB | Motion blur
      15 | IPR | In-plane rotation
      16 | OPR | Out-of-plane rotation
      17 | AS | Influence of adversarial samples
      18 | TC | Two targets with similar intensity cross each other
      19 | MS | Video contains both color and thermal images
    • Table 4. Evaluation metrics of single object tracking methods

      Category | Principle | Evaluation metric | Applicable dataset
      S | Intersection over union (IoU) between tracking results and ground truths | AO (average overlap) | GOT-10k
        |  | SR (success rate) | GOT-10k
        |  | Success plot | LaSOT
        |  | AUC (area under the curve) | LaSOT, TrackingNet
      P | Pixel distance between centers of tracking results and ground truths | Precision plot | LaSOT, TrackingNet
        |  | Precision | LaSOT, TrackingNet
        |  | Normalized precision | LaSOT, TrackingNet
    • Table 5. Performance comparison of single object tracking methods on GOT-10k, LaSOT, and TrackingNet datasets

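      Type | Tracker | Publication | GOT-10k [16] AO | GOT-10k SR50 | GOT-10k SR75 | LaSOT [17] AUC | LaSOT Pnorm | LaSOT P | TrackingNet [18] AUC | TrackingNet Pnorm | TrackingNet P
      Two-stream | SiamFC [2] | ECCVW 2016 | 39.2 | 42.6 | 13.5 | - | - | - | 57.1 | 65.4 | 53.3
      Two-stream | SiamRPN [29] | CVPR 2018 | 48.1 | 58.1 | 27.0 | - | - | - | - | - | -
      Two-stream | SiamRPN++ [34] | CVPR 2019 | 51.7 | 61.6 | 32.5 | 49.6 | 56.9 | 49.1 | 73.3 | 80.0 | 69.4
      Two-stream | SiamBAN [49] | CVPR 2020 | - | - | - | 51.4 | 59.8 | 52.1 | - | - | -
      Two-stream | CGACD [52] | CVPR 2020 | - | - | - | 51.8 | 62.6 | - | 71.1 | 80.0 | 69.3
      Two-stream | SiamCAR [51] | CVPR 2020 | 56.9 | 67.0 | 41.5 | 50.7 | 60.0 | 51.0 | - | - | -
      Two-stream | SiamAttn [48] | CVPR 2020 | - | - | - | 56.0 | 64.8 | - | 75.2 | 81.7 | -
      Two-stream | SiamFC++ [7] | AAAI 2020 | 59.5 | 69.5 | 47.9 | 54.4 | 62.3 | 54.7 | 75.4 | 80.0 | 70.5
      Two-stream | Ocean [63] | ECCV 2020 | 61.1 | 72.1 | 47.3 | 56.0 | 65.1 | 56.6 | - | - | -
      Two-stream | TransT [12] | CVPR 2021 | 67.7 | 76.8 | 60.9 | 64.9 | 73.8 | 69.0 | 81.4 | 86.7 | 80.3
      Two-stream | STMTrack [42] | CVPR 2021 | 64.2 | 73.7 | 57.5 | 60.6 | 69.3 | 63.3 | 80.3 | 85.1 | 76.7
      Two-stream | AutoMatch [46] | ICCV 2021 | 65.2 | 76.6 | 54.3 | 58.2 | - | 59.9 | 76.0 | - | 72.6
      Two-stream | STARK [44] | ICCV 2021 | 68.8 | 78.1 | 64.1 | 67.1 | 77.0 | - | 82.0 | 86.9 | -
      Two-stream | SparseTT [37] | IJCAI 2022 | 69.3 | 79.1 | 63.8 | 66.0 | 74.8 | 70.1 | 81.7 | 86.6 | 79.5
      Two-stream | CSWinTT [45] | CVPR 2022 | 69.4 | 78.9 | 65.4 | 66.2 | 75.2 | 70.9 | 81.9 | 86.7 | 79.5
      Two-stream | SwinTrack [28] | NeurIPS 2022 | 72.4 | 80.5 | 67.8 | 71.3 | - | 76.5 | 84.0 | - | 82.8
      One-stream | SBT [13] | CVPR 2022 | 70.4 | 80.8 | 64.7 | 66.7 | - | 71.1 | - | - | -
      One-stream | MixFormer [53] | CVPR 2022 | 70.7 | 80.0 | 67.8 | 70.1 | 79.9 | 76.3 | 83.9 | 88.9 | 83.1
      One-stream | OSTrack [14] | ECCV 2022 | 73.7 | 83.2 | 70.8 | 71.1 | 81.1 | 77.6 | 83.9 | 88.5 | 83.2
      One-stream | SimTrack [54] | ECCV 2022 | 69.8 | 78.8 | 66.0 | 70.5 | 79.7 | - | 83.4 | 87.4 | -
      One-stream | CTTrack [57] | AAAI 2023 | 72.8 | 81.3 | 71.5 | 69.8 | 79.7 | 76.2 | 84.9 | 89.1 | 83.5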
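The two metric families in Table 4 (S: IoU-based; P: center-distance-based) can be sketched in a few lines of NumPy. This is an illustration of the principles only, not the official GOT-10k, LaSOT, or TrackingNet toolkit: the function names are hypothetical, boxes are assumed to be (x, y, w, h) rows, and the 21 IoU thresholds for the success curve, the 20-pixel precision threshold, and the 0.2 normalized-distance threshold are assumptions chosen for illustration (each benchmark's toolkit fixes its own conventions).

```python
import numpy as np


def iou(pred, gt):
    """Per-frame IoU for (x, y, w, h) boxes given as arrays of shape (N, 4)."""
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / union


def center_error(pred, gt, normalize=False):
    """Euclidean distance between box centers; if normalize is True, the x/y
    offsets are first divided by the ground-truth width/height."""
    offset = (pred[:, :2] + pred[:, 2:] / 2) - (gt[:, :2] + gt[:, 2:] / 2)
    if normalize:
        offset = offset / gt[:, 2:]
    return np.linalg.norm(offset, axis=1)


def success_metrics(pred, gt, thresholds=np.linspace(0, 1, 21)):
    """S-category metrics: AO, SR at IoU 0.5 / 0.75, success curve and its AUC."""
    overlaps = iou(pred, gt)
    # Success curve: fraction of frames whose IoU exceeds each threshold.
    curve = np.array([(overlaps > t).mean() for t in thresholds])
    return {
        "AO": overlaps.mean(),
        "SR50": (overlaps > 0.5).mean(),
        "SR75": (overlaps > 0.75).mean(),
        "success_curve": curve,
        "AUC": curve.mean(),  # mean over evenly spaced thresholds
    }


def precision_metrics(pred, gt, pixel_thr=20.0, norm_thr=0.2):
    """P-category metrics at single (assumed) distance thresholds."""
    return {
        "P": (center_error(pred, gt) <= pixel_thr).mean(),
        "P_norm": (center_error(pred, gt, normalize=True) <= norm_thr).mean(),
    }
```

For a two-frame toy sequence with ground truth (0, 0, 10, 10) in both frames and predictions (0, 0, 10, 10) and (5, 0, 10, 10), the sketch gives AO = 2/3, SR50 = 0.5, P = 1.0 and P_norm = 0.5: the 5-pixel shift passes the absolute pixel threshold but, relative to a 10-pixel-wide target, fails the normalized one, which is why normalized precision is less sensitive to target size and image resolution.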
    Citation

    Tingfa Xu, Ying Wang, Guokai Shi, Tianhao Li, Jianan Li. Research Progress in Fundamental Architecture of Deep Learning-Based Single Object Tracking Method[J]. Acta Optica Sinica, 2023, 43(15): 1510003

    Paper Information

    Category: Image Processing

    Received: Mar. 29, 2023

    Accepted: Jun. 15, 2023

    Published Online: Aug. 15, 2023

    Author emails: Xu Tingfa (ciom_xtf1@bit.edu.cn), Li Jianan (lijianan@bit.edu.cn)

    DOI: 10.3788/AOS230746
