Laser & Optoelectronics Progress, Vol. 61, Issue 14, 1415004 (2024)

Multiview Stereo Reconstruction with Feature Aggregation Transformer

Min Wang1,2, Mingfu Zhao1,2, Tao Song1,2,*, Weiwei Li1,2, Yuan Tian1,2, Cheng Li2, and Yu Zhang2
Author Affiliations
  • 1 Optical Fiber Sensing and Photoelectric Detection Chongqing Key Laboratory, Chongqing 400054, China
  • 2 College of Electrical and Electronic Engineering, Chongqing University of Technology, Chongqing 400054, China
    Figures & Tables (11)
    FAT-MVSNet architecture
    Comparison of convolutional kernel sampling. (a) Kernel sampling in standard convolution; (b)-(d) kernel sampling in deformable convolution
    Regular grid R
    Implementation process of deformable convolution
    Transformer based feature aggregation module
    Transformer encoder and attention structure. (a) Transformer encoder layer; (b) attention layer
    Comparison of reconstruction results of some models on DTU
    Comparison of reconstruction results of some models on the Tanks & Temples
    • Table 1. Quantitative testing results of different methods on DTU

      Method           D_acc   D_comp  D_overall
      Furu             0.613   0.941   0.777
      Gipuma           0.283   0.873   0.578
      COLMAP           0.400   0.664   0.532
      MVSNet           0.396   0.527   0.462
      Fast-MVSNet      0.336   0.403   0.370
      CasMVSNet        0.325   0.385   0.355
      UCS-Net          0.338   0.349   0.344
      Uni-MVSNet       0.352   0.278   0.315
      CVP-MVSNet       0.296   0.406   0.351
      CDS-MVSNet       0.352   0.280   0.316
      MVSTR            0.356   0.295   0.326
      Proposed method  0.335   0.284   0.310
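
The printed values in Table 1 are consistent with D_overall being the arithmetic mean of D_acc and D_comp. The short Python sketch below checks that reading against every row; the formula is inferred from the numbers rather than stated on this page, and the helper name `overall` is hypothetical.

```python
# Rows copied from Table 1: (method, D_acc, D_comp, D_overall as printed).
rows = [
    ("Furu", 0.613, 0.941, 0.777),
    ("Gipuma", 0.283, 0.873, 0.578),
    ("COLMAP", 0.400, 0.664, 0.532),
    ("MVSNet", 0.396, 0.527, 0.462),
    ("Fast-MVSNet", 0.336, 0.403, 0.370),
    ("CasMVSNet", 0.325, 0.385, 0.355),
    ("UCS-Net", 0.338, 0.349, 0.344),
    ("Uni-MVSNet", 0.352, 0.278, 0.315),
    ("CVP-MVSNet", 0.296, 0.406, 0.351),
    ("CDS-MVSNet", 0.352, 0.280, 0.316),
    ("MVSTR", 0.356, 0.295, 0.326),
    ("Proposed method", 0.335, 0.284, 0.310),
]


def overall(acc: float, comp: float) -> float:
    """Assumed definition: mean of accuracy and completeness errors."""
    return (acc + comp) / 2


# Largest gap between the assumed formula and the printed column.
max_gap = max(abs(overall(acc, comp) - printed) for _, acc, comp, printed in rows)
print(f"largest deviation from printed D_overall: {max_gap:.4f}")
```

Every row agrees with the printed column to within rounding at the third decimal place.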
    • Table 2. Quantitative testing results of different methods on Tanks & Temples

      Method           Intermediate                                                   Advanced
                       Mean   Fam.   Fra.   Hor.   Lig.   M60    Pan.   Pla.   Tra.   Mean   Aud.   Bal.   Cou.   Mus.   Pal.   Tem.
      COLMAP           42.14  50.41  22.25  26.63  56.43  44.83  46.97  48.53  42.04  27.24  16.02  25.23  34.7   41.51  18.05  27.94
      MVSNet           43.48  55.99  28.55  25.07  50.79  53.96  50.86  47.9   34.69  -      -      -      -      -      -      -
      Fast-MVSNet      47.39  65.18  39.59  34.98  47.81  49.16  46.2   53.27  42.91  -      -      -      -      -      -      -
      PatchmatchNet    53.15  66.99  52.64  43.24  54.87  52.87  49.54  54.21  50.81  32.31  23.69  37.73  30.04  41.8   28.31  32.29
      CasMVSNet        56.84  76.37  58.45  46.26  55.81  56.11  54.06  58.18  49.51  31.12  19.81  38.46  29.1   43.87  27.36  28.11
      UCS-Net          54.83  76.09  53.16  43.03  54     55.6   51.49  57.38  47.89  -      -      -      -      -      -      -
      CVP-MVSNet       54.03  76.5   47.74  36.34  55.12  57.28  54.28  57.43  47.54  -      -      -      -      -      -      -
      AA-RMVSNet       61.51  77.77  59.53  51.53  64.02  64.05  59.47  60.85  54.9   33.53  20.96  40.15  32.05  46.01  29.28  32.71
      CDS-MVSNet       61.58  78.85  63.17  53.04  61.34  62.63  59.06  62.28  52.3   -      -      -      -      -      -      -
      MVSTER           -      -      -      -      -      -      -      -      -      37.53  26.68  42.14  35.65  49.37  32.16  39.19
      Proposed method  62.81  78.32  65.21  53.01  62.07  64.48  63.25  61.78  54.36  38.18  27.36  43.37  39.26  52.19  33.46  33.41
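
The Int. Mean and Adv. Mean columns in Table 2 appear to be plain averages of the per-scene scores in each group. As a minimal sketch under that assumption, the Python snippet below recomputes both means for the proposed method's row only; the variable names are illustrative.

```python
from statistics import mean

# Per-scene scores for the "Proposed method" row of Table 2.
intermediate = {
    "Fam.": 78.32, "Fra.": 65.21, "Hor.": 53.01, "Lig.": 62.07,
    "M60": 64.48, "Pan.": 63.25, "Pla.": 61.78, "Tra.": 54.36,
}
advanced = {
    "Aud.": 27.36, "Bal.": 43.37, "Cou.": 39.26,
    "Mus.": 52.19, "Pal.": 33.46, "Tem.": 33.41,
}

# Means as printed in the table.
printed_int_mean = 62.81
printed_adv_mean = 38.18

# Assumed definition: unweighted average over the scenes in each group.
int_mean = mean(intermediate.values())
adv_mean = mean(advanced.values())

print(f"intermediate: computed {int_mean:.3f}, printed {printed_int_mean}")
print(f"advanced:     computed {adv_mean:.3f}, printed {printed_adv_mean}")
```

Both recomputed means match the printed values to within rounding at the second decimal place.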
    • Table 3. Comparison of quantitative results of ablation experiments

      No.  Method      D_acc   D_comp  D_overall
      1    Baseline    0.367   0.325   0.346
      2    +DCN        0.356   0.304   0.330
      3    +SA+CA      0.342   0.288   0.315
      4    FAT-MVSNet  0.335   0.284   0.310
    Citation

    Min Wang, Mingfu Zhao, Tao Song, Weiwei Li, Yuan Tian, Cheng Li, Yu Zhang. Multiview Stereo Reconstruction with Feature Aggregation Transformer[J]. Laser & Optoelectronics Progress, 2024, 61(14): 1415004

    Paper Information

    Category: Machine Vision

    Received: Nov. 22, 2023

    Accepted: Dec. 21, 2023

    Published Online: Jul. 8, 2024

    The Author Email: Tao Song (tsong@cqut.edu.cn)

    DOI:10.3788/LOP232546

    CSTR:32186.14.LOP232546
