Optics and Precision Engineering, Vol. 30, Issue 24, 3198 (2022)

Chained semantic generation network for video captioning

Lin MAO, Hang GAO*, Dawei YANG and Rubo ZHANG
Author Affiliations
  • School of Electromechanical Engineering, Dalian Minzu University, Dalian 116600, China
    Figures & Tables(15)
    Structure diagram of chained network
    Diagram of visual feature segmentation and overall network structure
    Network structure of MLP
    Global domain module
    Local domain module
    Combination of global and local domains
    Chained semantic generation network
    Video captioning network
    Structures of ablation experiment for two modules
    Structures of ablation experiment for cross-extraction
    Comparison of video captioning results
    • Table 1. Performance comparison of ablation experiments

                 MSR-VTT                         MSVD
      Mode       B-4     C       M       R       B-4     C       M       R
      G          45.83   53.16   29.28   63.64   62.35   109.71  39.04   77.04
      L          45.21   52.63   29.38   63.59   61.07   113.41  39.71   77.34
      G-L        46.70   52.93   29.55   64.03   65.20   119.58  41.51   79.02
      Cross      47.44   52.98   29.75   64.11   66.29   119.93  42.25   79.30
      Non-cross  46.72   52.67   29.60   63.99   65.47   120.09  41.49   79.24
      k=1        46.70   52.93   29.55   64.03   65.20   119.58  41.51   79.02
      k=2        47.44   52.98   29.75   64.11   66.29   119.93  42.25   79.30
      k=3        47.70   53.64   29.78   64.31   65.62   122.44  42.25   79.93
      k=4        48.20   53.75   29.98   64.48   66.51   122.26  41.98   80.05
      k=5        48.35   54.22   29.90   64.47   64.45   121.09  42.13   79.85

      B-4: BLEU-4; C: CIDEr; M: METEOR; R: ROUGE-L.
    • Table 2. Performance comparison on MSR-VTT dataset

      Method          B-4     C       M       R
      MTVC [20]       40.8    47.1    28.8    60.2
      CIDEnt-RL [21]  40.5    51.7    28.4    61.4
      SibNet [22]     40.9    47.5    27.5    60.2
      HACA [23]       43.4    49.7    29.5    61.8
      TAMoE [24]      42.2    48.9    29.4    62.0
      POS [25]        41.3    53.4    28.7    62.1
      MARN [10]       40.4    47.1    28.1    60.7
      JSRL-VCT [26]   42.3    49.1    29.7    62.8
      GRU-EVE [27]    38.3    48.1    28.4    60.7
      STG-KD [28]     40.5    47.1    28.3    60.9
      SAAT [29]       39.9    51.0    27.7    61.2
      ORG-TRL [9]     43.6    50.9    28.8    62.1
      SAVC [8]        45.8    53.2    29.3    63.6
      ChainS-Net      48.2    53.8    30.0    64.5
    • Table 3. Performance comparison on MSVD dataset

      Method          B-4     C       M       R
      LSTM-E [30]     45.3    -       31.0    -
      h-RNN [31]      49.9    65.8    32.6    -
      aLSTMs [32]     50.8    74.8    33.3    -
      SCN [14]        51.1    77.7    33.5    -
      MTVC [20]       54.5    92.4    36.0    72.8
      ECO [18]        53.5    85.8    35.0    -
      SibNet [22]     54.2    88.2    34.8    71.7
      POS [25]        53.9    91.0    34.9    72.1
      MARN [10]       48.6    92.2    35.1    71.9
      JSRL-VCT [26]   52.8    87.8    36.1    71.8
      GRU-EVE [27]    47.9    78.1    35.0    71.5
      STG-KD [28]     52.2    93.0    36.9    73.9
      SAAT [29]       46.5    81.0    33.5    69.4
      ORG-TRL [9]     54.3    95.2    36.4    73.9
      SAVC [8]        62.4    109.7   39.0    77.0
      ChainS-Net      65.6    122.4   42.3    79.9

    Lin MAO, Hang GAO, Dawei YANG, Rubo ZHANG. Chained semantic generation network for video captioning[J]. Optics and Precision Engineering, 2022, 30(24): 3198

    Paper Information

    Category: Information Sciences

    Received: May 16, 2022

    Accepted: --

    Published Online: Feb. 15, 2023

    The Author Email: GAO Hang (gao_hang2021@163.com)

    DOI: 10.37188/OPE.20223024.3198
