Optics and Precision Engineering, Volume 30, Issue 24, 3198 (2022)

Chained semantic generation network for video captioning

Lin MAO, Hang GAO*, Dawei YANG and Rubo ZHANG
Author Affiliations
  • School of Electromechanical Engineering, Dalian Minzu University, Dalian 116600, China

    To address the weak semantic expression that leads to inaccurate text descriptions in video captioning, a chained semantic generation network (ChainS-Net) is proposed. A multistage, two-branch crossing chained feature-extraction structure is constructed, which uses global and local domain modules as basic units and captures video semantics from global and local visual features, respectively. At each stage of the network, semantic information is transformed and parsed between the global and local domains, allowing visual and semantic information to be cross-referenced and improving semantic expression. The multistage iterative processing further yields a more effective semantic representation and thereby better captions. Experimental results on the MSR-VTT and MSVD datasets show that ChainS-Net outperforms comparable algorithms; relative to the semantics-assisted video captioning network SAVC, it achieves an average improvement of 2.5% across four video captioning metrics.
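    The crossing-chain topology described in the abstract can be made concrete with a short sketch. The following PyTorch-style code is an illustrative assumption, not the authors' implementation: the module names (DomainModule, ChainSNetSketch), the feature and semantic dimensions, and the simple linear fusion are hypothetical; only the multistage, two-branch structure in which each branch parses the other branch's semantics at every stage follows the paper's description.

        import torch
        import torch.nn as nn

        class DomainModule(nn.Module):
            """Basic unit (hypothetical form): fuses one branch's visual
            features with the semantics received from the other branch and
            emits an updated semantic representation."""
            def __init__(self, feat_dim, sem_dim):
                super().__init__()
                self.fuse = nn.Linear(feat_dim + sem_dim, sem_dim)
                self.act = nn.ReLU()

            def forward(self, visual, semantics):
                return self.act(self.fuse(torch.cat([visual, semantics], dim=-1)))

        class ChainSNetSketch(nn.Module):
            """Multistage two-branch crossing chain: at every stage the
            global branch consumes the local branch's semantics and
            vice versa, iteratively refining both representations."""
            def __init__(self, feat_dim=512, sem_dim=300, num_stages=4):
                super().__init__()
                self.global_units = nn.ModuleList(
                    DomainModule(feat_dim, sem_dim) for _ in range(num_stages))
                self.local_units = nn.ModuleList(
                    DomainModule(feat_dim, sem_dim) for _ in range(num_stages))
                # learned initial semantics, shared by both branches
                self.sem_init = nn.Parameter(torch.zeros(sem_dim))

            def forward(self, global_feat, local_feat):
                # global_feat, local_feat: (batch, feat_dim) pooled visual features
                b = global_feat.size(0)
                g_sem = l_sem = self.sem_init.expand(b, -1)
                for g_unit, l_unit in zip(self.global_units, self.local_units):
                    # crossing: each branch parses the other branch's semantics
                    g_sem, l_sem = (g_unit(global_feat, l_sem),
                                    l_unit(local_feat, g_sem))
                return g_sem, l_sem  # refined semantics for a caption decoder

    In this sketch, the tuple assignment inside the loop is what realizes the crossing: each stage's global semantics are computed from the local branch's previous output and vice versa, so visual and semantic information are cross-referenced at every stage.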

    Lin MAO, Hang GAO, Dawei YANG, Rubo ZHANG. Chained semantic generation network for video captioning[J]. Optics and Precision Engineering, 2022, 30(24): 3198

    Paper Information

    Category: Information Sciences

    Received: May 16, 2022

    Accepted: --

    Published Online: Feb. 15, 2023

    Author email: Hang GAO (gao_hang2021@163.com)

    DOI: 10.37188/OPE.20223024.3198
