Laser & Optoelectronics Progress, Volume 62, Issue 6, 0637006 (2025)

Multi-Head Mixed Self-Attention Mechanism for Object Detection

Qinghua Su*, Jianhong Mu, Wenhui Liang, Xiyu Wang, and Juntao Li
Author Affiliations
  • School of Information, Beijing Wuzi University, Beijing 101149, China

    Traditional attention mechanisms either limit a model's representational ability and detection performance or incur high complexity and computational cost. To address these problems, an innovative lightweight multi-head mixed self-attention (MMSA) mechanism is proposed, designed to enhance the performance of object detection networks while keeping the model simple and efficient. By introducing a multi-head attention mechanism, the MMSA module ingeniously integrates channel information with spatial information, and local information with global information, further strengthening the network's representational capability. Compared with other attention mechanisms, MMSA achieves a better balance among model representation, performance, and complexity. To validate its effectiveness, MMSA is integrated into the Backbone or Neck of the YOLOv8n network to enhance its multi-scale feature extraction and feature fusion capabilities. Extensive comparative experiments on the CityPersons, CrowdHuman, TT100K, BDD100K, and TinyPerson public datasets show that, compared with the original algorithm, YOLOv8n with MMSA improves the mean average precision (mAP@0.5) by 0.9, 0.9, 2.3, 1.0, and 1.7 percentage points, respectively, without significantly increasing the model size. In addition, the detection speed reaches 145 frames/s, fully meeting the requirements of real-time applications. The experimental results demonstrate the effectiveness of the MMSA mechanism in improving object detection outcomes and show its practical value and broad applicability in real-world scenarios.
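    The abstract page carries no reference implementation; the following is a minimal PyTorch sketch of what a lightweight multi-head attention block mixing channel with spatial cues and local with global context could look like, reconstructed only from the abstract's description. The class name MMSA, the num_heads parameter, and the two branch designs are illustrative assumptions, not the authors' code.

# Hypothetical sketch of a lightweight multi-head mixed self-attention
# (MMSA) block, based only on the abstract: it mixes channel attention
# (global context) with spatial attention (local context) across several
# heads. All names and design details here are assumptions.
import torch
import torch.nn as nn


class MMSA(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        assert channels % num_heads == 0, "channels must divide evenly across heads"
        self.num_heads = num_heads
        # Channel branch: global average pooling -> per-channel weights,
        # grouped by head so each head attends to its own channel slice.
        self.channel_fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1, groups=num_heads),
            nn.Sigmoid(),
        )
        # Spatial branch: depthwise 3x3 conv (local info) -> one attention
        # map per head over the spatial grid.
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, groups=channels),
            nn.Conv2d(channels, num_heads, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        ca = self.channel_fc(x)                                # (b, c, 1, 1)
        sa = self.spatial_conv(x)                              # (b, heads, h, w)
        # Broadcast each head's spatial map over that head's channel slice.
        sa = sa.repeat_interleave(c // self.num_heads, dim=1)  # (b, c, h, w)
        # Mix both cues; the residual path keeps the block cheap and stable.
        return x * ca * sa + x


if __name__ == "__main__":
    x = torch.randn(1, 64, 80, 80)   # a typical YOLOv8n feature-map size
    print(MMSA(64)(x).shape)         # torch.Size([1, 64, 80, 80])

    In the paper the block is placed in YOLOv8n's Backbone or Neck; wrapping existing feature maps with a residual multiplicative gate, as above, is one plausible way to keep the added parameter count and latency small, consistent with the reported 145 frames/s.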


    Qinghua Su, Jianhong Mu, Wenhui Liang, Xiyu Wang, Juntao Li. Multi-Head Mixed Self-Attention Mechanism for Object Detection[J]. Laser & Optoelectronics Progress, 2025, 62(6): 0637006

    Paper Information

    Category: Digital Image Processing

    Received: Jun. 19, 2024

    Accepted: Aug. 1, 2024

    Published Online: Mar. 6, 2025

    DOI: 10.3788/LOP241509

    CSTR: 32186.14.LOP241509
