Opto-Electronic Engineering, Volume 52, Issue 1, 240234 (2025)

Lightweight Swin Transformer combined with multi-scale feature fusion for face expression recognition

Yanqiu Li1,2, Shengzhao Li1, Guangling Sun1,2,*, and Pu Yan1,2
Author Affiliations
  • 1School of Electronic and Information Engineering, Anhui Jianzhu University, Hefei, Anhui 230601, China
  • 2Anhui International Joint Research Center for Ancient Architecture Intellisencing and Multi-Dimensional Modeling, Hefei, Anhui 230601, China
    Figures & Tables (20)

    Fig. 1. Swin Transformer network structure diagram
    Fig. 2. Swin Transformer block module structure diagram
    Fig. 3. Self-attention computing area. (a) MSA; (b) W-MSA; (c) SW-MSA (a window-partition sketch follows this figure list)
    Fig. 4. Improved model structure diagram
    Fig. 5. SPST module structure diagram
    Fig. 6. A visual comparison of the BN, LN, and BCN normalization techniques
    Fig. 7. EMA module structure diagram
    Fig. 8. Activation maps of the model before and after adding the EMA module
    Fig. 9. Partial samples of the datasets
    Fig. 10. Confusion matrix validation results on JAFFE. (a) Original Swin Transformer model; (b) Improved Swin Transformer model
    Fig. 11. Confusion matrix validation results on RAF-DB. (a) Original Swin Transformer model; (b) Improved Swin Transformer model
    Fig. 12. Confusion matrix validation results on FERPLUS. (a) Original Swin Transformer model; (b) Improved Swin Transformer model
    Fig. 13. Confusion matrix validation results on FANE. (a) Original Swin Transformer model; (b) Improved Swin Transformer model
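    The W-MSA/SW-MSA distinction in Fig. 3 is easiest to see in code. Below is a minimal PyTorch sketch of the standard Swin window partition and the half-window cyclic shift used by SW-MSA; it is not the paper's implementation, the attention mask needed at the shifted boundaries is omitted for brevity, and the 56×56×96 feature shape is only an illustrative stage-1 size.

```python
import torch

def window_partition(x, window_size):
    """Split a feature map (B, H, W, C) into non-overlapping windows of shape
    (num_windows*B, window_size*window_size, C), the regions over which
    W-MSA computes self-attention."""
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size * window_size, C)

# SW-MSA: cyclically shift the map by half a window before partitioning,
# so the new windows straddle the previous window boundaries.
x = torch.randn(1, 56, 56, 96)                          # illustrative stage-1 map
shifted = torch.roll(x, shifts=(-3, -3), dims=(1, 2))   # half of a 7x7 window
wins = window_partition(x, 7)          # W-MSA attention regions
s_wins = window_partition(shifted, 7)  # SW-MSA regions (boundary mask omitted)
print(wins.shape, s_wins.shape)        # torch.Size([64, 49, 96]) for each
```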
    • Table 1. Comparison of parameters before and after the model is improved

      Model | EMA module | SPST module | Parameters
      Original Swin Transformer | × | × | 27,524,737
      Improved Swin Transformer | ✓ | × | 27,526,225
      Improved Swin Transformer | × | ✓ | 23,185,251
      Improved Swin Transformer | ✓ | ✓ | 23,186,739
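    Parameter counts like those in Table 1 are conventionally obtained by summing tensor sizes over the model. A minimal sketch follows; the `toy` model is only a stand-in, since the paper's networks are not reproduced on this page.

```python
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    """Sum of all learnable tensor sizes, the convention behind Table 1's counts."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Stand-in model for demonstration only.
toy = nn.Sequential(nn.Conv2d(3, 96, kernel_size=4, stride=4), nn.ReLU())
print(f"{count_parameters(toy):,} parameters")
```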
    • Table 2. Experimental comparison of replacing SPST modules in different stages

      Position | Swin Transformer block | SPST block | Parameters | RACC/% | GFLOPs/G | FPS
      Stage 1 | × | ✓ | 34,331,981 | 72.33 | 19.06 | 86
      Stage 2 | × | ✓ | 29,625,428 | 75.27 | 12.44 | 152
      Stage 3 | × | ✓ | 24,190,413 | 82.17 | 5.84 | 281
      Stage 4 | × | ✓ | 23,185,251 | 86.86 | 4.12 | 335
      Stage 4 | ✓ | × | 27,524,737 | 85.69 | 4.51 | 301
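    The FPS column in Table 2 is a throughput measurement. A rough sketch of how single-image FPS is commonly timed on GPU is shown below (warm-up iterations, then synchronized timing); the 224×224 input size is an assumption, and GFLOPs figures are typically obtained with a profiler such as thop or fvcore rather than by hand.

```python
import time
import torch

@torch.no_grad()
def measure_fps(model, input_size=(1, 3, 224, 224), warmup=20, iters=100):
    """Rough single-image FPS measurement of the kind behind Table 2."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.eval().to(device)
    x = torch.randn(*input_size, device=device)
    for _ in range(warmup):          # warm up kernels and caches
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()     # make sure queued work is done before timing
    t0 = time.perf_counter()
    for _ in range(iters):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    return iters / (time.perf_counter() - t0)

# Usage: fps = measure_fps(my_model)
```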
    • Table 3. Entropy comparison of activation maps

      Model | Anger | Disgust | Fear | Happy | Sad | Surprise
      Original Swin Transformer | 10.5974 | 10.5325 | 10.4282 | 10.6150 | 10.5980 | 10.6626
      Improved Swin Transformer | 8.2437 | 9.4190 | 9.2204 | 8.1102 | 8.9906 | 8.9113
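    The exact entropy definition behind Table 3 is not restated on this page. One plausible reading, Shannon entropy over a histogram of activation values, can be sketched as follows; the bin count is an assumption (the achievable maximum is log2(bins)), and lower entropy indicates activations concentrated on fewer regions, consistent with the drop after the improvements.

```python
import torch

def activation_entropy(act: torch.Tensor, bins: int = 256) -> float:
    """Shannon entropy (bits) of an activation map's value distribution.
    A hypothetical reconstruction, not the paper's exact metric."""
    flat = act.flatten().float()
    hist = torch.histc(flat, bins=bins, min=float(flat.min()), max=float(flat.max()))
    p = hist / hist.sum()      # normalize histogram to a probability distribution
    p = p[p > 0]               # drop empty bins so log2 is defined
    return float(-(p * p.log2()).sum())

print(activation_entropy(torch.rand(56, 56)))  # near log2(bins) for uniform noise
```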
    • Table 4. Configuration of the experimental environment

      Configuration name | Environmental parameter
      CPU | Intel(R) Core(TM) i5-12400F @ 2.50 GHz
      GPU | NVIDIA GeForce RTX 3060 (12 GB)
      Memory | 16 GB
      Python | 3.9.19
      CUDA | 11.8
      Torch | 2.0.0
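    A quick way to check that a local runtime matches Table 4's environment:

```python
import sys
import torch

print("Python :", sys.version.split()[0])   # expect 3.9.19
print("Torch  :", torch.__version__)        # expect 2.0.0
print("CUDA   :", torch.version.cuda)       # expect 11.8
if torch.cuda.is_available():
    print("GPU    :", torch.cuda.get_device_name(0))  # expect GeForce RTX 3060
```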
    • Table 5. Accuracy of embedding the EMA module behind different stages

      Position | RACC/% (JAFFE) | RACC/% (FERPLUS) | RACC/% (RAF-DB) | RACC/% (FANE) | Parameters
      After stage 1 | 95.57 | 85.53 | 86.80 | 68.84 | 23,185,635
      After stage 2 | 97.56 | 86.46 | 87.29 | 70.11 | 23,186,739
      After stage 3 | 96.80 | 85.56 | 86.99 | 68.60 | 23,191,107
      After stage 4 | 95.87 | 85.76 | 86.67 | 69.37 | 23,187,875
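    Structurally, Table 5's experiment, embedding the EMA module behind different stages, amounts to wrapping one backbone stage so an attention module post-processes its output. A hypothetical sketch under that assumption follows; EMA itself is not reimplemented here, and `attn` is only a placeholder.

```python
import torch.nn as nn

class StageWithAttention(nn.Module):
    """Wrap a backbone stage so an attention module refines its output.
    Hypothetical sketch of the Table 5 placement experiment."""
    def __init__(self, stage: nn.Module, attn: nn.Module):
        super().__init__()
        self.stage = stage
        self.attn = attn

    def forward(self, x):
        return self.attn(self.stage(x))

# Hypothetical usage, embedding after stage 2 of a 4-stage backbone:
# backbone.stages[1] = StageWithAttention(backbone.stages[1], EMA(channels=192))
```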
    • Table 6. Results of ablation experiments on FERPLUS, RAF-DB, and FANE

      SPST module | EMA module | RACC/% (FERPLUS) | RACC/% (RAF-DB) | RACC/% (FANE) | Parameters | GFLOPs/G | FPS
      × | × | 85.43 | 85.69 | 68.47 | 27,524,737 | 4.51 | 301
      × | ✓ | 85.73 | 86.99 | 69.67 | 27,526,225 | 4.52 | 297
      ✓ | × | 85.87 | 86.86 | 69.72 | 23,185,251 | 4.12 | 335
      ✓ | ✓ | 86.46 | 87.29 | 70.11 | 23,186,739 | 4.13 | 330
    • Table 7. Accuracy comparison of different networks on JAFFE, FERPLUS, and RAF-DB

      Model | ACC/% (JAFFE) | ACC/% (FERPLUS) | ACC/% (RAF-DB)
      ARBEx[9] | 96.67 | — | —
      LBP+HOG[7] | 96.05 | — | —
      SCN[4] | 86.33 | 85.97 | 87.03
      RAN[8] | 88.67 | 83.63 | 86.90
      EfficientNetB0[25] | — | 85.01 | 84.21
      MobileNetV2[26] | — | 84.03 | 83.54
      MobileNetV3[27] | — | 84.97 | 84.88
      Ad-Corre[28] | — | — | 86.96
      POSTER[19] | — | — | 86.03
      R3HO-Net[29] | — | — | 85.52
      Ada-CM[30] | — | — | 84.13
      Swin Transformer (base) | 95.12 | 85.43 | 85.69
      Ours | 97.56 | 86.46 | 87.29
    Citation: Yanqiu Li, Shengzhao Li, Guangling Sun, Pu Yan. Lightweight Swin Transformer combined with multi-scale feature fusion for face expression recognition[J]. Opto-Electronic Engineering, 2025, 52(1): 240234
    Paper Information

    Category: Article

    Received: Oct. 7, 2024

    Accepted: Dec. 3, 2024

    Published Online: Feb. 21, 2025

    Author Email: Sun Guangling (孙光灵)

    DOI: 10.12086/oee.2025.240234
