Chinese Journal of Liquid Crystals and Displays, Vol. 39, Issue 9, 1223 (2024)

Hourglass attention and progressive hybrid Transformer for image classification

Yanfei PENG, Yun CUI*, Kun CHEN, and Yongxin LI
Author Affiliations
  • School of Electronic and Information Engineering, Liaoning Technical University, Huludao 125105, China
    Figures & Tables (9)
    Fig. 1. Overall architecture of HAPHFormer. (a) ConvStem module; (b) Embed module; (c) PatchEmbed module.
    Fig. 2. General architecture of Transformer (a) and structure of P-LocalMLP module (b).
    Fig. 3. Down-top sample hourglass self-attention module.
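
    The figure list above names the model's building blocks, but the page does not reproduce their equations. For orientation only, below is a minimal PyTorch sketch of one plausible reading of the "down-top sample hourglass self-attention" (DTSA) in Fig. 3: keys and values are spatially pooled (the narrow waist of the hourglass) while queries keep full resolution, so the output stays at the input size. The class name, pooling ratio, and head count are placeholders, not the authors' implementation.

```python
import torch
import torch.nn as nn


class HourglassAttention(nn.Module):
    """Sketch of a pooled-key/value self-attention: queries keep full
    resolution, keys and values are spatially downsampled.
    Illustrative only; not the paper's DTSA."""

    def __init__(self, dim, num_heads=4, pool_ratio=2):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)
        self.pool = nn.AvgPool2d(pool_ratio, pool_ratio)  # the hourglass "waist"
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, H, W):
        # x: (B, N, C) token sequence with N = H * W
        B, N, C = x.shape
        q = self.q(x).reshape(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        # Down path: pool the token map before forming keys and values.
        x_small = self.pool(x.transpose(1, 2).reshape(B, C, H, W))
        x_small = x_small.flatten(2).transpose(1, 2)              # (B, N', C)
        kv = self.kv(x_small).reshape(B, -1, 2, self.num_heads, self.head_dim)
        k, v = kv.permute(2, 0, 3, 1, 4)                          # (B, h, N', d) each
        attn = (q @ k.transpose(-2, -1)) * self.scale             # (B, h, N, N')
        out = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)                                     # full-resolution output


tokens = torch.randn(2, 14 * 14, 64)
print(HourglassAttention(64)(tokens, 14, 14).shape)  # torch.Size([2, 196, 64])
```
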
    • Table 1. Ablation experiment on CIFAR100 dataset

      Model                      Top-1 accuracy/%
      ViT                        73.81
      ViT+τ                      74.35
      ViT+M                      74.34
      ViT+τ+M                    74.87
      ViT+τ+M+Stem               78.24
      ViT+τ+M+Stem+DTSA          83.45
      ViT+τ+M+Stem+DTSA+PLM      83.76
    • Table 2. Top-1 accuracy comparison of downsampling module on CIFAR100 dataset

      Downsampling module    Top-1 accuracy/%
      Patch Emb              81.71
      Patch Stem             78.45
      ConvStem               83.76
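
    Table 2 shows the ConvStem downsampling module outperforming both patch-embedding variants on CIFAR100. As a point of reference only, the sketch below shows a generic convolutional stem of the kind such ablations compare against single-shot patch projection; the depth and channel widths are assumed, not taken from the paper.

```python
import torch
import torch.nn as nn

# Generic convolutional stem: a stack of stride-2 3x3 convolutions replaces
# the single large-stride patch projection of a vanilla ViT. Channel widths
# and depth here are placeholders, not the paper's ConvStem configuration.
conv_stem = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
    nn.BatchNorm2d(32),
    nn.GELU(),
    nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
    nn.BatchNorm2d(64),
    nn.GELU(),
)

img = torch.randn(1, 3, 224, 224)
print(conv_stem(img).shape)  # torch.Size([1, 64, 56, 56]), a 4x-downsampled map
```
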
    • Table 3. Top-1 accuracy comparison of P-LocalMLP module (%)

      Model                 T-ImageNet    CIFAR10
      BN+FN                 69.39         94.64
      DW+LN                 71.51         96.97
      DW+LN+skip connect    72.09         97.32
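
    Table 3's progression (BN+FN → DW+LN → DW+LN+skip connect) suggests the P-LocalMLP augments the feed-forward block with a depth-wise convolution, LayerNorm, and a skip connection around the local branch. The following PyTorch sketch assembles those three ingredients in one plausible order; it illustrates the ablated components, not the paper's code, and the expansion ratio is assumed.

```python
import torch
import torch.nn as nn


class PLocalMLP(nn.Module):
    """Feed-forward block built from the Table 3 ingredients: depth-wise
    conv (DW) for local mixing, LayerNorm (LN), and a skip connection
    around the local branch. A sketch, not the paper's P-LocalMLP."""

    def __init__(self, dim, expansion=4):
        super().__init__()
        hidden = dim * expansion                      # expansion ratio is assumed
        self.fc1 = nn.Linear(dim, hidden)
        self.dw = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)
        self.norm = nn.LayerNorm(hidden)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, x, H, W):
        # x: (B, N, C) with N = H * W
        B, N, _ = x.shape
        h = self.fc1(x)
        # Local depth-wise branch plus skip (the "+skip connect" row of Table 3).
        local = self.dw(h.transpose(1, 2).reshape(B, -1, H, W))
        h = h + local.flatten(2).transpose(1, 2)
        return self.fc2(self.act(self.norm(h)))


x = torch.randn(2, 7 * 7, 64)
print(PLocalMLP(64)(x, 7, 7).shape)  # torch.Size([2, 49, 64])
```
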
    • Table 4. Top-1 accuracy comparison of attention module on CIFAR100 dataset

      Attention module    M+τ    Top-1 accuracy/%
      DTSA                ×      83.18
      DTSA                √      83.76
      EMSAv2              ×      80.59
      EMSAv2              √      82.29
      Linear SRA          ×      79.58
      Linear SRA          √      82.52
    • Table 5. Top-1 accuracy comparison of overall architecture

      Architecture                                      Top-1 accuracy/%
      Attention, Attention, Attention, Attention        73.69
      Pool, Pool, Pool, Pool                            62.09
      SpatialMLP, SpatialMLP, SpatialMLP, SpatialMLP    61.73
      Pool, Pool, SpatialMLP, SpatialMLP                77.83
      Pool, Pool, Attention, Attention                  83.76
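
    Table 5 supports the "progressive hybrid" design: cheap pooling token mixers in the two early, high-resolution stages and self-attention in the two later stages. The sketch below wires up that Pool, Pool, Attention, Attention schedule in MetaFormer style; the fixed channel width, single block per stage, and absence of between-stage downsampling are simplifications for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn


class PoolMixer(nn.Module):
    """PoolFormer-style token mixer: local average pooling minus identity."""

    def __init__(self):
        super().__init__()
        self.pool = nn.AvgPool2d(3, stride=1, padding=1)

    def forward(self, x):                  # x: (B, C, H, W)
        return self.pool(x) - x


class AttnMixer(nn.Module):
    """Plain multi-head self-attention over the flattened token map."""

    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                  # x: (B, C, H, W)
        B, C, H, W = x.shape
        t = x.flatten(2).transpose(1, 2)   # (B, N, C)
        t, _ = self.attn(t, t, t)
        return t.transpose(1, 2).reshape(B, C, H, W)


# The winning Table 5 layout: Pool, Pool, Attention, Attention.
stages = nn.ModuleList([PoolMixer(), PoolMixer(), AttnMixer(64), AttnMixer(64)])

x = torch.randn(1, 64, 14, 14)
for mixer in stages:
    x = x + mixer(x)                       # residual token mixing per stage
print(x.shape)                             # torch.Size([1, 64, 14, 14])
```
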
    • Table 6. Top-1 accuracy comparison of different models on small-size datasets

      Model           Params/M    FLOPs/G    T-ImageNet/%    CIFAR10/%    CIFAR100/%    SVHN/%
      ConvNeXt-T      29          4.46       67.51           94.77        76.87         96.17
      SwinV2-T        28          4.35       50.41           78.22        68.97         82.19
      PVTv2-B2        26          4.04       63.93           93.58        73.29         97.01
      MViTv2-T        24          3.99       53.94           74.44        73.16         85.99
      ResTv2-T        30          4.10       68.59           95.57        80.57         97.17
      HAPHFormer-T    25          3.41       72.09           97.32        83.76         97.42
      ConvNeXt-S      50          8.69       67.88           95.82        81.33         96.21
      SwinV2-S        50          8.45       58.97           80.06        70.32         85.03
      PVTv2-B3        45          6.92       66.85           94.27        73.44         97.41
      MViTv2-S        35          6.08       56.92           76.76        74.13         86.34
      ResTv2-S        40          5.97       69.70           95.91        80.89         97.23
      HAPHFormer-S    35          5.11       72.29           97.37        83.88         97.77
      ConvNeXt-B      88          15.36      68.69           96.04        83.30         96.33
      SwinV2-B        87          14.99      60.14           80.38        70.51         85.51
      PVTv2-B4        62          10.14      67.96           94.94        77.25         97.65
      MViTv2-B        52          8.88       62.43           77.18        74.51         86.68
      ResTv2-B        55          7.88       70.10           96.21        81.04         97.26
      HAPHFormer-B    49          6.84       72.55           97.38        84.12         97.89
      PVTv2-B5        81          11.75      68.37           95.14        78.42         97.88
      ResTv2-L        86          13.83      72.34           96.37        81.74         97.34
      HAPHFormer-L    72          11.27      72.80           97.46        84.71         98.16
    Citation

    Yanfei PENG, Yun CUI, Kun CHEN, Yongxin LI. Hourglass attention and progressive hybrid Transformer for image classification[J]. Chinese Journal of Liquid Crystals and Displays, 2024, 39(9): 1223

    Paper Information

    Received: Oct. 25, 2023

    Accepted: --

    Published Online: Nov. 13, 2024

    The Author Email: Yun CUI (1727015916@qq.com)

    DOI: 10.37188/CJLCD.2023-0338
