Chinese Journal of Liquid Crystals and Displays, Volume 39, Issue 2, 180 (2024)

Text-to-image generation method based on self-supervised attention and image features fusion

Yonghui LIAO1, Haitao ZHANG2,*, and Haibo JIN1
Author Affiliations
  • 1School of Software, Liaoning Technical University, Huludao 125105, China
  • 2Computer Department, Shantou Polytechnic, Shantou 515071, China
    Figures & Tables(18)
    Self-attention module
    Framework diagram of the SAF-GAN model
    CotNet self-supervised module
    Attentional feature fusion
    Multi-scale channel attention module
    Visual comparison of generated results at each stage
    Fusion result visualization
    Visual comparison of CUB dataset
    Visual comparison of COCO dataset
    Visual presentation of attention weights
    Visual display of feature fusion supplementation effect
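    • Table 0. Algorithm 1: Text-to-image generation algorithm

      Step 1: Input the text and encode it with the text encoder into a sentence vector ŝ and word vectors s.

      Step 2: Apply conditioning augmentation to the sentence vector ŝ, concatenate it with random noise z, and feed the result into the F0 module to obtain the initial feature h0.

      Step 3: Feed the initial feature into the self-supervised attention module to obtain the new feature h0'; a 3×3 convolution then produces a coarse image at 64×64 resolution.

      Step 4: Feed the weighted feature h0' and the word vectors s into the attention matrix to form the multimodal context vector.

      Step 5: Concatenate the multimodal context vector with the initial feature and feed it into F1; a residual network and upsampling yield the refined image feature h1. Fusing h1 with the low-resolution feature of the previous stage gives h1', and a 3×3 convolution produces an image at 128×128 resolution.

      Step 6: Concatenate the image feature h1' with the multimodal context vector and feed it into F2; a residual network and upsampling yield the refined image feature h2. Fusing h2 with the feature of the previous stage gives h2', and a 3×3 convolution produces an image at 256×256 resolution.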
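      The six steps of Algorithm 1 can be traced as simple shape bookkeeping. The sketch below is not the authors' implementation: it tracks only feature-map names and resolutions, the channel width `CHANNELS` is a hypothetical value the source does not give, and it assumes that the context vector is recomputed at each stage and that the previous-stage feature is upsampled before fusion.

```python
from dataclasses import dataclass

# Hypothetical channel width: the source does not state it.
CHANNELS = 32


@dataclass(frozen=True)
class Feature:
    name: str
    channels: int
    size: int  # spatial height and width


def f0() -> Feature:
    # Step 2: F0 maps [augmented sentence vector; noise z] to h0.
    return Feature("h0", CHANNELS, 64)


def attend(h: Feature) -> Feature:
    # Step 3: self-supervised attention reweights h; shape is unchanged.
    return Feature(h.name + "'", h.channels, h.size)


def word_context(h: Feature) -> Feature:
    # Step 4: attention over the word vectors gives one context vector
    # per spatial location, so the context matches h in size.
    return Feature("ctx", h.channels, h.size)


def refine(h: Feature, ctx: Feature, name: str) -> Feature:
    # Steps 5-6: concatenate, apply residual blocks, upsample by 2.
    assert ctx.size == h.size, "context must align with the feature map"
    return Feature(name, h.channels, h.size * 2)


def fuse(h_new: Feature, h_prev: Feature) -> Feature:
    # Steps 5-6: fuse with the previous, lower-resolution feature.
    # Assumption: h_prev is upsampled to h_new's size before fusion.
    assert h_new.size == 2 * h_prev.size
    return Feature(h_new.name + "'", h_new.channels, h_new.size)


def to_image(h: Feature) -> tuple:
    # 3x3 convolution to an RGB image at the feature's resolution.
    return (3, h.size, h.size)


def generate() -> list:
    h = attend(f0())                 # h0'
    images = [to_image(h)]           # 64x64 coarse image
    for name in ("h1", "h2"):
        ctx = word_context(h)        # assumed to be recomputed per stage
        h = fuse(refine(h, ctx, name), h)
        images.append(to_image(h))   # 128x128, then 256x256
    return images


print(generate())  # [(3, 64, 64), (3, 128, 128), (3, 256, 256)]
```

      Each stage doubles the resolution, so two refinement stages take the 64×64 coarse image to the 256×256 final output described in Step 6.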

    • Table 1. Dataset

      Dataset               CUB      COCO
      Training samples      8,855    82,783
      Test samples          2,933    40,470
      Captions per image    10       5
    • Table 2. Experimental environment and hyperparameter configuration

      Name                Device model / version                      Hyperparameter          Setting
      Operating system    Linux                                       Batch size              20
      CPU                 Intel(R) Core(TM) i7-6700 CPU @ 3.40 GHz    Learning rate           0.0002
      GPU                 NVIDIA Tesla V100S PCIe (32 GB)             CUB training epochs     700
      Python              3.7                                         COCO training epochs    120
      PyTorch             1.13.0                                      CUB: λ                  5
      CUDA                11.6                                        COCO: λ                 50
    • Table 3. Evaluation index scores of CUB dataset

      Model           IS           FID
      GAN-INT-CLS     2.88±0.04    -
      StackGAN++      4.04±0.05    30.30
      AttnGAN         4.36±0.03    21.48
      SA-AttnGAN      4.52±0.03    18.83
      HDGAN           4.16±0.05    -
      DualAttn-GAN    4.59±0.07    19.96
      DAE-GAN         4.42±0.04    18.27
      Our SAF-GAN     4.67±0.07    18.03
    • Table 4. Evaluation index scores of COCO dataset

      Model            IS            FID
      StackGAN         8.45±0.03     -
      AttnGAN          25.85±0.47    35.49
      KT-GAN           30.61±0.36    32.15
      CSM-GAN          26.77±0.24    -
      Our SSA-DMGAN    28.53±0.39    30.31
    • Table 5. Comparison of results on CUB dataset

      Module                IS           FID
      AttnGAN               4.36±0.03    21.48
      AttnGAN+CotNet        4.57±0.07    19.47
      AttnGAN+AFF           4.43±0.04    20.83
      AttnGAN+CotNet+AFF    4.67±0.07    18.03
    • Table 6. Comparison of results on COCO dataset

      Module                IS            FID
      AttnGAN               25.85±0.47    35.49
      AttnGAN+CotNet        27.47±0.23    32.17
      AttnGAN+AFF           26.95±0.41    31.52
      AttnGAN+CotNet+AFF    28.53±0.39    30.31
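
      To read the ablation results in Tables 5 and 6 as relative changes over the AttnGAN baseline, a small calculation helps. This is a reader-side sketch: the values are copied from the two tables (IS means only, without the ± terms), and the helper names `relative_change` and `summarize` are illustrative, not from the paper.

```python
# (IS mean, FID) per configuration, copied from Tables 5 and 6.
ABLATION = {
    "CUB": {
        "AttnGAN": (4.36, 21.48),
        "AttnGAN+CotNet": (4.57, 19.47),
        "AttnGAN+AFF": (4.43, 20.83),
        "AttnGAN+CotNet+AFF": (4.67, 18.03),
    },
    "COCO": {
        "AttnGAN": (25.85, 35.49),
        "AttnGAN+CotNet": (27.47, 32.17),
        "AttnGAN+AFF": (26.95, 31.52),
        "AttnGAN+CotNet+AFF": (28.53, 30.31),
    },
}


def relative_change(baseline: float, value: float) -> float:
    """Percentage change of value with respect to baseline."""
    return (value - baseline) / baseline * 100.0


def summarize(dataset: str) -> dict:
    """Map each non-baseline configuration to (IS change %, FID change %)."""
    rows = ABLATION[dataset]
    base_is, base_fid = rows["AttnGAN"]
    return {
        name: (relative_change(base_is, is_mean),
               relative_change(base_fid, fid))
        for name, (is_mean, fid) in rows.items()
        if name != "AttnGAN"
    }


for dataset in ABLATION:
    for name, (d_is, d_fid) in summarize(dataset).items():
        print(f"{dataset:5s} {name:20s} IS {d_is:+.1f}%  FID {d_fid:+.1f}%")
```

      With both modules enabled, IS rises by roughly 7% on CUB and 10% on COCO, while FID (lower is better) falls by roughly 16% and 15% respectively.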
    Get Citation

    Yonghui LIAO, Haitao ZHANG, Haibo JIN. Text-to-image generation method based on self-supervised attention and image features fusion[J]. Chinese Journal of Liquid Crystals and Displays, 2024, 39(2): 180

    Paper Information

    Category: Research Articles

    Received: Mar. 24, 2023

    Accepted: --

    Published Online: Apr. 24, 2024

    The Author Email: Haitao ZHANG (htzhang@stpt.edu.cn)

    DOI:10.37188/CJLCD.2023-0107
