Text-to-image generation method based on self-supervised attention and image features fusion

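Step 5: The multimodal context vector is concatenated with the initial features and fed into $F_{1}$. Residual blocks followed by upsampling produce the refined image feature $h_{1}$. Feature $h_{1}$ is then fused with the low-resolution feature from the previous stage to give $h_{1}^{'}$, and a $3 \times 3$ convolution converts $h_{1}^{'}$ into an image at $128 \times 128$ resolution.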

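Step 6: The image feature $h_{1}^{'}$ is concatenated with the multimodal context vector and fed into $F_{2}$. Residual blocks followed by upsampling produce the refined image feature $h_{2}$. Feature $h_{2}$ is then fused with the feature from the previous stage to give $h_{2}^{'}$, and a $3 \times 3$ convolution converts $h_{2}^{'}$ into an image at $256 \times 256$ resolution.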
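The two refinement stages in Steps 5 and 6 can be traced at the level of feature-map shapes. The following is a minimal, dependency-free sketch and not the authors' implementation. The channel width (`feat_ch = 32`), the $64 \times 64$ starting resolution (inferred from the $128 \times 128$ output after one 2x upsampling), and all helper names are assumptions. The fusion step is modeled only as a shape-preserving merge, assuming the previous-stage feature is resized to the current resolution first.

```python
from typing import NamedTuple, Tuple


class Shape(NamedTuple):
    """Feature-map shape as (channels, height, width)."""
    channels: int
    height: int
    width: int


def concat(a: Shape, b: Shape) -> Shape:
    """Channel-wise concatenation; spatial sizes must agree."""
    assert (a.height, a.width) == (b.height, b.width), "spatial mismatch"
    return Shape(a.channels + b.channels, a.height, a.width)


def upsample2x(x: Shape, out_channels: int) -> Shape:
    """2x spatial upsampling plus a conv that sets the channel count."""
    return Shape(out_channels, x.height * 2, x.width * 2)


def fuse(current: Shape, previous: Shape) -> Shape:
    """Shape-preserving fusion (assumption: the previous-stage feature is
    resized to the current resolution before the two are merged)."""
    assert previous.channels == current.channels, "channel mismatch"
    return current


def to_image(x: Shape) -> Shape:
    """3x3 convolution with padding 1: keeps H and W, outputs 3 RGB channels."""
    return Shape(3, x.height, x.width)


def refine_stage(h_prev: Shape, context: Shape, feat_ch: int) -> Tuple[Shape, Shape]:
    """One stage F_i: concat -> residual blocks -> upsample -> fuse -> image."""
    x = concat(h_prev, context)      # input to F_i
    # Residual blocks leave the shape unchanged, so they are not modeled here.
    h = upsample2x(x, feat_ch)       # refined feature h_i
    h_fused = fuse(h, h_prev)        # fused feature h_i'
    return h_fused, to_image(h_fused)


feat_ch = 32                          # assumed channel width
h0 = Shape(feat_ch, 64, 64)           # assumed initial feature map
h1, img1 = refine_stage(h0, Shape(feat_ch, 64, 64), feat_ch)      # Step 5
h2, img2 = refine_stage(h1, Shape(feat_ch, 128, 128), feat_ch)    # Step 6
print(img1, img2)
```

Each stage doubles the spatial resolution once, which is why a $64 \times 64$ input is needed to reach $128 \times 128$ after $F_{1}$ and $256 \times 256$ after $F_{2}$.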

Table 1. Dataset
| Dataset | CUB | COCO |
| --- | --- | --- |
| Training samples | 8,855 | 82,783 |
| Test samples | 2,933 | 40,470 |
| Captions per image | 10 | 5 |

Table 2. Experimental environment and hyperparameter configuration

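| Item | Specification | Hyperparameter | Value |
| --- | --- | --- | --- |
| Operating system | Linux | Batch size | 20 |
| CPU | Intel(R) Core(TM) i7-6700 CPU @ 3.40 GHz | Learning rate | 0.0002 |
| GPU | NVIDIA Tesla V100S PCIe (32 GB) | Training epochs (CUB) | 700 |
| Python | 3.7 | Training epochs (COCO) | 120 |
| PyTorch | 1.13.0 | $\lambda$ (CUB) | 5 |
| CUDA | 11.6 | $\lambda$ (COCO) | 50 |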
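
The hyperparameters in Table 2 can be gathered into a small configuration object. This is a hypothetical sketch: the class name `TrainConfig`, its field names, and the `for_dataset` helper are illustrative and do not come from the authors' code. Only the numeric values are taken from Table 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Training settings from Table 2 (field names are assumptions)."""
    dataset: str
    batch_size: int
    learning_rate: float
    epochs: int
    lam: float  # the weight listed as lambda in Table 2


# Settings that differ between the two datasets.
_PER_DATASET = {
    "CUB": {"epochs": 700, "lam": 5.0},
    "COCO": {"epochs": 120, "lam": 50.0},
}


def for_dataset(name: str) -> TrainConfig:
    """Return the Table 2 settings for 'CUB' or 'COCO' (case-insensitive)."""
    key = name.upper()
    if key not in _PER_DATASET:
        raise ValueError(f"unknown dataset: {name!r}")
    # Batch size and learning rate are shared by both datasets.
    return TrainConfig(dataset=key, batch_size=20, learning_rate=2e-4,
                       **_PER_DATASET[key])


cub_cfg = for_dataset("CUB")
coco_cfg = for_dataset("coco")
print(cub_cfg)
print(coco_cfg)
```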

Table 3. Evaluation index scores of CUB dataset
| Model | IS | FID |
| --- | --- | --- |
| GAN-INT-CLS | 2.88 $\pm$ 0.04 | - |
| StackGAN++ | 4.04 $\pm$ 0.05 | 30.30 |
| AttnGAN | 4.36 $\pm$ 0.03 | 21.48 |
| SA-AttnGAN | 4.52 $\pm$ 0.03 | 18.83 |
| HDGAN | 4.16 $\pm$ 0.05 | - |
| DualAttn-GAN | 4.59 $\pm$ 0.07 | 19.96 |
| DAE-GAN | 4.42 $\pm$ 0.04 | 18.27 |
| Our SAF-GAN | 4.67 $\pm$ 0.07 | 18.03 |

Table 4. Evaluation index scores of COCO dataset
| Model | IS | FID |
| --- | --- | --- |
| StackGAN | 8.45 $\pm$ 0.03 | - |
| AttnGAN | 25.85 $\pm$ 0.47 | 35.49 |
| KT-GAN | 30.61 $\pm$ 0.36 | 32.15 |
| CSM-GAN | 26.77 $\pm$ 0.24 | - |
| Our SSA-DMGAN | 28.53 $\pm$ 0.39 | 30.31 |

Table 5. Comparison of results on CUB dataset
| Module | IS | FID |
| --- | --- | --- |
| AttnGAN | 4.36 $\pm$ 0.03 | 21.48 |
| AttnGAN+CotNet | 4.57 $\pm$ 0.07 | 19.47 |
| AttnGAN+AFF | 4.43 $\pm$ 0.04 | 20.83 |
| AttnGAN+CotNet+AFF | 4.67 $\pm$ 0.07 | 18.03 |

Table 6. Comparison of results on COCO dataset
| Module | IS | FID |
| --- | --- | --- |
| AttnGAN | 25.85 $\pm$ 0.47 | 35.49 |
| AttnGAN+CotNet | 27.47 $\pm$ 0.23 | 32.17 |
| AttnGAN+AFF | 26.95 $\pm$ 0.41 | 31.52 |
| AttnGAN+CotNet+AFF | 28.53 $\pm$ 0.39 | 30.31 |
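
To put the ablation results in Tables 5 and 6 on a common scale, the change from the AttnGAN baseline to the full AttnGAN+CotNet+AFF configuration can be expressed as a percentage. The short sketch below does only that arithmetic on the mean scores reported in the tables. The function and variable names are illustrative, and the $\pm$ spreads are ignored.

```python
def relative_change(baseline: float, value: float) -> float:
    """Percentage change of `value` relative to `baseline`."""
    return (value - baseline) / baseline * 100.0


# (IS, FID) means copied from Tables 5 and 6.
results = {
    "CUB": {"baseline": (4.36, 21.48), "full": (4.67, 18.03)},
    "COCO": {"baseline": (25.85, 35.49), "full": (28.53, 30.31)},
}

summary = {}
for name, rows in results.items():
    base_is, base_fid = rows["baseline"]
    full_is, full_fid = rows["full"]
    is_gain = relative_change(base_is, full_is)        # higher IS is better
    fid_drop = -relative_change(base_fid, full_fid)    # lower FID is better
    summary[name] = (is_gain, fid_drop)
    print(f"{name}: IS +{is_gain:.1f}%, FID -{fid_drop:.1f}%")
# CUB: IS +7.1%, FID -16.1%
# COCO: IS +10.4%, FID -14.6%
```

On both datasets the full configuration improves IS and lowers FID relative to the baseline, and it also scores better than either single-module variant listed in the tables.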

Yonghui LIAO, Haitao ZHANG, Haibo JIN. Text-to-image generation method based on self-supervised attention and image features fusion[J]. Chinese Journal of Liquid Crystals and Displays, 2024, 39(2): 180
