Text-to-image method based on XLNet and DMGAN

Fig. 8. Generation details of the initial image. (a) Initial noise image; (b) image after the fully connected layer and four upsampling steps; (c) initial image after ECA channel attention and a 3×3 convolution.

Fig. 9. Refinement details of the image. (a) 64×64 image; (b) 128×128 image; (c) 256×256 image; (d) attention weights.

Fig. 10. Images generated from the same text.

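Algorithm 1. Data preprocessing method

Input: the text to be processed, text

Output: the vocabulary index of each token after tokenization, token_index

Steps:

1. Call the from_pretrained() method of the XLNetTokenizer class to load the token embedding;

2. Tokenize the input text with the tokenizer;

3. Deduplicate the tokens and count how often each token occurs in the text;

4. Using the token embedding vocabulary, obtain the position index token_index of each token;

5. Return the vocabulary indices token_index.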
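
Steps 2 to 5 of Algorithm 1 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `ToyTokenizer`, `preprocess`, the toy `vocab`, and the extra `token_counts` return value are hypothetical names introduced here. The stand-in tokenizer splits on whitespace, whereas XLNet uses subword tokenization. It exposes `tokenize` and `convert_tokens_to_ids` so that, presumably, a tokenizer loaded with `XLNetTokenizer.from_pretrained()` could be passed in its place.

```python
from collections import Counter


class ToyTokenizer:
    """Hypothetical stand-in for a pretrained tokenizer (assumption, not XLNet)."""

    def __init__(self, vocab):
        self.vocab = vocab                # maps token -> vocabulary index
        self.unk_id = vocab["<unk>"]      # index used for unknown tokens

    def tokenize(self, text):
        # Lowercase and split on whitespace; a real XLNet tokenizer uses subwords.
        return text.lower().split()

    def convert_tokens_to_ids(self, tokens):
        return [self.vocab.get(token, self.unk_id) for token in tokens]


def preprocess(text, tokenizer):
    """Tokenize, deduplicate and count, then map tokens to vocabulary indices."""
    tokens = tokenizer.tokenize(text)                      # step 2
    counts = Counter(tokens)                               # step 3: occurrences
    unique_tokens = list(counts)                           # step 3: first-seen order
    token_index = tokenizer.convert_tokens_to_ids(unique_tokens)  # step 4
    token_counts = [counts[token] for token in unique_tokens]
    return token_index, token_counts                       # step 5


vocab = {"<unk>": 0, "a": 1, "small": 2, "bird": 3, "with": 4, "red": 5, "head": 6}
tokenizer = ToyTokenizer(vocab)
token_index, token_counts = preprocess("A small bird with a red head", tokenizer)
print(token_index)    # [1, 2, 3, 4, 5, 6]
print(token_counts)   # [2, 1, 1, 1, 1, 1]
```

The counts are returned alongside the indices only so that a later step can reuse them; Algorithm 1 itself returns token_index alone.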

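Algorithm 2. Building the XLNet model to train word vectors

Input: the token indices token_index

Output: the word vectors word_embedding learned by the XLNet model

Steps:

1. Call the from_pretrained() method of the XLNetModel class to load the PyTorch XLNet model;

2. Initialize the XLNet embedding matrix embedding_vec;

3. Using embedding_vec, look up the feature vector word_embedding of each token from its index token_index;

4. Compute the weighted average of word_embedding;

5. Return the word vectors word_embedding learned by the model.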
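
Steps 3 and 4 of Algorithm 2 amount to a row lookup followed by a weighted mean. The sketch below illustrates this in plain Python with a tiny hand-written matrix. It is an illustration only: the helper names `lookup` and `weighted_average`, the 2-dimensional toy `embedding_vec`, and the variable `sentence_vector` are hypothetical. The choice of per-token occurrence counts as weights is an assumption, since the algorithm does not state which weights are used.

```python
def lookup(embedding_vec, token_index):
    """Step 3: select the embedding row for every token index."""
    return [embedding_vec[i] for i in token_index]


def weighted_average(vectors, weights):
    """Step 4: weighted mean of the vectors, one value per dimension."""
    total = sum(weights)
    dim = len(vectors[0])
    return [
        sum(w * vec[d] for vec, w in zip(vectors, weights)) / total
        for d in range(dim)
    ]


# Toy embedding matrix (assumption): one row per vocabulary index.
embedding_vec = [
    [0.0, 0.0],   # index 0
    [1.0, 0.0],   # index 1
    [0.0, 1.0],   # index 2
    [2.0, 2.0],   # index 3
]
token_index = [1, 2, 3]
token_counts = [2, 1, 1]   # assumed weights: how often each token occurred

word_embedding = lookup(embedding_vec, token_index)
sentence_vector = weighted_average(word_embedding, token_counts)
print(sentence_vector)   # [1.0, 0.75]
```

In a real pipeline the matrix would come from the loaded XLNet model rather than being written by hand, and the rows would have the model's hidden size instead of two dimensions.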

Table 1. Comparison of evaluation indicators

Method	IS ↑	FID ↓
StackGAN [8]	3.70±0.04	35.11
StackGAN-v2 [9]	3.84±0.06	30.30
AttnGAN [10]	4.36±0.03	15.38
MirrorGAN [23]	4.56±0.05	18.34
DMGAN [11]	4.75±0.07	16.09
DFGAN [24]	5.10±0.04	14.81
Proposed method	5.22±0.18	13.31
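
For readers interpreting the arrows in the table header, the two metrics are commonly defined as below. These are the standard formulations, stated here as an assumption about what the table reports. The symbols are introduced for illustration: $p_g$ is the distribution of generated images, $p(y \mid x)$ the class posterior of a pretrained classifier, and $(\mu_r, \Sigma_r)$, $(\mu_g, \Sigma_g)$ the feature mean and covariance of real and generated images.

```latex
\mathrm{IS} = \exp\!\left( \mathbb{E}_{x \sim p_g}\, D_{\mathrm{KL}}\big( p(y \mid x) \,\|\, p(y) \big) \right)

\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

A higher IS indicates more recognizable and more varied images, while a lower FID indicates generated-image statistics closer to those of real images.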

Table 2. Results of ablation experiments
Method	IS ↑	FID ↓
DMGAN	4.75±0.07	16.09
DMGAN+XLNet	5.10±0.21	14.55
DMGAN+ECA	4.85±0.04	15.67
DMGAN+XLNet+ECA	5.22±0.18	13.31