Semiconductor Optoelectronics, Vol. 45, Issue 6, 960 (2024)

Deep Fusion-GAN Enhancement Model for Text-to-Image

WEI Yiran, YI Junkai, ZHU Kequan, and TAN Lingling
Author Affiliations
  • College of Automation, Beijing Information Science and Technology University, Beijing 100192, China

    A deep fusion generative adversarial network (DF-GAN) enhancement model combined with a self-attention mechanism is proposed to address low semantic relevance, blurred details, and inadequate structural integrity in text-to-image generation. First, the bidirectional encoder representations from transformers (BERT) model is used to mine the semantic features of the text context, and is combined with the deep text-image fusion block to match deep text semantics with regional image features. Second, a self-attention module is introduced at the model-architecture level as a supplement to the convolution modules, strengthening long-distance and multilevel dependencies. Experimental results demonstrate that the proposed enhancement model not only strengthens the semantic relationship between text and image but also ensures precise details and the overall integrity of the generated images.
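    The two additions described in the abstract can be sketched in NumPy as follows. This is a minimal illustration under assumptions, not the authors' implementation: the shapes, the projection matrices `W_gamma`, `W_beta`, `Wq`, `Wk`, `Wv`, and the residual weight `gamma` are all hypothetical. The first function mimics affine text-image fusion (per-channel scale and shift predicted from a sentence embedding, as in DF-GAN's fusion blocks); the second is a SAGAN-style spatial self-attention step that lets every spatial position attend to every other.

    ```python
    import numpy as np

    def affine_fusion(img_feat, text_emb, W_gamma, W_beta):
        """Affine conditioning sketch: modulate an (C, H, W) image feature
        map channel-wise using scale/shift predicted from the text embedding.
        W_gamma and W_beta are hypothetical learned projections."""
        gamma = text_emb @ W_gamma                     # per-channel scale, (C,)
        beta = text_emb @ W_beta                       # per-channel shift, (C,)
        return img_feat * (1.0 + gamma[:, None, None]) + beta[:, None, None]

    def self_attention(x, Wq, Wk, Wv, gamma=0.1):
        """SAGAN-style spatial self-attention over an (C, H, W) feature map.
        Builds long-distance dependencies that local convolutions miss."""
        C, H, W = x.shape
        flat = x.reshape(C, H * W)                     # (C, N) spatial positions
        q, k, v = Wq @ flat, Wk @ flat, Wv @ flat      # query/key/value projections
        logits = q.T @ k                               # (N, N) pairwise scores
        logits -= logits.max(axis=1, keepdims=True)    # numerical stability
        attn = np.exp(logits)
        attn /= attn.sum(axis=1, keepdims=True)        # softmax over key positions
        out = v @ attn.T                               # attention-weighted values
        return x + gamma * out.reshape(C, H, W)        # residual connection
    ```

    In the full model these operations would sit inside generator blocks, with the text embedding coming from BERT rather than a random vector as in this sketch.
    
    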

    Citation:
    WEI Yiran, YI Junkai, ZHU Kequan, TAN Lingling. Deep Fusion-GAN Enhancement Model for Text-to-Image[J]. Semiconductor Optoelectronics, 2024, 45(6): 960

    Paper Information

    Received: May 26, 2024

    Accepted: February 28, 2025

    Published Online: February 28, 2025

    DOI: 10.16818/j.issn1001-5868.2024052602
